22 results for tag "agent-evaluation"
Evaluates AI agent performance, capabilities, and effectiveness through systematic assessment and scoring methodologies.
A collection of 255+ universal agentic skills for AI coding assistants including Claude Code, Gemini CLI, Codex CLI, Antigravity IDE, GitHub Copilot, and Cursor.
Evaluates AI agents by systematically testing and scoring their capabilities across multiple predefined metrics and scenarios.
Agent evaluation skill that uses MLflow to systematically evaluate and improve LLM agent output quality. Covers tool-selection accuracy, answer quality, cost reduction, and end-to-end evaluation with datasets, scorers, and tracing.
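To illustrate the kind of workflow this entry describes, here is a minimal sketch assuming MLflow 3's mlflow.genai.evaluate API and its @scorer decorator; the eval_data records, the exact_match scorer, and the agent_predict function are hypothetical stand-ins, not part of the skill itself.

```python
# Minimal sketch of an MLflow-based agent evaluation run.
# Assumes MLflow 3's GenAI evaluation API; the dataset and the
# agent_predict / exact_match names below are illustrative only.
import mlflow
from mlflow.genai.scorers import scorer

# Hypothetical evaluation dataset: each record pairs the agent's
# inputs with the expectations the scorers will check against.
eval_data = [
    {
        "inputs": {"question": "What is the capital of France?"},
        "expectations": {"expected_response": "Paris"},
    },
    {
        "inputs": {"question": "What is the capital of Japan?"},
        "expectations": {"expected_response": "Tokyo"},
    },
]

@scorer
def exact_match(outputs, expectations) -> bool:
    # Simple answer-quality scorer: does the agent's final answer
    # exactly match the expected response?
    return outputs.strip() == expectations["expected_response"]

def agent_predict(question: str) -> str:
    # Placeholder for the agent under test; in practice this would
    # invoke the LLM agent and return its final answer, which MLflow
    # can capture via tracing.
    return "Paris" if "France" in question else "unknown"

# Run the evaluation: predict_fn is called once per record with the
# fields of "inputs" as keyword arguments, and each scorer is applied
# to the resulting outputs.
results = mlflow.genai.evaluate(
    data=eval_data,
    predict_fn=agent_predict,
    scorers=[exact_match],
)
```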
Agent evaluation skill from ltk, a personal development toolkit for Claude Code with 35 skills, 16 commands, 7 agents, 4 hooks, and 3 MCP servers. Provides extensible, per-project tooling with auto-loading domain knowledge.
Evaluates AI agent performance with structured assessment frameworks, benchmarks, and improvement tracking for context-engineering workflows.
Evaluates AI agent performance across multiple dimensions, generating comprehensive metrics and insights for benchmarking and improvement strategies.