llm-evaluation

Skill

from wshobson/agents

What it does

A production-ready plugin system with 112 AI agents, 146 skills, 16 workflow orchestrators, and 79 development tools organized into 73 focused plugins for Claude Code.

Overview

llm-evaluation is a Claude Code skill from the wshobson/agents plugin marketplace that provides specialized knowledge for evaluating and benchmarking large language models. It is one of 146 agent skills that, together with 112 specialized AI agents, are organized into 73 focused plugins designed to keep token usage low.

Key Features

AI/ML Domain Expertise - Backed by a specialized agent with deep knowledge in LLM evaluation methodologies, benchmarking, and quality assessment
Granular Plugin Design - Loads only LLM evaluation-related components, keeping context focused on model assessment
Progressive Disclosure - Evaluation knowledge activates only when needed, maintaining efficient context management
Composable with AI Skills - Designed to work alongside other data/AI and machine learning plugins for comprehensive ML workflows
Production-Ready Evaluation Patterns - Provides tested patterns for prompt evaluation, model comparison, and quality metrics
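
As an illustration of what a model-comparison pattern with a simple quality metric can look like, here is a minimal sketch. It is not taken from SKILL.md. The dataset, the two stand-in models (model_a, model_b), and the helper names are all hypothetical, and exact-match accuracy is just one possible metric chosen for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labelled examples: (prompt, reference answer).
DATASET: List[Tuple[str, str]] = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]


def normalize(text: str) -> str:
    """Trim and lowercase so trivial formatting differences are not counted as errors."""
    return text.strip().lower()


def exact_match_accuracy(model: Callable[[str], str],
                         dataset: List[Tuple[str, str]]) -> float:
    """Fraction of prompts whose normalized output equals the normalized reference."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(normalize(model(prompt)) == normalize(ref) for prompt, ref in dataset)
    return hits / len(dataset)


def compare_models(models: Dict[str, Callable[[str], str]],
                   dataset: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Score every model on the same dataset and return (name, score), best first."""
    scores = [(name, exact_match_accuracy(fn, dataset)) for name, fn in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Stand-in "models": canned answers in place of real LLM calls (assumption).
def model_a(prompt: str) -> str:
    answers = {"2 + 2 =": "4", "Capital of France?": " paris ", "Opposite of hot?": "warm"}
    return answers.get(prompt, "")


def model_b(prompt: str) -> str:
    answers = {"2 + 2 =": "four", "Capital of France?": "Lyon", "Opposite of hot?": "cold"}
    return answers.get(prompt, "")


ranking = compare_models({"model_a": model_a, "model_b": model_b}, DATASET)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real harness the stand-in functions would be replaced by calls to the models under comparison, and exact match would usually be complemented by metrics suited to the task.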

Who is this for?

This skill is designed for AI/ML engineers and researchers who need structured guidance on evaluating large language model performance and quality. It is particularly useful for teams selecting between models, building evaluation harnesses, or establishing quality benchmarks for their LLM-powered applications.

Installation

Vibe Index Install (installs to .claude/skills/)

npx vibeindex add wshobson/agents --skill llm-evaluation

skills.sh Install (⚠ installs to .agents/skills/)

npx skills add wshobson/agents --skill llm-evaluation

Manual Install (copy the SKILL.md content and save it to the path below)

~/.claude/skills/llm-evaluation/SKILL.md

8,106 installs

27,673

Added Jan 31, 2026
