llm-evaluation (Skill)

from wshobson/agents

What it does

A production-ready plugin system with 112 AI agents, 146 skills, 16 workflow orchestrators, and 79 development tools organized into 73 focused plugins for Claude Code.

Overview

A Claude Code skill from the wshobson/agents plugin marketplace that provides specialized knowledge for evaluating and benchmarking large language models. It is part of a comprehensive system of 112 specialized AI agents and 146 agent skills organized into 73 focused plugins, optimized for minimal token usage.

Key Features

  • AI/ML Domain Expertise - Backed by a specialized agent with deep knowledge in LLM evaluation methodologies, benchmarking, and quality assessment
  • Granular Plugin Design - Loads only LLM evaluation-related components, keeping context focused on model assessment
  • Progressive Disclosure - Evaluation knowledge activates only when needed, maintaining efficient context management
  • Composable with AI Skills - Designed to work alongside other data/AI and machine learning plugins for comprehensive ML workflows
  • Production-Ready Evaluation Patterns - Provides tested patterns for prompt evaluation, model comparison, and quality metrics

Who is this for?

This skill is designed for AI/ML engineers and researchers who need structured guidance on evaluating large language model performance and quality. It is particularly useful for teams selecting between models, building evaluation harnesses, or establishing quality benchmarks for their LLM-powered applications.

Same repository: wshobson/agents (234 items)

Installation

Vibe Index Install: installs to .claude/skills/ (auto-recognized by Claude Code)
npx vibeindex add wshobson/agents --skill llm-evaluation

skills.sh Install: ⚠ installs to .agents/skills/ (may not be auto-recognized by Claude Code)
npx skills add wshobson/agents --skill llm-evaluation

Manual Install: copy the SKILL.md content and save it to the path below
~/.claude/skills/llm-evaluation/SKILL.md
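
For the manual route, the copy step can be sketched as a small shell helper. This is only an illustration: `install_skill` is a hypothetical function name (not part of any CLI), and it assumes you have already saved the skill's SKILL.md somewhere locally. The optional second argument overrides the skills root, which defaults to ~/.claude/skills.

```shell
#!/bin/sh
# Hypothetical helper: copy a locally saved SKILL.md into the skill's directory.
# $1 = path to the saved SKILL.md
# $2 = skills root (optional; defaults to ~/.claude/skills)
install_skill() {
  src="$1"
  dest_dir="${2:-$HOME/.claude/skills}/llm-evaluation"

  # Refuse to continue if the source file is missing.
  if [ ! -f "$src" ]; then
    echo "missing source file: $src" >&2
    return 1
  fi

  # Create the skill directory if needed, then copy the file into place.
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/SKILL.md"
}
```

Assuming SKILL.md sits in the current directory, `install_skill ./SKILL.md` would place it at the path shown above.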

2,799 installs
27,673
Added Jan 31, 2026

More from this repository (10)

ui-design (Plugin)

The ui-design plugin is part of the wshobson/agents marketplace for Claude Code, providing specialized AI agents for UI/UX design assistance within development workflows.

data-validation-suite (Plugin)

The data-validation-suite plugin is part of the wshobson/agents marketplace for Claude Code. It falls under the Data category, which includes two data-focused plugins: data engineering and data validation.

deployment-validation (Plugin)

A Claude Code plugin from the wshobson/agents marketplace for deployment validation, providing specialized AI agents and tools to ensure reliable production deployments within a 73-plugin ecosystem.

shell-scripting (Plugin)

Shell Scripting is a Claude Code plugin from the wshobson/agents marketplace that provides AI-powered assistance for writing and maintaining shell scripts.

machine-learning-ops (Plugin)

An MLOps plugin from the wshobson/agents ecosystem providing Claude Code with specialized agents and skills for ML pipeline management, model deployment, experiment tracking, and production monitoring.

accessibility-compliance (Plugin)

A Claude Code plugin with specialized AI agents for accessibility compliance auditing, WCAG standards verification, and remediation guidance in web and mobile applications.

reverse-engineering (Plugin)

The reverse-engineering plugin is part of the wshobson/agents marketplace for Claude Code, providing specialized AI agents for code analysis, binary examination, and system reverse engineering tasks.

cicd-automation (Plugin)

A Claude Code plugin for CI/CD automation with 4 specialized skills covering pipeline design, GitHub Actions, GitLab CI, and secrets management, part of the wshobson/agents marketplace.

functional-programming (Plugin)

The functional-programming plugin is part of the wshobson/agents marketplace for Claude Code. It falls under the Languages category, which includes seven language-focused plugins covering Python, JavaScript/TypeScript, systems programming, JVM, sc...

comprehensive-review (Plugin)

Comprehensive Review is a Claude Code plugin from the wshobson/agents marketplace that provides multi-perspective code analysis covering architecture, security, and best practices.