advanced-evaluation
Skill from muratcankoylan/agent-skills-for-context-engineering
Develops robust LLM-as-a-judge evaluation techniques, mitigating biases and creating reliable automated quality assessment frameworks for comparing model outputs.
Part of
muratcankoylan/agent-skills-for-context-engineering (21 items)
Installation
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
/plugin install context-engineering-fundamentals@context-engineering-marketplace
/plugin install agent-architecture@context-engineering-marketplace
/plugin install agent-evaluation@context-engineering-marketplace
/plugin install agent-development@context-engineering-marketplace
(+ 1 more command)
Skill Details
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise comparison, position bias, evaluation pipelines, or automated quality assessment.
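As a rough illustration of the pairwise comparison and position-bias mitigation mentioned above, the sketch below runs a judge twice with the two responses swapped and accepts a winner only when both passes agree. It is not taken from the skill itself: the `Judge` signature, `pairwise_with_swap`, and the two stand-in judges are hypothetical names chosen for this example, and a real judge would call an LLM rather than a local function.

```python
from typing import Callable

# Hypothetical judge signature (an assumption for this sketch):
# takes (prompt, first_response, second_response) and returns
# "first", "second", or "tie" depending on which position it prefers.
Judge = Callable[[str, str, str], str]


def pairwise_with_swap(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Compare responses a and b twice with positions swapped.

    Returns "A", "B", or "tie". Inconsistent verdicts across the two
    passes are treated as a tie, since they suggest position bias.
    """
    first_pass = judge(prompt, a, b)   # a is shown in the first position
    second_pass = judge(prompt, b, a)  # b is shown in the first position

    # Map positional verdicts back to the underlying responses.
    verdict_1 = {"first": "A", "second": "B", "tie": "tie"}[first_pass]
    verdict_2 = {"first": "B", "second": "A", "tie": "tie"}[second_pass]

    return verdict_1 if verdict_1 == verdict_2 else "tie"


# Stand-in judges for demonstration only; a real judge would query a model.
def always_first(prompt: str, first: str, second: str) -> str:
    """A fully position-biased judge: always prefers the first slot."""
    return "first"


def longer_wins(prompt: str, first: str, second: str) -> str:
    """A position-independent judge: prefers the longer response."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


if __name__ == "__main__":
    print(pairwise_with_swap(always_first, "q", "short", "longer answer"))
    print(pairwise_with_swap(longer_wins, "q", "short", "longer answer"))
```

With the position-biased judge the two passes disagree, so the result collapses to a tie; with the position-independent judge both passes pick the same response. Treating disagreement as a tie is one possible design choice; another is to re-run the comparison or escalate to a human reviewer.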
More from this repository (10)
Context Engineering skills for building production-grade AI agent systems
context-engineering-collection skill from muratcankoylan/agent-skills-for-context-engineering
hosted-agents skill from muratcankoylan/agent-skills-for-context-engineering
Compresses and optimizes conversation context by strategically summarizing and preserving critical information while minimizing token usage across long-running agent sessions.
Optimizes context windows by strategically compressing, masking, caching, and partitioning to extend effective context capacity without increasing model size.
Enables persistent knowledge storage and retrieval across agent sessions through layered memory architectures, knowledge graphs, and temporal tracking.
filesystem-context skill from muratcankoylan/agent-skills-for-context-engineering
Skill
Designs agent tools with clear contracts, unambiguous interfaces, and precise descriptions to enable effective agent-system interactions.
Guides users through designing LLM project architectures, evaluating task-model fit, and selecting optimal agent-based development strategies.