advanced-evaluation
Skill from flora131/atomic
An Atomic SDK skill for advanced evaluation techniques including pairwise comparison, position-bias mitigation, and building evaluation pipelines.
Same repository
flora131/atomic (28 items)
Installation
npx vibeindex add flora131/atomic --skill advanced-evaluation
npx skills add flora131/atomic --skill advanced-evaluation
~/.claude/skills/advanced-evaluation/SKILL.md
More from this repository (10)
Opinionated workflows, Ralph Loops, and memory for AI coding agents.
Part of the Atomic agent framework, this skill explains code functionality in detail using DeepWiki to provide comprehensive code understanding and documentation.
Part of the Atomic agent framework, this skill enables deep codebase research by dispatching specialized sub-agents: a codebase-locator finds relevant files, a codebase-analyzer reads implementations, and an online-researcher queries external docs.
Part of the Atomic agent framework, this skill helps create, improve, and optimize prompts using best practices, within Atomic's multi-agent architecture that dispatches specialized sub-agents for focused task execution.
An Atomic SDK skill for optimizing LLM context usage through KV-cache optimization, observation masking, and context budgeting techniques.
A built-in Atomic skill that discovers and installs agent skills from the community. Part of the Atomic multi-agent harness that orchestrates Claude Code, OpenCode, and GitHub Copilot CLI.
A meta skill from the Atomic multi-agent harness that enables creating, modifying, evaluating, and benchmarking custom agent skills. Auto-invoked when building or iterating on SKILL.md files.
An Atomic SDK skill covering how LLM context windows work, including attention mechanics and progressive disclosure techniques for effective context management.
An Atomic SDK design skill that pushes UI designs to their creative limits, maximizing visual impact and boldness beyond conventional boundaries.
An Atomic SDK skill for offloading context to the filesystem and enabling file-based agent coordination to work within LLM context limits.