2 results for tag "agent-eval"
Lightweight CLI for running reproducible head-to-head comparisons of coding agents (Claude Code, Aider, Codex, etc.) on YAML-defined tasks, measuring pass rate, cost, time, and consistency. Use it to make data-backed agent selection decisions instead of relying on vibes.
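To make the four metrics concrete, here is a minimal Python sketch of how repeated runs could be rolled up per agent. It is not the tool's actual code or output schema: the `Run` record, its field names, the `summarize` helper, and the definition of consistency (the share of tasks whose repeated runs all produced the same pass/fail outcome) are assumptions made for illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One execution of one agent on one task (hypothetical record shape)."""
    agent: str
    task: str
    passed: bool
    cost_usd: float
    seconds: float


def summarize(runs):
    """Aggregate pass rate, mean cost, mean time, and consistency per agent."""
    by_agent = {}
    for run in runs:
        by_agent.setdefault(run.agent, []).append(run)

    summary = {}
    for agent, agent_runs in by_agent.items():
        # Collect the pass/fail outcomes of each task's repeated runs.
        outcomes_by_task = {}
        for run in agent_runs:
            outcomes_by_task.setdefault(run.task, []).append(run.passed)

        # Assumed definition: a task is consistent if all its runs agree.
        consistent_tasks = sum(
            1 for outcomes in outcomes_by_task.values() if len(set(outcomes)) == 1
        )

        summary[agent] = {
            "pass_rate": sum(run.passed for run in agent_runs) / len(agent_runs),
            "mean_cost_usd": mean(run.cost_usd for run in agent_runs),
            "mean_seconds": mean(run.seconds for run in agent_runs),
            "consistency": consistent_tasks / len(outcomes_by_task),
        }
    return summary


# Made-up sample data: two placeholder agents, each task run twice.
sample_runs = [
    Run("agent-a", "t1", True, 0.10, 10.0),
    Run("agent-a", "t1", True, 0.20, 20.0),
    Run("agent-a", "t2", True, 0.30, 30.0),
    Run("agent-a", "t2", False, 0.40, 40.0),
    Run("agent-b", "t1", False, 0.05, 5.0),
    Run("agent-b", "t1", False, 0.05, 7.0),
]

for agent, metrics in summarize(sample_runs).items():
    print(agent, metrics)
```

Running each task more than once is what makes the consistency figure meaningful: an agent that passes a task on one attempt and fails it on the next looks the same as a reliable one in a single-run comparison.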