6 results for tag "eval"
`/hub:eval` β ranks all AgentHub agent results for a session using one of three modes: metric mode (runs an eval command in each agent's worktree via `scripts/result_ranker.py --session ... --eval-cmd ... --metric ... --direction ...`), LLM judge mode (compares `git diff {base_branch}...{agent_branch}` plus each agent's `.agenthub/board/results/agent-{i}-result.md` on correctness, simplicity, and quality), or hybrid mode (metric first, with an LLM tie-break among results within 10% of the best). It then updates session state via `session_manager.py --update ... --state evaluating` and points to `/hub:merge` for the winner.
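The hybrid mode's tie-break rule could be sketched roughly as below; this is a hypothetical illustration, not the actual `result_ranker.py` implementation, and the function name `hybrid_contenders` and the 10% window parameter are assumptions for the example:

```python
def hybrid_contenders(scores, direction="maximize", tie_window=0.10):
    """Hypothetical sketch of hybrid ranking: pick the best metric score,
    then collect agents within tie_window (10%) of it; if more than one
    remains, an LLM judge would break the tie."""
    if direction == "maximize":
        best = max(scores.values())
        contenders = [a for a, s in scores.items()
                      if (best - s) / abs(best) <= tie_window]
    else:
        best = min(scores.values())
        contenders = [a for a, s in scores.items()
                      if (s - best) / abs(best) <= tie_window]
    needs_judge = len(contenders) > 1
    return contenders, needs_judge

# Two agents land within 10% of the top score, so the judge is invoked:
# hybrid_contenders({"agent-1": 0.95, "agent-2": 0.90, "agent-3": 0.50})
```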
Bundled plugins for Claude Code, including IDA Domain scripting and IDA plugin development