2 results for tag "arize-evaluator"
LLM-as-judge evaluator workflow on Arize – define versioned evaluators (template + classification choices + judge model + invocation params + span/trace/session granularity), create tasks that run them on real data via column mapping, and enable continuous monitoring via `ax tasks trigger-run`. Use for hallucination/faithfulness/correctness/relevance scoring of spans or experiments.
Arize skill for creating **LLM-as-judge evaluators**, running evaluation tasks, and setting up continuous monitoring – part of the Arize platform's skill set that guides AI coding agents in adding observability, running experiments, and optimizing prompts for LLM applications using the `ax` CLI.