experiment-analzyer-comparative
Skill from datadog-labs/agent-skills
Same repository
datadog-labs/agent-skills (35 items)
Installation
npx vibeindex add datadog-labs/agent-skills --skill experiment-analzyer-comparative
npx skills add datadog-labs/agent-skills --skill experiment-analzyer-comparative
~/.claude/skills/experiment-analzyer-comparative/SKILL.md
More from this repository (10)
A skill for the Datadog CLI (pup) built in Rust with OAuth2 authentication. Enables searching logs, listing monitors, querying metrics, triaging security signals, and managing incidents and downtimes.
Five essential Datadog skills for AI agents, including CLI commands, monitor management, log search, APM traces, and documentation search. Compatible with Claude Code, Codex CLI, Gemini CLI, Cursor, and other agents.
A Datadog agent skill for log management including search, pipelines, archives, and cost control, using the Datadog Pup CLI tool.
A Datadog agent skill for APM (Application Performance Monitoring) including distributed tracing, service maps, and performance analysis using the Datadog Pup CLI.
A Datadog agent skill for monitor management, covering creating, updating, and muting monitors, plus alerting best practices, using the Datadog Pup CLI tool.
A Datadog agent skill for documentation lookup using docs.datadoghq.com/llms.txt and linked Markdown pages, enabling efficient access to Datadog's product documentation.
Root-causes production LLM failures by analyzing eval judge verdicts and runtime errors across Datadog LLM Observability traces. Outputs a failure taxonomy that can seed evaluator generation via the companion eval-bootstrap skill.
Analyzes and compares offline LLM experiments in Datadog LLM Observability, supporting single experiment analysis, baseline/candidate comparison, targeted questions, and optional export to Datadog notebooks.
Generates evaluator code from Datadog LLM Observability production traces, optionally seeded by root-cause analysis output. Part of the Datadog LLMO eval pipeline for diagnosing failures and building automated evaluators.
A Datadog skill that runs an end-to-end LLM Observability evaluation pipeline: classifying user sessions, diagnosing failures via root-cause analysis of eval judge verdicts, and bootstrapping evaluator code to capture the discovered failure patterns. Integrates with Datadog's LLM Observability and RUM data via MCP.