30 results for tag "evaluation"
A professional Claude Code skills marketplace featuring 37 production-ready skills for enhanced development workflows.
AI-native, LLM-driven threat-modeling skill (fr33d3m0n/threat-modeling v3.1.0) for automated software risk analysis, security audit, and penetration testing. Covers the full OWASP MCP Top 10 (2025), the SAO (Subject-Action-Object) agent threat model, 13 pre-built MITRE ATT&CK agent attack chains, and the `SKILL.MD = UNTRUSTED` trust-inversion paradigm for agent security.
Systematically evaluates agent system performance using multi-dimensional rubrics, tracks improvements, and validates context engineering choices.
Production-ready skills for SAP development with Claude Code CLI
Code-First Deep Threat Modeling - LLM-native security analysis framework with automated 8-phase workflow, dual-track knowledge architecture (Security Controls + Threat Patterns), and comprehensive verification capabilities. Transform any codebase into structured threat models without design documents.
Comprehensive full-stack development skills for AI-assisted development covering UI/UX, backend, DevOps, infrastructure, security, and AI/ML.
Plugin and Skills for Claude Code, Gemini CLI and Codex
A Claude Code plugin that optimizes documentation for AI coding assistants like Claude, GitHub Copilot, and other LLMs. Makes your docs more effective through c7score optimization, llms.txt generation, question-driven restructuring, and automated quality scoring.
Automated quality assurance for Claude Code agents using LLM-as-judge evaluation. Built by BrandCast.
Comprehensive Claude Code plugin marketplace featuring productivity commands (/quick-test, /analyze-deps, /project-stats), specialized code analysis agents (security-auditor, performance-optimizer, architecture-reviewer), and automatic code formatting hooks. Perfect for learning plugin development or extending your Claude Code workflow with production-ready examples.
An Atomic SDK skill for conducting UX evaluation with quantitative scoring and persona testing to assess design quality.
A collection of specialized Claude Code skills covering financial operations, business finance analysis, tax planning, and professional development workflows.
Skill for evaluating AI agent performance using LLM-as-judge patterns, multi-dimensional evaluation rubrics, and quality gates for agent pipelines.