llm-evaluation
🎯 Skill from lifangda/claude-plugins
Systematically assess and benchmark large language models across performance metrics, accuracy, bias, and ethical considerations.
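As an illustration only (the skill's actual workflow is defined in the repository), an evaluation of this kind typically scores model outputs against reference answers. A minimal sketch of one such accuracy metric, assuming simple string predictions and references:

```python
# Hypothetical sketch: illustrates the kind of accuracy metric an LLM
# evaluation computes; not the skill's actual implementation.

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer,
    ignoring case and surrounding/extra whitespace."""
    if not references:
        return 0.0
    normalize = lambda s: " ".join(s.lower().split())
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["Paris", "4", "blue whale"]
refs  = ["paris", "4", "Blue Whale"]
print(exact_match_accuracy(preds, refs))  # → 1.0
```

Metrics for bias or ethical considerations would compare such scores across demographic slices of the benchmark rather than over the whole set.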
Part of lifangda/claude-plugins (264 items)
Installation
npx skills add https://github.com/lifangda/claude-plugins --skill llm-evaluation

Need more details? View full documentation on GitHub →
More from this repository (10)
A complete Claude plugin ecosystem: 504 specialized agents, 313 development commands, 16 workflows, 39 hooks, 56 MCP servers, and 18 output styles, organized by specialty and ready to use out of the box. Also provides a knowledge base of 151 Agent Skills (managed independently).
Searches and retrieves patent information from the USPTO database, providing detailed insights on inventions, applications, and technical documentation.
Processes and analyzes digital pathology whole slide images with advanced tissue segmentation, staining normalization, and computational pathology techniques
Guides developers through writing tests first, designing modular code, and implementing robust software with systematic TDD workflows and best practices.
Retrieves and processes demographic, economic, and geographic data from Google's Data Commons using simple API queries and transformations.
Automates web application security testing by performing rapid directory and file discovery, parameter enumeration, and vulnerability scanning using ffuf.
Provides comprehensive patterns and best practices for writing robust, maintainable JavaScript unit, integration, and end-to-end tests across frameworks.
Queries and retrieves comprehensive genetic, chemical, and clinical data on disease-gene-drug associations from the Open Targets Platform.
Systematically diagnose software and system failures by tracing error propagation, identifying root causes, and recommending targeted remediation strategies.
Generates hierarchical tree diagrams and visualizes complex nested data structures with customizable rendering and export options.