evaluating-code-models (Skill)

from orchestra-research/ai-research-skills

What it does

Systematically assesses machine learning models for code generation by benchmarking their performance, identifying strengths and weaknesses, and producing comparative metrics.

Same repository: orchestra-research/ai-research-skills (121 items)

Installation

Vibe Index Install (installs to .claude/skills/)

npx vibeindex add orchestra-research/ai-research-skills --skill evaluating-code-models

skills.sh Install (⚠ installs to .agents/skills/)

npx skills add orchestra-research/ai-research-skills --skill evaluating-code-models

Manual Install (copy the SKILL.md content and save it to the path below)

~/.claude/skills/evaluating-code-models/SKILL.md
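
The manual install step could be scripted roughly as follows. This is a minimal sketch under assumptions: `install_skill_manually` is a hypothetical helper name, not part of the skill or of either CLI above, and it assumes the SKILL.md content has already been saved to a local file. Only the destination path comes from this page.

```shell
#!/usr/bin/env sh
# Hypothetical helper (an illustration, not shipped with the skill):
# copies a locally saved SKILL.md into the manual-install location.
install_skill_manually() {
  src="$1"                            # path to the SKILL.md you saved locally
  root="${2:-$HOME/.claude/skills}"   # skills root; default matches the path above
  dest_dir="$root/evaluating-code-models"

  # Stop early if the source file does not exist.
  [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }

  # Create the skill directory if needed, then copy the file into place.
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/SKILL.md"
}

# Example call (assumes SKILL.md is in the current directory):
# install_skill_manually ./SKILL.md
```

The optional second argument exists only so the helper can be tried against a scratch directory before touching the real skills folder.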


313 installs

Added Feb 7, 2026


More from this repository (10)

orchestra-research-ai-research-skills (Marketplace)

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code, Codex, or Gemini agent becomes a fully equipped AI research agent. Maintained by Orchestra Research.

prompt-engineering (Plugin)

Prompt-engineering category of the AI Research Engineering Skills library — 4 skills covering DSPy (declarative prompt programming), Instructor (Pydantic-validated structured outputs), Guidance (regex/grammar-constrained generation), and Outlines (FSM-based structured text).

emerging-techniques (Plugin)

Emerging-techniques category of the AI Research Engineering Skills library — 6 skills covering Mixture-of-Experts training, Model Merging (TIES/DARE/SLERP via mergekit), Long Context (RoPE/YaRN/ALiBi), Speculative Decoding, Knowledge Distillation, and Model Pruning.

ml-paper-writing (Plugin)

AI research skill for writing publication-ready ML papers for top conferences (NeurIPS, ICML, ICLR, ACL, AAAI, COLM) with LaTeX templates and citation verification.

multimodal (Plugin)

A collection of 7 multimodal AI research skills covering CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, and AudioCraft — part of Orchestra Research's 83 AI research engineering skills for coding agents.

safety-alignment (Plugin)

A collection of 4 AI safety and alignment research skills covering Constitutional AI, LlamaGuard safety classifier, NeMo programmable guardrails with Colang, and Meta's Prompt Guard injection detector — part of Orchestra Research's AI research engineering skills.

tokenization (Plugin)

A tokenization skill from the AI Research Engineering Skills Library, which offers 83 skills across 20 categories covering model architecture, fine-tuning, inference, and other AI research areas.

ml-paper-writing (Skill)

Assists AI researchers in drafting, structuring, and generating machine learning research papers with academic writing best practices and technical precision.

brainstorming-research-ideas (Skill)

Structured ideation frameworks for discovering high-impact research directions with 10 complementary lenses (384 lines). Part of orchestra-research/ai-research-skills.

serving-llms-vllm (Skill)

vLLM serving skill from Orchestra Research for deploying and serving large language models with high throughput and low latency.