🔌 multimodal (Plugin)

Orchestra-Research/AI-research-SKILLs

What it does

A collection of 7 multimodal AI research skills covering CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, and AudioCraft — part of Orchestra Research's 83 AI research engineering skills for coding agents.

Overview

These seven skills from Orchestra Research provide structured engineering instructions that let coding agents work with vision, speech, and generative models. They belong to a broader library of 83 AI research engineering skills designed to enable coding agents to conduct AI research experiments.

Key Features

  • CLIP — OpenAI's vision-language model for zero-shot classification (320 lines, 25k+ GitHub stars)
  • Whisper — Robust speech recognition supporting 99 languages (395 lines, 73k+ stars)
  • LLaVA — Vision-language assistant for image-grounded chat with GPT-4V-level capability (360 lines)
  • BLIP-2 — Vision-language pre-training with frozen image encoders and LLMs
  • SAM — Segment Anything Model for universal image segmentation
  • Stable Diffusion — Text-to-image generation via HuggingFace Diffusers with SDXL and ControlNet (380 lines)
  • AudioCraft — Meta's audio generation framework for music and sound effects
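To make the first bullet concrete, here is a minimal sketch of the zero-shot classification mechanism CLIP uses: embed the image and each candidate caption into a shared space, then softmax over the scaled cosine similarities. The random vectors below are stand-ins for real CLIP encoder outputs, and the 512-dimension size and temperature value are illustrative assumptions, not the plugin's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP encoder outputs (a real CLIP model produces embeddings
# from its image and text towers; these random vectors are placeholders).
image_emb = rng.normal(size=512)
label_embs = rng.normal(size=(3, 512))  # one row per candidate caption
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

def zero_shot_probs(image_emb, label_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: cosine similarity, scaled, softmaxed."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)   # scaled cosine similarities
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

probs = zero_shot_probs(image_emb, label_embs)
best_label = labels[int(np.argmax(probs))]
```

No training is needed for a new label set: adding a class is just adding a caption string, which is what makes the approach "zero-shot".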

Who is this for?

AI researchers and ML engineers working on multimodal applications who want their coding agent to help with vision, speech, and generative model experiments. Ideal for teams building products that combine text, image, audio, and video understanding.

🏪 Part of: orchestra-research-ai-research-skills

Installation

Step 1. Add the marketplace in Claude Code:
/plugin marketplace add orchestra-research/AI-research-SKILLs
Step 2. Install the plugin:
/plugin install multimodal@ai-research-skills
Last Updated: Mar 5, 2026

More from this repository (10)

🏪 orchestra-research-ai-research-skills (Marketplace)

Streamlines AI research workflows by providing curated Claude skills for data analysis, literature review, experiment design, and research paper generation.

🔌 prompt-engineering (Plugin)

Prompt-engineering category of the AI Research Engineering Skills library — 4 skills covering DSPy (declarative prompt programming), Instructor (Pydantic-validated structured outputs), Guidance (regex/grammar-constrained generation), and Outlines (FSM-based structured text).

🔌 emerging-techniques (Plugin)

Emerging-techniques category of the AI Research Engineering Skills library — 6 skills covering Mixture-of-Experts training, Model Merging (TIES/DARE/SLERP via mergekit), Long Context (RoPE/YaRN/ALiBi), Speculative Decoding, Knowledge Distillation, and Model Pruning.

🔌 ml-paper-writing (Plugin)

AI research skill for writing publication-ready ML papers for top conferences (NeurIPS, ICML, ICLR, ACL, AAAI, COLM) with LaTeX templates and citation verification.

🔌 safety-alignment (Plugin)

A collection of 4 AI safety and alignment research skills covering Constitutional AI, LlamaGuard safety classifier, NeMo programmable guardrails with Colang, and Meta's Prompt Guard injection detector — part of Orchestra Research's AI research engineering skills.

🔌 tokenization (Plugin)

A tokenization skill from the AI Research Engineering Skills Library, which offers 83 skills across 20 categories covering model architecture, fine-tuning, inference, and other AI research areas.

🎯 ml-paper-writing (Skill)

Assists AI researchers in drafting, structuring, and generating machine learning research papers with academic writing best practices and technical precision.

🎯 mlflow (Skill)

MLflow experiment tracking and model management skill from Orchestra Research, part of the most comprehensive open-source AI research engineering skills library with 83 skills.

🎯 brainstorming-research-ideas (Skill)

Structured ideation frameworks for discovering high-impact research directions with 10 complementary lenses (384 lines). Part of orchestra-research/ai-research-skills.

🎯 faiss (Skill)

FAISS vector search skill from Orchestra Research for efficient similarity search and dense vector clustering in AI research workflows.