safety-alignment
🔌PluginOrchestra-Research/AI-research-SKILLs
A collection of 4 AI safety and alignment research skills covering Constitutional AI, LlamaGuard safety classifier, NeMo programmable guardrails with Colang, and Meta's Prompt Guard injection detector — part of Orchestra Research's AI research engineering skills.
Overview
A collection of 4 AI safety and alignment research skills from Orchestra Research, providing engineering ability for coding agents to conduct AI safety experiments. Part of a larger collection of 83 AI research engineering skills covering model architecture, fine-tuning, tokenization, evaluation, agents, RAG, multimodal, and more.
Key Features
- Constitutional AI — AI-driven self-improvement via principles (282 lines of structured instructions)
- LlamaGuard — Safety classifier for LLM inputs/outputs (329 lines)
- NeMo Guardrails — Programmable guardrails with Colang for controlling LLM behavior (289 lines)
- Prompt Guard — Meta's 86M parameter prompt injection and jailbreak detector with 99%+ TPR and under 2ms GPU latency (313 lines)
- Research-oriented — Each skill includes reference materials and structured instructions for hands-on experimentation
- Part of 83 skills — Spans 15 categories from model architecture to MLOps to mechanistic interpretability
Who is this for?
AI safety researchers and ML engineers who want their coding agent to help with safety alignment experiments — from training guardrails to evaluating prompt injection defenses. Ideal for teams working on responsible AI deployment who need structured, reproducible safety engineering workflows.
Part of
orchestra-research-ai-research-skills
Installation
/plugin marketplace add orchestra-research/AI-research-SKILLs/plugin install safety-alignment@ai-research-skillsMore from this repository10
Streamlines AI research workflows by providing curated Claude skills for data analysis, literature review, experiment design, and research paper generation.
Prompt-engineering category of the AI Research Engineering Skills library — 4 skills covering DSPy (declarative prompt programming), Instructor (Pydantic-validated structured outputs), Guidance (regex/grammar-constrained generation), and Outlines (FSM-based structured text).
Emerging-techniques category of the AI Research Engineering Skills library — 6 skills covering Mixture-of-Experts training, Model Merging (TIES/DARE/SLERP via mergekit), Long Context (RoPE/YaRN/ALiBi), Speculative Decoding, Knowledge Distillation, and Model Pruning.
AI research skill for writing publication-ready ML papers for top conferences (NeurIPS, ICML, ICLR, ACL, AAAI, COLM) with LaTeX templates and citation verification.
A collection of 7 multimodal AI research skills covering CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, and AudioCraft — part of Orchestra Research's 83 AI research engineering skills for coding agents.
A tokenization skill from the AI Research Engineering Skills Library, which offers 83 skills across 20 categories covering model architecture, fine-tuning, inference, and other AI research areas.
Assists AI researchers in drafting, structuring, and generating machine learning research papers with academic writing best practices and technical precision.
MLflow experiment tracking and model management skill from Orchestra Research, part of the most comprehensive open-source AI research engineering skills library with 83 skills.
Structured ideation frameworks for discovering high-impact research directions with 10 complementary lenses (384 lines). Part of orchestra-research/ai-research-skills.
FAISS vector search skill from Orchestra Research for efficient similarity search and dense vector clustering in AI research workflows.