🔌

safety-alignment

🔌Plugin

Orchestra-Research/AI-research-SKILLs

What it does

A collection of 4 AI safety and alignment research skills covering Constitutional AI, LlamaGuard safety classifier, NeMo programmable guardrails with Colang, and Meta's Prompt Guard injection detector — part of Orchestra Research's AI research engineering skills.

Overview

A collection of 4 AI safety and alignment research skills from Orchestra Research, providing engineering ability for coding agents to conduct AI safety experiments. Part of a larger collection of 83 AI research engineering skills covering model architecture, fine-tuning, tokenization, evaluation, agents, RAG, multimodal, and more.

Key Features

Constitutional AI — AI-driven self-improvement via principles (282 lines of structured instructions)
LlamaGuard — Safety classifier for LLM inputs/outputs (329 lines)
NeMo Guardrails — Programmable guardrails with Colang for controlling LLM behavior (289 lines)
Prompt Guard — Meta's 86M parameter prompt injection and jailbreak detector with 99%+ TPR and under 2ms GPU latency (313 lines)
Research-oriented — Each skill includes reference materials and structured instructions for hands-on experimentation
Part of 83 skills — Spans 15 categories from model architecture to MLOps to mechanistic interpretability

Who is this for?

AI safety researchers and ML engineers who want their coding agent to help with safety alignment experiments — from training guardrails to evaluating prompt injection defenses. Ideal for teams working on responsible AI deployment who need structured, reproducible safety engineering workflows.

🏪

Part of

orchestra-research-ai-research-skills

Installation

Add marketplace in Claude Code:

/plugin marketplace add orchestra-research/AI-research-SKILLs

Step 2. Install plugin:

/plugin install safety-alignment@ai-research-skills

10,436

Last UpdatedJun 16, 2026

View on GitHub Back to Plugins

More from this repository10

🔌

distributed-training🔌Plugin

Plugin

Plugin

A tokenization skill from the AI Research Engineering Skills Library, which offers 83 skills across 20 categories covering model architecture, fine-tuning, inference, and other AI research areas.

Plugin

Plugin

inference-serving🔌Plugin

Plugin

Plugin

Plugin

data-processing🔌Plugin

Plugin

🔌

ml-paper-writing🔌Plugin

AI research skill for writing publication-ready ML papers for top conferences (NeurIPS, ICML, ICLR, ACL, AAAI, COLM) with LaTeX templates and citation verification.