4 results for tag "grpo-rl-training"
A large collection of Claude Code skill templates sponsored by Z.AI, providing ready-to-use development skill configurations across various domains.
A skill from the AI Research Engineering Skills library that teaches AI coding agents how to implement GRPO (Group Relative Policy Optimization) for reinforcement learning training of language models.
Guides fine-tuning of language models with Group Relative Policy Optimization (GRPO) in TRL, for structured reasoning and task-specific training.
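
For readers unfamiliar with the tag, the "group relative" part of GRPO can be illustrated with a minimal sketch. It is not taken from any of the listed skills. It assumes that several completions are sampled for one prompt, that each one is scored by a reward function, and that each reward is then normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation are illustrative assumptions; real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one prompt's completions against their own group.

    Hypothetical helper for illustration only; `eps` guards against division
    by zero when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not a fixed convention
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for the same prompt, scored 1.0 (correct) or 0.0 (wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where all completions score the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that score above their group's mean get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.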