grpo-rl-training
🎯 Skill from orchestra-research/ai-research-skills
Trains reinforcement learning models using Group Relative Policy Optimization (GRPO) and related policy-optimization techniques in a modular, reproducible research environment (a minimal sketch follows the installation commands below).
Part of orchestra-research/ai-research-skills (84 items)
Installation
```
npx @orchestra-research/ai-research-skills
npx @orchestra-research/ai-research-skills list     # View installed skills
npx @orchestra-research/ai-research-skills update   # Update installed skills
/plugin marketplace add orchestra-research/ai-research-skills
/plugin install fine-tuning@ai-research-skills      # Axolotl, LLaMA-Factory, PEFT, Unsloth
```
(+ 4 more commands)
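For orientation, here is a minimal sketch of what a GRPO training run looks like with TRL's GRPOTrainer, assuming TRL >= 0.14 (which ships GRPOTrainer) and the Hugging Face datasets library. The base model, dataset, and length-based reward function are illustrative placeholders, not prescribed by this skill.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPO samples a group of completions per prompt and scores them relative to
# each other; the reward function returns one scalar per completion.
def reward_len(completions, **kwargs):
    # Hypothetical shaping reward: prefer completions near 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

# Any dataset with a "prompt" column works; this one is used in TRL's docs.
train_dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(
    output_dir="grpo-demo",
    num_generations=8,               # group size: completions sampled per prompt
    per_device_train_batch_size=8,   # global batch must be divisible by num_generations
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # illustrative small instruct model
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

Because rewards are normalized within each prompt's group of completions, GRPO needs no separate value model, which is what makes it comparatively lightweight to run.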
More from this repository (10)
Streamlines AI research workflows by providing curated Claude skills for data analysis, literature review, experiment design, and research paper generation.
Assists AI researchers in drafting, structuring, and generating machine learning research papers with academic writing best practices and technical precision.
Streamlines distributed machine learning training using Ray, optimizing hyperparameter tuning and parallel model execution across compute clusters.
Streamlines distributed data processing and machine learning workflows using Ray's scalable data loading and transformation capabilities.
Enables remote neural network interpretation and analysis through advanced visualization, layer probing, and activation tracking techniques.
Automates complex AI prompt engineering and optimization using DSPy's programmatic framework for building reliable language model pipelines.
Streamlines parameter-efficient fine-tuning of large language models using Transformer Reinforcement Learning (TRL) techniques and best practices.
Streamlines machine learning experiment tracking, visualization, and hyperparameter optimization using Weights & Biases platform integration.
Efficiently fine-tune large language models using Parameter-Efficient Fine-Tuning (PEFT) techniques with minimal computational resources and memory overhead.
Enables distributed training of large AI models using PyTorch's Fully Sharded Data Parallel (FSDP) with advanced memory optimization and scaling techniques.