quantizing-models-bitsandbytes (Skill)

from orchestra-research/ai-research-skills

What it does

Quantize large language models to 8-bit or 4-bit precision with bitsandbytes to reduce memory footprint and, in memory-bound settings, speed up inference.
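As a rough illustration of where the memory savings come from, the sketch below estimates weight storage at 16, 8, and 4 bits per parameter, then shows one common way to request 4-bit loading through the Hugging Face `transformers` integration with bitsandbytes. The helper names (`estimate_weight_gb`, `load_in_4bit`), the 7B parameter count, and the choice of NF4 with bfloat16 compute are assumptions for illustration and are not taken from the skill itself. The estimate covers weights only and ignores quantization constants, activations, and the KV cache.

```python
def estimate_weight_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_param / 8 / 1e9


def load_in_4bit(model_id: str):
    """Hypothetical helper: load a causal LM with 4-bit bitsandbytes weights.

    Imports are deferred so the estimate above runs without a GPU or the
    torch/transformers/bitsandbytes packages installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Assumed settings: NF4 quantization, bfloat16 compute, double quantization.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )


if __name__ == "__main__":
    params = 7e9  # assumed model size for the example
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {estimate_weight_gb(params, bits):.1f} GB")
```

For 8-bit loading, `BitsAndBytesConfig(load_in_8bit=True)` is the analogous setting. Actual savings will be somewhat smaller than the estimate because some layers are typically kept in higher precision.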

Part of orchestra-research/ai-research-skills (84 items)

Installation

Run with npx:

npx @orchestra-research/ai-research-skills

npx @orchestra-research/ai-research-skills list # View installed skills

npx @orchestra-research/ai-research-skills update # Update installed skills

Add marketplace to Claude Code:

/plugin marketplace add orchestra-research/AI-research-SKILLs

Install plugin from marketplace:

/plugin install fine-tuning@ai-research-skills # Axolotl, LLaMA-Factory, PEFT, Unsloth

📖 Extracted from docs: orchestra-research/ai-research-skills

Installs: 1

Added: Feb 7, 2026

More from this repository (10)

orchestra-research-ai-research-skills (Marketplace)

Streamlines AI research workflows by providing curated Claude skills for data analysis, literature review, experiment design, and research paper generation.

ml-paper-writing (Skill)

Assists AI researchers in drafting, structuring, and generating machine learning research papers with academic writing best practices and technical precision.

ray-data (Skill)

Streamlines distributed data processing and machine learning workflows using Ray's scalable data loading and transformation capabilities.

ray-train (Skill)

Streamlines distributed machine learning training using Ray, optimizing hyperparameter tuning and parallel model execution across compute clusters.

distributed-llm-pretraining-torchtitan (Skill)

Streamlines large-scale distributed training of transformer models using TorchTitan, optimizing GPU utilization and model performance.

awq-quantization (Skill)

Quantizes large language models using Activation-aware Weight Quantization (AWQ) to reduce model size and improve inference efficiency.

gptq (Skill)

Quantize and compress large language models using GPTQ for efficient inference and reduced memory footprint on various hardware.

mlflow (Skill)

Streamline machine learning experiment tracking, model versioning, and deployment management with comprehensive MLflow integration and best practices.

hqq-quantization (Skill)

Performs hardware-aware quantization of neural networks using HQQ (Half-Quadratic Quantization) to reduce model size and improve inference efficiency.

llama-factory (Skill)

Streamlines fine-tuning and deployment of Llama and other open language models with automated configuration, dataset processing, and model optimization workflows.