quantizing-models-bitsandbytes
Skill from orchestra-research/ai-research-skills
Quantize large language models with bitsandbytes' 8-bit and 4-bit compression techniques to reduce memory footprint and, in some setups, speed up inference.
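The core idea behind 8-bit and 4-bit weight compression can be illustrated with symmetric absmax quantization. The sketch below is a simplified, pure-Python illustration and not bitsandbytes' own implementation or API. bitsandbytes runs CUDA kernels for vector-wise int8 quantization with outlier handling (LLM.int8()) and for block-wise 4-bit formats such as NF4 and FP4. The helper names `absmax_quantize`, `dequantize`, and `weight_memory_gb` are hypothetical, and the memory estimate counts weights only.

```python
"""Simplified absmax (symmetric) quantization sketch.

Hypothetical helpers for illustration only; this is NOT bitsandbytes' API.
bitsandbytes uses vector-wise int8 with outlier decomposition (LLM.int8())
and block-wise NF4/FP4 for 4-bit, implemented in CUDA kernels.
"""
from typing import List, Tuple


def absmax_quantize(row: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1] using one scale per row."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(x) for x in row)
    scale = absmax / qmax if absmax > 0 else 1.0  # avoid division by zero for all-zero rows
    # Round to the nearest integer level, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats; per-element error is at most scale / 2."""
    return [v * scale for v in q]


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in GB; ignores activations, KV cache, and stored scales."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    row = [0.12, -0.5, 0.33, 1.0, -0.07]
    for bits in (8, 4):
        q, scale = absmax_quantize(row, bits)
        approx = dequantize(q, scale)
        err = max(abs(a - b) for a, b in zip(row, approx))
        print(f"{bits}-bit: q={q} scale={scale:.4f} max_abs_err={err:.4f}")
    for bits in (16, 8, 4):
        print(f"7B params @ {bits}-bit ~ {weight_memory_gb(7e9, bits):.1f} GB")
```

The trade-off is visible in the output. Fewer bits mean a coarser grid (a larger `scale`), so reconstruction error grows while weight memory shrinks proportionally. In practice, bitsandbytes quantization is usually enabled through Hugging Face Transformers' `BitsAndBytesConfig` (for example `load_in_8bit=True` or `load_in_4bit=True`) rather than by hand.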
Part of orchestra-research/ai-research-skills (84 items)
Installation
npx @orchestra-research/ai-research-skills
npx @orchestra-research/ai-research-skills list # View installed skills
npx @orchestra-research/ai-research-skills update # Update installed skills
/plugin marketplace add orchestra-research/AI-research-SKILLs
/plugin install fine-tuning@ai-research-skills # Axolotl, LLaMA-Factory, PEFT, Unsloth
More from this repository (10)
Streamlines AI research workflows by providing curated Claude skills for data analysis, literature review, experiment design, and research paper generation.
Assists AI researchers in drafting, structuring, and generating machine learning research papers with academic writing best practices and technical precision.
Streamlines distributed data processing and machine learning workflows using Ray's scalable data loading and transformation capabilities.
Streamlines distributed machine learning training using Ray, optimizing hyperparameter tuning and parallel model execution across compute clusters.
Streamlines large-scale distributed training of transformer models using PyTorch's TorchTitan, optimizing GPU utilization and model performance.
Quantizes large language models using Activation-aware Weight Quantization (AWQ) to reduce model size and improve inference efficiency.
Quantize and compress large language models using GPTQ for efficient inference and reduced memory footprint on various hardware.
Streamline machine learning experiment tracking, model versioning, and deployment management with comprehensive MLflow integration and best practices.
Performs hardware-aware quantization of neural networks using HQQ (Half-Quadratic Quantization) to reduce model size and improve inference efficiency.
Streamlines fine-tuning and deployment of Llama language models with automated configuration, dataset processing, and model optimization workflows.