gptq
Skill from ovachiever/droid-tings
Enables 4-bit quantization of large language models, reducing memory use by roughly 4× and boosting inference speed on consumer GPUs with minimal accuracy loss.
Part of ovachiever/droid-tings (370 items)
Installation
pip install auto-gptq
pip install auto-gptq[triton]
pip install auto-gptq --no-build-isolation
pip install auto-gptq transformers accelerate
Skill Details
Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use it to deploy large models (70B, 405B) on consumer GPUs, when you need a ~4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup over FP16). Integrates with transformers and PEFT for QLoRA fine-tuning.
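The ~4× figure follows from storing weights in 4 bits instead of 16 (0.5 bytes per weight vs 2). Below is a minimal sketch of group-wise 4-bit weight quantization using simple absmax round-to-nearest, which illustrates the storage scheme GPTQ targets; real GPTQ additionally uses Hessian-based error compensation when rounding. The function names here are illustrative, not part of the auto-gptq API.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Quantize a weight matrix to int4 range [-8, 7] with one FP scale per group."""
    groups = w.reshape(-1, group_size)
    # absmax scaling: map the largest magnitude in each group to 7
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Reconstruct an approximate FP matrix from int4 codes and group scales."""
    return (q * scale).reshape(shape).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, w.shape)
err = np.abs(w - w_hat).mean()  # mean absolute reconstruction error

# Storage: 4 bits/weight packed (0.5 B) + one scale per 128-weight group,
# vs 2 B/weight for FP16 -> close to the advertised 4x reduction.
```

In practice you would not call code like this directly; auto-gptq handles quantization, packing, and fused int4 kernels, while this sketch only shows why the memory savings and the (small) accuracy loss arise.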
More from this repository (10)
nextjs-shadcn-builder skill from ovachiever/droid-tings — Configures Tailwind v4 with shadcn/ui, automating CSS variable setup, dark mode, and preventing common initialization errors.
security-auditor skill from ovachiever/droid-tings
threejs-graphics-optimizer skill from ovachiever/droid-tings
api-documenter skill from ovachiever/droid-tings
secret-scanner skill from ovachiever/droid-tings
readme-updater skill from ovachiever/droid-tings
applying-brand-guidelines skill from ovachiever/droid-tings
deep-reading-analyst skill from ovachiever/droid-tings
dependency-auditor skill from ovachiever/droid-tings