llama-cpp
Skill from ovachiever/droid-tings
Enables efficient LLM inference on CPUs, Apple Silicon, and non-NVIDIA GPUs using lightweight, quantized models with minimal dependencies.
Part of ovachiever/droid-tings (370 items)
Installation
git clone https://github.com/ggerganov/llama.cpp
python convert_hf_to_gguf.py models/llama-2-7b-chat/
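A fuller end-to-end sketch of the typical workflow, assuming a Llama-2-7B-chat checkpoint already downloaded to models/llama-2-7b-chat/; the output filenames and the Q4_K_M quantization level are illustrative choices, not fixed by this skill:

# Clone and build llama.cpp (CPU build; Metal is enabled by default on Apple Silicon)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Install the Python dependencies the conversion script needs
pip install -r requirements.txt

# Convert the HF checkpoint to GGUF at 16-bit, then quantize down to 4-bit
python convert_hf_to_gguf.py models/llama-2-7b-chat/ --outfile llama-2-7b-chat-f16.gguf --outtype f16
./build/bin/llama-quantize llama-2-7b-chat-f16.gguf llama-2-7b-chat-Q4_K_M.gguf Q4_K_M

# Run a quick generation against the quantized model
./build/bin/llama-cli -m llama-2-7b-chat-Q4_K_M.gguf -p "Hello" -n 128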
Skill Details

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use it for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit), cutting memory use and delivering a 4-10× speedup over PyTorch on CPU.
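To make "inference without NVIDIA hardware" concrete, here is a minimal serving sketch using llama.cpp's built-in llama-server, which exposes an OpenAI-compatible HTTP API; the model filename, port, and context size below are placeholder values:

# Serve the quantized model over HTTP (runs on CPU or Metal; no CUDA required)
./build/bin/llama-server -m llama-2-7b-chat-Q4_K_M.gguf -c 4096 --port 8080

# Query the OpenAI-compatible chat endpoint from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain GGUF in one sentence."}]}'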
More from this repository (10)
nextjs-shadcn-builder skill from ovachiever/droid-tings: configures Tailwind v4 with shadcn/ui, automating CSS variable setup and dark mode and preventing common initialization errors.
security-auditor skill from ovachiever/droid-tings
threejs-graphics-optimizer skill from ovachiever/droid-tings
api-documenter skill from ovachiever/droid-tings
secret-scanner skill from ovachiever/droid-tings
readme-updater skill from ovachiever/droid-tings
applying-brand-guidelines skill from ovachiever/droid-tings
deep-reading-analyst skill from ovachiever/droid-tings
dependency-auditor skill from ovachiever/droid-tings