llava
Skill from ovachiever/droid-tings
Enables conversational image understanding by combining vision and language models for multi-turn visual interactions and question answering.
Part of ovachiever/droid-tings (370 items)
Installation

git clone https://github.com/haotian-liu/LLaVA
cd LLaVA
pip install -e .

python -m llava.serve.cli \
python -m llava.serve.gradio_web_server \
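Complete invocations of the two serve entry points, modeled on the upstream LLaVA README; the model path, ports, and exact flag set are assumptions that vary between releases:

# Interactive multi-turn image chat in the terminal; add --load-4bit to fit smaller GPUs
python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg"

# Gradio web UI: launch a controller and a model worker first, then the server
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 \
    --port 40000 --worker http://localhost:40000 \
    --model-path liuhaotian/llava-v1.5-7b
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload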
Skill Details

LLaVA (Large Language and Vision Assistant) enables visual instruction tuning and image-based conversation. It combines a CLIP vision encoder with a Vicuna/LLaMA language model and supports multi-turn image chat, visual question answering, and instruction following. Use it for vision-language chatbots and image understanding tasks; it is best suited to conversational image analysis.
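For scripted, single-question inference with the same checkpoints, the repository also provides a one-shot evaluation entry point. A minimal sketch, assuming the llava.eval.run_llava module and its --model-path, --image-file, and --query flags as shipped in recent releases (flag names may differ between versions):

# Ask one question about one image and print the answer (no interactive chat loop)
python -m llava.eval.run_llava \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg" \
    --query "What is unusual about this image?"

For multi-turn conversation, use the llava.serve.cli entry point from the Installation section, which preserves chat history across turns.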
More from this repository (10)
nextjs-shadcn-builder skill from ovachiever/droid-tings
security-auditor skill from ovachiever/droid-tings
threejs-graphics-optimizer skill from ovachiever/droid-tings
api-documenter skill from ovachiever/droid-tings
secret-scanner skill from ovachiever/droid-tings
readme-updater skill from ovachiever/droid-tings
applying-brand-guidelines skill from ovachiever/droid-tings
Configures Tailwind v4 with shadcn/ui, automating CSS variable setup, dark mode, and preventing common initialization errors.
deep-reading-analyst skill from ovachiever/droid-tings
dependency-auditor skill from ovachiever/droid-tings