rag-skills

from llama-farm/llamafarm

What it does

Implements robust RAG document processing and retrieval using LlamaIndex, ChromaDB, and Celery for efficient, scalable AI document workflows.

Installation

Install skill:
npx skills add https://github.com/llama-farm/llamafarm --skill rag-skills
Last Updated: Jan 26, 2026

Skill Details

SKILL.md

RAG-specific best practices for LlamaIndex, ChromaDB, and Celery workers. Covers ingestion, retrieval, embeddings, and performance.

Overview

# RAG Skills for LlamaFarm

Framework-specific patterns and code review checklists for the RAG component.

Extends: [python-skills](../python-skills/SKILL.md) - All Python best practices apply here.

Component Overview

| Aspect | Technology | Version |
|--------|------------|---------|
| Python | Python | 3.11+ |
| Document Processing | LlamaIndex | 0.13+ |
| Vector Storage | ChromaDB | 1.0+ |
| Task Queue | Celery | 5.5+ |
| Embeddings | Universal/Ollama/OpenAI | Multiple |

Directory Structure

```
rag/
├── api.py                  # Search and database APIs
├── celery_app.py           # Celery configuration
├── main.py                 # Entry point
├── core/
│   ├── base.py             # Document, Component, Pipeline ABCs
│   ├── factories.py        # Component factories
│   ├── ingest_handler.py   # File ingestion with safety checks
│   ├── blob_processor.py   # Binary file processing
│   ├── settings.py         # Pydantic settings
│   └── logging.py          # RAGStructLogger
├── components/
│   ├── embedders/          # Embedding providers
│   ├── extractors/         # Metadata extractors
│   ├── parsers/            # Document parsers (LlamaIndex)
│   ├── retrievers/         # Retrieval strategies
│   └── stores/             # Vector stores (ChromaDB, FAISS)
├── tasks/                  # Celery tasks
│   ├── ingest_tasks.py     # File ingestion
│   ├── search_tasks.py     # Database search
│   ├── query_tasks.py      # Complex queries
│   ├── health_tasks.py     # Health checks
│   └── stats_tasks.py      # Statistics
└── utils/
    └── embedding_safety.py # Circuit breaker, validation
```

Quick Reference

| Topic | File | Key Points |
|-------|------|------------|
| LlamaIndex | [llamaindex.md](llamaindex.md) | Document parsing, chunking, node conversion |
| ChromaDB | [chromadb.md](chromadb.md) | Collections, embeddings, distance metrics |
| Celery | [celery.md](celery.md) | Task routing, error handling, worker config |
| Performance | [performance.md](performance.md) | Batching, caching, deduplication |

Core Patterns

Document Dataclass

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    content: str
    # default_factory gives every instance its own dict and its own UUID
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    source: str | None = None
    embeddings: list[float] | None = None
```

Component Abstract Base Class

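```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any

# RAGStructLogger is defined in core/logging.py. Document and ProcessingResult
# are project types and must be imported from their own modules.


class Component(ABC):
    def __init__(
        self,
        name: str | None = None,
        config: dict[str, Any] | None = None,
        project_dir: Path | None = None,
    ):
        # Fall back to the class name and an empty config when none is given
        self.name = name or self.__class__.__name__
        self.config = config or {}
        self.logger = RAGStructLogger(__name__).bind(name=self.name)
        self.project_dir = project_dir

    @abstractmethod
    def process(self, documents: list[Document]) -> ProcessingResult:
        pass
```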

Retrieval Strategy Pattern

```python
class RetrievalStrategy(Component, ABC):
    @abstractmethod
    def retrieve(
        self,
        query_embedding: list[float],
        vector_store,
        top_k: int = 5,
        **kwargs
    ) -> RetrievalResult:
        pass

    @abstractmethod
    def supports_vector_store(self, vector_store_type: str) -> bool:
        pass
```
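
A concrete strategy only has to fill in the two abstract methods. The sketch below is one possible shape, not the repository's implementation: `RetrievalResult`, `InMemoryStore`, `CosineTopK`, and the `"in_memory"` store type are all hypothetical stand-ins, and the base class is simplified to drop the `Component` parent so the example runs on its own.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RetrievalResult:
    # Hypothetical shape; the real RetrievalResult is not shown in this document.
    ids: list[str] = field(default_factory=list)
    scores: list[float] = field(default_factory=list)


class InMemoryStore:
    """Hypothetical stand-in for a vector store: maps document ids to vectors."""

    def __init__(self, vectors: dict[str, list[float]]):
        self.vectors = vectors


class RetrievalStrategy(ABC):
    # Simplified for the sketch: the real class also inherits from Component.
    @abstractmethod
    def retrieve(self, query_embedding, vector_store, top_k=5, **kwargs) -> RetrievalResult:
        ...

    @abstractmethod
    def supports_vector_store(self, vector_store_type: str) -> bool:
        ...


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CosineTopK(RetrievalStrategy):
    def retrieve(self, query_embedding, vector_store, top_k=5, **kwargs):
        # Score every stored vector, then keep the top_k highest scores.
        scored = [
            (doc_id, cosine(query_embedding, vec))
            for doc_id, vec in vector_store.vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        top = scored[:top_k]
        return RetrievalResult(ids=[i for i, _ in top], scores=[s for _, s in top])

    def supports_vector_store(self, vector_store_type: str) -> bool:
        return vector_store_type == "in_memory"


store = InMemoryStore({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]})
result = CosineTopK().retrieve([1.0, 0.0], store, top_k=2)
print(result.ids)  # ['a', 'c']
```

Keeping `supports_vector_store` separate from `retrieve` lets a caller reject an incompatible strategy and store pairing before any query runs.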

Embedder with Circuit Breaker

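```python
class Embedder(Component):
    DEFAULT_FAILURE_THRESHOLD = 5
    DEFAULT_RESET_TIMEOUT = 60.0

    # Signature assumed to mirror Component.__init__; the original elides it.
    def __init__(
        self,
        name: str | None = None,
        config: dict[str, Any] | None = None,
        project_dir: Path | None = None,
    ):
        super().__init__(name=name, config=config, project_dir=project_dir)
        # Read from self.config: the base class has already replaced None with {}
        self._circuit_breaker = CircuitBreaker(
            failure_threshold=self.config.get(
                "failure_threshold", self.DEFAULT_FAILURE_THRESHOLD
            ),
            reset_timeout=self.config.get(
                "reset_timeout", self.DEFAULT_RESET_TIMEOUT
            ),
        )
        self._fail_fast = self.config.get("fail_fast", True)

    def embed_text(self, text: str) -> list[float]:
        # Refuse to call the provider while the breaker is open
        self.check_circuit_breaker()
        try:
            embedding = self._call_embedding_api(text)
            self.record_success()
            return embedding
        except Exception as e:
            self.record_failure(e)
            if self._fail_fast:
                raise EmbedderUnavailableError(str(e)) from e
            # fail_fast disabled: return a zero vector as a placeholder
            return [0.0] * self.get_embedding_dimension()
```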
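
The `CircuitBreaker` class itself is not shown here (per the directory listing it lives in `utils/embedding_safety.py`). As a rough sketch of what a breaker with these two parameters might do, the version below counts consecutive failures, opens once `failure_threshold` is reached, and allows a trial call after `reset_timeout` seconds. The `clock` parameter, the `is_open` property, and the method bodies are assumptions made for illustration.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Hypothetical minimal breaker; not the repository's implementation."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the timeout can be tested without sleeping
        self._failures = 0
        self._opened_at: float | None = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once reset_timeout has elapsed, let a trial call through (half-open).
        return self._clock() - self._opened_at < self.reset_timeout

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open) the breaker and restart the timeout.
            self._opened_at = self._clock()

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None


# Fake clock: a one-element list the lambda reads from.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.is_open)  # True

now[0] = 10.0
print(breaker.is_open)  # False
```

With a breaker like this, `check_circuit_breaker()` in the embedder would presumably raise while `is_open` is true, so a failing provider is not hit on every chunk.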

Review Checklist Summary

When reviewing RAG code:

1. LlamaIndex (Medium priority)
   - Proper chunking configuration
   - Metadata preservation during parsing
   - Error handling for unsupported formats

2. ChromaDB (High priority)
   - Thread-safe client access
   - Proper distance metric selection
   - Metadata type compatibility

3. Celery (High priority)
   - Task routing to correct queue
   - Error logging with context
   - Proper serialization

4. Performance (Medium priority)
   - Batch processing for embeddings
   - Deduplication enabled
   - Appropriate caching

See individual topic files for detailed checklists with grep patterns.
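
For the Performance items, deduplication and batching can be illustrated without any library. The helper names `dedupe_by_content` and `in_batches`, the SHA-256 content hash, and the batch size of 3 below are assumptions for the sketch; the repository's own approach is described in performance.md.

```python
import hashlib
from collections.abc import Iterator


def dedupe_by_content(texts: list[str]) -> list[str]:
    """Drop exact duplicates (ignoring surrounding whitespace), keeping first occurrences."""
    seen: set[str] = set()
    unique: list[str] = []
    for text in texts:
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def in_batches(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = ["alpha", "beta", "alpha ", "gamma", "beta", "delta"]
unique_chunks = dedupe_by_content(chunks)
batches = list(in_batches(unique_chunks, batch_size=3))
print(batches)  # [['alpha', 'beta', 'gamma'], ['delta']]
```

Deduplicating before batching means each embedding request carries only text that has not been seen, so six input chunks become two provider calls here instead of six.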
