ai-engineer
Skill from erichowens/some_claude_skills
Builds production-ready LLM applications with advanced RAG, vector search, and intelligent agent architectures for enterprise AI solutions.
Installation
npx skills add https://github.com/erichowens/some_claude_skills --skill ai-engineer
Skill Details
Build production-ready LLM applications, advanced RAG systems, and intelligent agents. Implements vector search, multimodal AI, agent orchestration, and enterprise AI integrations. Use PROACTIVELY for LLM features, chatbots, AI agents, or AI-powered applications.
Overview
# AI Engineer
Expert in building production-ready LLM applications, from simple chatbots to complex multi-agent systems. Specializes in RAG architectures, vector databases, prompt management, and enterprise AI deployments.
Quick Start
```
User: "Build a customer support chatbot with our product documentation"
AI Engineer:
- Design RAG architecture (chunking, embedding, retrieval)
- Set up vector database (Pinecone/Weaviate/Chroma)
- Implement retrieval pipeline with reranking
- Build conversation management with context
- Add guardrails and fallback handling
- Deploy with monitoring and observability
```
Result: Production-ready AI chatbot in days, not weeks
Core Competencies
1. RAG System Design
| Component | Implementation | Best Practices |
|-----------|---------------|----------------|
| Chunking | Semantic, token-based, hierarchical | 512-1024 tokens, overlap 10-20% |
| Embedding | OpenAI, Cohere, local models | Match model to domain |
| Vector DB | Pinecone, Weaviate, Chroma, Qdrant | Index by use case |
| Retrieval | Dense, sparse, hybrid | Start hybrid, tune |
| Reranking | Cross-encoder, Cohere Rerank | Always rerank top-k |
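The Retrieval row above recommends starting with hybrid search. A minimal sketch of one common way to merge dense and sparse result lists is reciprocal rank fusion (RRF); the `Hit` shape and the input lists are illustrative placeholders for whatever your vector database and keyword index return:
```typescript
// Sketch: reciprocal rank fusion (RRF) for hybrid retrieval.
// denseHits/sparseHits are hypothetical result lists from a vector index
// and a keyword (BM25) index; only ids and ranks matter for fusion.
interface Hit {
  id: string;
  score: number;
}

function reciprocalRankFusion(denseHits: Hit[], sparseHits: Hit[], k = 60): Hit[] {
  const fused = new Map<string, number>();

  const accumulate = (hits: Hit[]) => {
    hits.forEach((hit, rank) => {
      // 1 / (k + rank) damps the contribution of lower-ranked hits
      fused.set(hit.id, (fused.get(hit.id) ?? 0) + 1 / (k + rank + 1));
    });
  };

  accumulate(denseHits);
  accumulate(sparseHits);

  return [...fused.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```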
2. LLM Application Patterns
- Chat with memory and context management
- Agentic workflows with tool use
- Multi-model orchestration (router + specialists)
- Structured output generation (JSON, XML)
- Streaming responses with error handling
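For the structured output pattern above, a common approach is to validate the model's JSON against a schema and retry on failure. This sketch assumes a generic `llm.chat` client (not any specific SDK) and uses zod for validation; `TicketSchema` and `classifyTicket` are hypothetical names:
```typescript
// Sketch: structured output with schema validation and retry.
// `ChatClient` is a hypothetical interface; zod does the validation.
import { z } from 'zod';

const TicketSchema = z.object({
  category: z.enum(['billing', 'bug', 'feature_request']),
  priority: z.number().int().min(1).max(5),
  summary: z.string(),
});
type Ticket = z.infer<typeof TicketSchema>;

interface ChatClient {
  chat(req: {
    system: string;
    messages: { role: string; content: string }[];
  }): Promise<{ content: string }>;
}

function tryParseJson(text: string): unknown {
  try { return JSON.parse(text); } catch { return null; }
}

async function classifyTicket(llm: ChatClient, userMessage: string, maxAttempts = 2): Promise<Ticket> {
  const system = 'Return ONLY a JSON object with keys: category, priority (1-5), summary.';
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await llm.chat({
      system,
      messages: [{ role: 'user', content: userMessage }],
    });
    const parsed = TicketSchema.safeParse(tryParseJson(response.content));
    if (parsed.success) return parsed.data;
    // On failure, retry; a fuller version would feed the validation error back.
  }
  throw new Error('Model did not return valid structured output');
}
```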
3. Production Operations
- Token usage tracking and cost optimization
- Latency monitoring and caching strategies
- A/B testing for prompt versions
- Fallback chains and graceful degradation
- Security (prompt injection, PII handling)
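As a sketch of the fallback-chain idea above: try models in order of preference and degrade to a canned response instead of failing. `ModelCall` and `withFallbacks` are hypothetical wrappers, not part of any specific SDK:
```typescript
// Sketch: fallback chain with graceful degradation.
// Each ModelCall is a hypothetical wrapper around a specific model/SDK.
type ModelCall = (prompt: string) => Promise<string>;

async function withFallbacks(
  prompt: string,
  chain: ModelCall[],
  staticFallback = 'Sorry, I cannot answer that right now. Please try again later.'
): Promise<string> {
  for (const call of chain) {
    try {
      return await call(prompt);
    } catch (err) {
      // Timeout, rate limit, 5xx, etc.: log and move to the next model
      console.warn('Model call failed, trying next fallback', err);
    }
  }
  return staticFallback; // degrade to a canned answer rather than crash
}
```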
Architecture Patterns
Basic RAG Pipeline
```typescript
// Simple RAG implementation
async function ragQuery(query: string): Promise<string> {
  // 1. Embed the query
  const queryEmbedding = await embed(query);

  // 2. Retrieve relevant chunks
  const chunks = await vectorDb.query({
    vector: queryEmbedding,
    topK: 10,
    includeMetadata: true
  });

  // 3. Rerank for relevance
  const reranked = await reranker.rank(query, chunks);
  const topChunks = reranked.slice(0, 5);

  // 4. Generate response with context
  const response = await llm.chat({
    system: SYSTEM_PROMPT,
    messages: [
      { role: 'user', content: buildPrompt(query, topChunks) }
    ]
  });

  return response.content;
}
```
Agent Architecture
```typescript
// Agentic loop with tool use
interface Agent {
  systemPrompt: string;
  tools: Tool[];
  maxIterations: number;
}

async function runAgent(agent: Agent, task: string): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: task }];
  let iterations = 0;

  while (iterations < agent.maxIterations) {
    const response = await llm.chat({
      system: agent.systemPrompt,
      messages,
      tools: agent.tools
    });

    if (!response.toolCalls) {
      return response.content; // Final answer
    }

    // Execute tools, record the exchange, and continue the loop
    const toolResults = await executeTools(response.toolCalls);
    messages.push({ role: 'assistant', content: response.content, toolCalls: response.toolCalls });
    messages.push({ role: 'tool', content: toolResults });
    iterations++;
  }

  throw new Error('Max iterations exceeded');
}
```
Multi-Model Router
```typescript
// Route queries to appropriate models
const MODEL_ROUTER = {
  simple: 'claude-3-haiku',    // Fast, cheap
  moderate: 'claude-3-sonnet', // Balanced
  complex: 'claude-3-opus',    // Best quality
};

function routeQuery(query: string, context: any): ModelId {
  // Classify complexity
  if (isSimpleQuery(query)) return MODEL_ROUTER.simple;
  if (requiresReasoning(query, context)) return MODEL_ROUTER.complex;
  return MODEL_ROUTER.moderate;
}
```
Implementation Checklist
RAG System
- [ ] Document ingestion pipeline
- [ ] Chunking strategy (semantic preferred)
- [ ] Embedding model selection
- [ ] Vector database setup
- [ ] Retrieval with hybrid search
- [ ] Reranking layer
- [ ] Citation/source tracking
- [ ] Evaluation metrics (relevance, faithfulness)
Production Readiness
- [ ] Error handling and retries
- [ ] Rate limiting
- [ ] Token tracking
- [ ] Cost monitoring
- [ ] Latency metrics
- [ ] Caching layer
- [ ] Fallback responses
- [ ] PII filtering
- [ ] Prompt injection guards
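One way to cover the token-tracking and cost-monitoring items above is a small per-request ledger. The prices below are illustrative placeholders, not real vendor rates, and a production system would persist these records rather than hold them in memory:
```typescript
// Sketch: per-request token and cost tracking.
// PRICE_PER_MTOK values are illustrative placeholders, not real vendor pricing.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface CostRecord extends Usage {
  model: string;
  costUsd: number;
  timestamp: number;
}

const PRICE_PER_MTOK: Record<string, { input: number; output: number }> = {
  'fast-model': { input: 0.25, output: 1.25 },
  'smart-model': { input: 3.0, output: 15.0 },
};

const ledger: CostRecord[] = [];

function recordUsage(model: string, usage: Usage): CostRecord {
  const price = PRICE_PER_MTOK[model] ?? { input: 0, output: 0 };
  const costUsd =
    (usage.inputTokens / 1_000_000) * price.input +
    (usage.outputTokens / 1_000_000) * price.output;
  const record: CostRecord = { model, ...usage, costUsd, timestamp: Date.now() };
  ledger.push(record);
  return record;
}
```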
Observability
- [ ] Request logging
- [ ] Response quality scoring
- [ ] User feedback collection
- [ ] A/B test framework
- [ ] Drift detection
- [ ] Alert thresholds
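A minimal sketch of the request-logging and feedback items above; field names are illustrative, and a real system would ship these entries to a tracing or analytics backend instead of keeping them in memory:
```typescript
// Sketch: minimal request log with a feedback hook.
// Field names are illustrative; production systems would export these
// to a tracing/analytics backend.
interface RequestLog {
  requestId: string;
  model: string;
  promptVersion: string;   // enables A/B comparison of prompt versions
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  qualityScore?: number;   // from an LLM-as-judge or heuristic scorer
  userFeedback?: 'up' | 'down';
}

const requestLogs: RequestLog[] = [];

function logRequest(entry: RequestLog): void {
  requestLogs.push(entry);
}

function recordFeedback(requestId: string, feedback: 'up' | 'down'): void {
  const entry = requestLogs.find((e) => e.requestId === requestId);
  if (entry) entry.userFeedback = feedback;
}
```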
Anti-Patterns
Anti-Pattern: RAG Everything
What it looks like: Using RAG for every query
Why wrong: Adds latency, cost, and complexity when unnecessary
Instead: Classify queries, use RAG only when context needed
Anti-Pattern: Chunking by Character
What it looks like: `text.slice(0, 1000)` for chunks
Why wrong: Breaks semantic meaning, poor retrieval
Instead: Semantic chunking respecting document structure
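A rough sketch of structure-aware chunking: split on paragraph boundaries and pack paragraphs into chunks under a size budget with overlap. Word counts stand in for tokens here; a real pipeline would use the embedding model's tokenizer:
```typescript
// Sketch: structure-aware chunking by paragraph with a rough word budget.
// A real implementation would count tokens, not words.
function chunkByParagraph(
  text: string,
  maxWords = 400,
  overlapParagraphs = 1
): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks: string[] = [];
  let current: string[] = [];
  let wordCount = 0;

  for (const para of paragraphs) {
    const words = para.split(/\s+/).length;
    if (wordCount + words > maxWords && current.length > 0) {
      chunks.push(current.join('\n\n'));
      // Carry the last paragraph(s) forward as overlap for continuity
      current = current.slice(-overlapParagraphs);
      wordCount = current.join(' ').split(/\s+/).length;
    }
    current.push(para);
    wordCount += words;
  }
  if (current.length > 0) chunks.push(current.join('\n\n'));
  return chunks;
}
```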
Anti-Pattern: No Reranking
What it looks like: Using raw vector similarity as final ranking
Why wrong: Embedding similarity is not the same as relevance to the query
Instead: Always add cross-encoder reranking
Anti-Pattern: Unbounded Context
What it looks like: Stuffing all retrieved chunks into prompt
Why wrong: Dilutes relevance, wastes tokens, confuses model
Instead: Top 3-5 chunks after reranking, dynamic selection
Anti-Pattern: No Guardrails
What it looks like: Direct user input to LLM
Why wrong: Prompt injection, toxic outputs, off-topic responses
Instead: Input validation, output filtering, topic guardrails
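A minimal input-side guardrail sketch; pattern matching like this is only a first line of defense, and production systems typically add a moderation model and output-side filtering on top:
```typescript
// Sketch: lightweight input guardrail (length cap + injection patterns).
// Patterns are illustrative examples, not an exhaustive defense.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /reveal (your|the) system prompt/i,
];

interface GuardrailResult {
  allowed: boolean;
  reason?: string;
}

function checkInput(userInput: string, maxLength = 4000): GuardrailResult {
  if (userInput.length > maxLength) {
    return { allowed: false, reason: 'input_too_long' };
  }
  for (const pattern of INJECTION_PATTERNS) {
    if (pattern.test(userInput)) {
      return { allowed: false, reason: 'possible_prompt_injection' };
    }
  }
  return { allowed: true };
}
```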
Technology Stack
Vector Databases
| Database | Best For | Notes |
|----------|----------|-------|
| Pinecone | Production, scale | Managed, fast |
| Weaviate | Hybrid search | GraphQL, modules |
| Chroma | Development, local | Embedded, simple |
| Qdrant | Self-hosted, filters | Rust, performant |
| pgvector | Existing Postgres | Easy integration |
LLM Frameworks
| Framework | Best For | Notes |
|-----------|----------|-------|
| LangChain | Prototyping | Many integrations |
| LlamaIndex | RAG focus | Document handling |
| Vercel AI SDK | Streaming, React | Edge-ready |
| Anthropic SDK | Direct API | Full control |
Embedding Models
| Model | Dimensions | Notes |
|-------|------------|-------|
| text-embedding-3-large | 3072 | Best quality |
| text-embedding-3-small | 1536 | Cost-effective |
| voyage-2 | 1024 | Code, technical |
| bge-large | 1024 | Open source |
When to Use
Use for:
- Building chatbots and conversational AI
- Implementing RAG systems
- Creating AI agents with tools
- Designing multi-model architectures
- Production AI deployments
Do NOT use for:
- Prompt optimization (use prompt-engineer)
- ML model training (use ml-engineer)
- Data pipelines (use data-pipeline-engineer)
- General backend (use backend-architect)
---
Core insight: Production AI systems need more than good prompts; they need robust retrieval, intelligent routing, comprehensive monitoring, and graceful failure handling.
Use with: prompt-engineer (optimization) | chatbot-analytics (monitoring) | backend-architect (infrastructure)