llm-streaming-response-handler
Skill from erichowens/some_claude_skills
Manages real-time streaming responses from language models, enabling smooth parsing, buffering, and event-driven handling of incremental AI outputs
Installation
npx skills add https://github.com/erichowens/some_claude_skills --skill llm-streaming-response-handler
Skill Details
Build production LLM streaming UIs with Server-Sent Events, real-time token display, cancellation, error recovery. Handles OpenAI/Anthropic/Claude streaming APIs. Use for chatbots, AI assistants, real-time text generation. Activate on "LLM streaming", "SSE", "token stream", "chat UI", "real-time AI". NOT for batch processing, non-streaming APIs, or WebSocket bidirectional chat.
Overview
# LLM Streaming Response Handler
Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.
When to Use
✅ Use for:
- Chat interfaces with typing animation
- Real-time AI assistants
- Code generation with live preview
- Document summarization with progressive display
- Any UI where users expect immediate feedback from LLMs
❌ NOT for:
- Batch document processing (no user watching)
- APIs that don't support streaming
- WebSocket-based bidirectional chat (use Socket.IO)
- Simple request/response (fetch is fine)
Quick Decision Tree
```
Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (>100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (<50 words)? → Regular fetch
└── Background processing? → Regular fetch
```
---
Technology Selection
Server-Sent Events (SSE) - Recommended
Why SSE over WebSockets for LLM streaming:
- Simplicity: HTTP-based, works with existing infrastructure
- Auto-reconnect: Built-in reconnection logic
- Firewall-friendly: Easier than WebSockets through proxies
- One-way by design: LLM responses only stream server → client
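The browser's built-in `EventSource` shows how little client code SSE needs, reconnection included. A minimal sketch assuming a hypothetical GET endpoint at `/api/stream`; note that `EventSource` only supports GET without a request body, which is why the fetch-based patterns later in this document handle POST-driven LLM calls:
```typescript
// Minimal sketch of the built-in EventSource API against a hypothetical
// GET endpoint that emits `data: {"content":"..."}` messages.
const source = new EventSource('/api/stream');

source.onmessage = (event: MessageEvent) => {
  const data = JSON.parse(event.data);
  console.log('token:', data.content);
};

// The browser retries dropped connections automatically; no manual
// reconnect loop is required.
source.onerror = () => {
  console.warn('connection interrupted; EventSource will reconnect');
};

// Call source.close() when finished to stop reconnection attempts.
```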
Timeline:
- 2015-2020: WebSockets for everything
- 2020: SSE adoption for streaming APIs
- 2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
- 2024: Vercel AI SDK popularizes SSE patterns
Streaming APIs
| Provider | Streaming Method | Response Format |
|----------|------------------|-----------------|
| OpenAI | SSE | `data: {"choices":[{"delta":{"content":"token"}}]}` |
| Anthropic | SSE | `data: {"type":"content_block_delta","delta":{"text":"token"}}` |
| Claude (API) | SSE | `data: {"delta":{"text":"token"}}` |
| Vercel AI SDK | SSE | Normalized across providers |
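Because the delta shapes differ per provider, a thin normalization layer keeps UI code provider-agnostic. A sketch based only on the formats in the table above (illustrative field access, not a complete schema for either API):
```typescript
// Extract the token text from a parsed SSE payload, based on the delta
// shapes listed in the table above (illustrative, not a full schema).
function extractToken(payload: any): string | null {
  // OpenAI: {"choices":[{"delta":{"content":"token"}}]}
  const openaiToken = payload?.choices?.[0]?.delta?.content;
  if (typeof openaiToken === 'string') return openaiToken;

  // Anthropic: {"type":"content_block_delta","delta":{"text":"token"}}
  if (payload?.type === 'content_block_delta' && typeof payload?.delta?.text === 'string') {
    return payload.delta.text;
  }

  return null;
}
```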
---
Common Anti-Patterns
Anti-Pattern 1: Buffering Before Display
Novice thinking: "Collect all tokens, then show complete response"
Problem: Defeats the entire purpose of streaming.
Wrong approach:
```typescript
// ❌ Waits for the entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done
```
Correct approach:
```typescript
// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt })
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any partial line for the next chunk
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}
```
Timeline:
- Pre-2023: Many apps buffered entire response
- 2023+: Token-by-token display expected
---
Anti-Pattern 2: No Stream Cancellation
Problem: User can't stop generation, wasting tokens and money.
Symptom: "Stop" button doesn't work or doesn't exist.
Correct approach:
```typescript
// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);
  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });
    // Stream handling...
  } catch (error) {
    if ((error as Error).name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream}>Stop Generating</button>
);
```
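Client-side abort closes the connection, but the cancellation is worth propagating server-side so the upstream LLM call stops too. A sketch assuming a web-standard route handler and an openai-node v4 client (which accepts a `signal` in its request options); whether `req.signal` fires on client disconnect depends on your runtime, so verify both against your stack:
```typescript
import { OpenAI } from 'openai';

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  // req.signal aborts when the client disconnects (on runtimes that wire
  // it up), so forwarding it stops the upstream call and saves tokens.
  const stream = await openai.chat.completions.create(
    { model: 'gpt-4', messages: [{ role: 'user', content: prompt }], stream: true },
    { signal: req.signal }
  );

  // ...pipe `stream` to SSE and return the Response as in Pattern 3 below
}
```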
---
Anti-Pattern 3: No Error Recovery
Problem: Stream fails mid-response, user sees partial text with no indication of failure.
Correct approach:
```typescript
// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');
  // Streaming logic...
  setStreamState('complete');
} catch (error) {
  setStreamState('error');
  const err = error as Error;
  if (err.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (err.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div role="alert">{errorMessage}</div>
)}
```
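For transient failures (network blips, 429s), these error states pair well with a retry wrapper. A hypothetical helper sketch; `runStream` stands in for whatever function performs the fetch-and-read loop shown above:
```typescript
// Hypothetical retry helper: re-run a streaming attempt with exponential
// backoff. Deliberate cancellations (AbortError) are never retried.
async function streamWithRetry(
  runStream: () => Promise<void>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await runStream();
      return;
    } catch (error) {
      const err = error as Error;
      if (err.name === 'AbortError' || attempt === maxAttempts) throw err;
      // Back off 1s, 2s, 4s... before the next attempt
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}
```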
---
Anti-Pattern 4: Memory Leaks from Unclosed Streams
Problem: Streams not cleaned up, causing memory leaks.
Symptom: Browser slows down after multiple requests.
Correct approach:
```typescript
// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { /* ... */ });
    reader = response.body!.getReader();
    // Streaming...
  };
  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);
```
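An alternative sketch drives the same cleanup with an `AbortController`: aborting the signal cancels the in-flight fetch and the body read together, so the effect never needs to hold a reader reference:
```typescript
// Alternative sketch (inside a React component, useEffect from 'react'):
// abort-based cleanup instead of holding the reader.
useEffect(() => {
  const controller = new AbortController();

  (async () => {
    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });
      const reader = response.body!.getReader();
      // ...read loop with `reader` as in the patterns below...
    } catch (error) {
      if ((error as Error).name !== 'AbortError') throw error;
    }
  })();

  // Unmount (or prompt change) aborts fetch and read in one place
  return () => controller.abort();
}, [prompt]);
```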
---
Anti-Pattern 5: No Typing Indicator Between Tokens
Problem: UI feels frozen between slow tokens.
Correct approach:
```typescript
// ✅ Animated cursor during generation
<span>{content}</span>
{isStreaming && <span className="typing-cursor">▋</span>}
```
```css
.typing-cursor {
animation: blink 1s step-end infinite;
}
@keyframes blink {
50% { opacity: 0; }
}
```
---
Implementation Patterns
Pattern 1: Basic SSE Stream Handler
```typescript
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // hold partial lines until the next chunk
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) {
          yield data.content;
        }
        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
```
Pattern 2: React Hook for Streaming
```typescript
import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');
    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });
      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';
      let buffer = '';
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? ''; // hold partial lines until the next chunk
        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }
      options.onComplete?.(accumulated);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <p>
        {content}
        {isStreaming && <span className="typing-cursor">▋</span>}
      </p>
      <button onClick={() => stream('Hello')}>Generate</button>
      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```
Pattern 3: Server-Side Streaming (Next.js)
```typescript
// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Edge runtime suits streaming (recent Next.js also streams from the Node runtime)

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;
          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }
        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
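The same route shape works for Anthropic. A sketch using `@anthropic-ai/sdk` streaming events (`content_block_delta` carrying `delta.text`, matching the format table earlier); the model name is illustrative, and event names should be verified against your SDK version:
```typescript
// app/api/chat/route.ts (Anthropic variant, sketch)
import Anthropic from '@anthropic-ai/sdk';

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

  const stream = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-latest', // illustrative model name
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const event of stream) {
          // Text tokens arrive as content_block_delta events
          if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
            controller.enqueue(
              encoder.encode(`data: ${JSON.stringify({ content: event.delta.text })}\n\n`)
            );
          }
        }
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache'
    }
  });
}
```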
---
Production Checklist
```
☐ AbortController for cancellation
☐ Error states with retry capability
☐ Typing indicator during generation
☐ Cleanup on component unmount
☐ Rate limiting on API route
☐ Token usage tracking
☐ Streaming fallback (if API fails)
☐ Accessibility (screen reader announces updates)
☐ Mobile-friendly (touch targets for stop button)
☐ Network error recovery (auto-retry on disconnect)
☐ Max response length enforcement
☐ Cost estimation before generation
```
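For the accessibility item above, an `aria-live` region is the standard way to have screen readers announce streamed text. A minimal sketch; token-by-token announcements can be noisy, so `polite` lets assistive tech batch updates (some apps announce only on completion):
```typescript
// Minimal sketch for the accessibility checklist item: screen readers
// announce changes inside an aria-live region without stealing focus.
function StreamedMessage({ content, isStreaming }: { content: string; isStreaming: boolean }) {
  return (
    <div aria-live="polite" aria-busy={isStreaming}>
      {content}
    </div>
  );
}
```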
---
When to Use vs Avoid
| Scenario | Use Streaming? |
|----------|----------------|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No - regular fetch |
| Background jobs | ❌ No - use job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |
---
Technology Comparison
| Feature | SSE | WebSockets | Long Polling |
|---------|-----|-----------|--------------|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |
---
References
- /references/sse-protocol.md - Server-Sent Events specification details
- /references/vercel-ai-sdk.md - Vercel AI SDK integration patterns
- /references/error-recovery.md - Stream error handling strategies
Scripts
- scripts/stream_tester.ts - Test SSE endpoints locally
- scripts/token_counter.ts - Estimate costs before generation
---
This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display