llm-streaming-response-handler

Skill from erichowens/some_claude_skills

What it does

Manages real-time streaming responses from language models, enabling smooth parsing, buffering, and event-driven handling of incremental AI outputs

Installation

Install skill:
npx skills add https://github.com/erichowens/some_claude_skills --skill llm-streaming-response-handler

Skill Details

SKILL.md

Build production LLM streaming UIs with Server-Sent Events, real-time token display, cancellation, error recovery. Handles OpenAI/Anthropic/Claude streaming APIs. Use for chatbots, AI assistants, real-time text generation. Activate on "LLM streaming", "SSE", "token stream", "chat UI", "real-time AI". NOT for batch processing, non-streaming APIs, or WebSocket bidirectional chat.

Overview

# LLM Streaming Response Handler

Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.

When to Use

✅ Use for:

  • Chat interfaces with typing animation
  • Real-time AI assistants
  • Code generation with live preview
  • Document summarization with progressive display
  • Any UI where users expect immediate feedback from LLMs

❌ NOT for:

  • Batch document processing (no user watching)
  • APIs that don't support streaming
  • WebSocket-based bidirectional chat (use Socket.IO)
  • Simple request/response (fetch is fine)

Quick Decision Tree

```
Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (>100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (<50 words)? → Regular fetch
└── Background processing? → Regular fetch
```

---

Technology Selection

Server-Sent Events (SSE) - Recommended

Why SSE over WebSockets for LLM streaming:

  • Simplicity: HTTP-based, works with existing infrastructure
  • Auto-reconnect: Built-in reconnection logic
  • Firewall-friendly: Easier than WebSockets through proxies
  • One-way perfect: LLMs only stream server → client (see the sketch below)
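
In the browser, the native EventSource API shows how little client code SSE needs. A minimal sketch, assuming a hypothetical GET endpoint at `/api/stream` that emits `data: {"content":"..."}` lines and a final `data: {"done":true}` event; the patterns later in this skill use `fetch` instead because EventSource cannot send a POST body with the prompt.

```typescript
// Minimal SSE consumer sketch — /api/stream is a hypothetical GET endpoint.
const source = new EventSource('/api/stream');

source.onmessage = (event: MessageEvent) => {
  const data = JSON.parse(event.data);
  if (data.done) {
    source.close();          // stop the built-in auto-reconnect once the stream is complete
    return;
  }
  console.log(data.content); // append the token to the UI here
};

source.onerror = () => {
  // EventSource reconnects automatically; call source.close() here to give up instead
};
```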

Timeline:

  • 2015-2020: WebSockets for everything
  • 2020: SSE adoption for streaming APIs
  • 2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
  • 2024: Vercel AI SDK popularizes SSE patterns

Streaming APIs

| Provider | Streaming Method | Response Format |
|----------|------------------|-----------------|
| OpenAI | SSE | `data: {"choices":[{"delta":{"content":"token"}}]}` |
| Anthropic | SSE | `data: {"type":"content_block_delta","delta":{"text":"token"}}` |
| Claude (API) | SSE | `data: {"delta":{"text":"token"}}` |
| Vercel AI SDK | SSE | Normalized across providers |
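
Because each provider wraps the token differently, a small normalizer keeps UI code provider-agnostic. A minimal sketch, assuming the chunk shapes listed in the table above; anything beyond those shapes is illustrative:

```typescript
// Extracts the text delta from an already-parsed SSE payload.
// Assumes the OpenAI- and Anthropic-style chunk shapes from the table above;
// returns an empty string for chunks that carry no text (e.g. stop events).
function extractDelta(chunk: any): string {
  // OpenAI-style: {"choices":[{"delta":{"content":"token"}}]}
  const openaiDelta = chunk?.choices?.[0]?.delta?.content;
  if (typeof openaiDelta === 'string') return openaiDelta;

  // Anthropic-style: {"type":"content_block_delta","delta":{"text":"token"}}
  if (chunk?.type === 'content_block_delta' && typeof chunk?.delta?.text === 'string') {
    return chunk.delta.text;
  }

  return '';
}
```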

---

Common Anti-Patterns

Anti-Pattern 1: Buffering Before Display

Novice thinking: "Collect all tokens, then show complete response"

Problem: Defeats the entire purpose of streaming.

Wrong approach:

```typescript
// ❌ Waits for the entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done
```

Correct approach:

```typescript
// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.content) {
        setMessage(prev => prev + data.content); // Update immediately
      }
    }
  }
}
```

Timeline:

  • Pre-2023: Many apps buffered entire response
  • 2023+: Token-by-token display expected

---

Anti-Pattern 2: No Stream Cancellation

Problem: User can't stop generation, wasting tokens and money.

Symptom: "Stop" button doesn't work or doesn't exist.

Correct approach:

```typescript
// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });
    // Stream handling...
  } catch (error: any) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

// Stop button wired to the cancel handler
return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop
  </button>
);
```

---

Anti-Pattern 3: No Error Recovery

Problem: Stream fails mid-response, user sees partial text with no indication of failure.

Correct approach:

```typescript
// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');
  // Streaming logic...
  setStreamState('complete');
} catch (error: any) {
  setStreamState('error');
  if (error.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (error.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div role="alert">{errorMessage}</div>
)}
```

---

Anti-Pattern 4: Memory Leaks from Unclosed Streams

Problem: Streams not cleaned up, causing memory leaks.

Symptom: Browser slows down after multiple requests.

Correct approach:

```typescript
// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body.getReader();
    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);
```

---

Anti-Pattern 5: No Typing Indicator Between Tokens

Problem: UI feels frozen between slow tokens.

Correct approach:

```typescript
// ✅ Animated cursor during generation
<span>{content}</span>
{isStreaming && <span className="typing-cursor">▊</span>}
```

```css
.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}
```

---

Implementation Patterns

Pattern 1: Basic SSE Stream Handler

```typescript
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) {
          yield data.content;
        }
        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
```

Pattern 2: React Hook for Streaming

```typescript
import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err: any) {
      if (err.name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <span>{content}</span>
      {isStreaming && <span className="typing-cursor">▊</span>}
      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```

Pattern 3: Server-Side Streaming (Next.js)

```typescript
// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Edge runtime keeps streaming latency low

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;
          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }
        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
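
The same route shape works against Anthropic's Messages API. A minimal sketch, assuming the `@anthropic-ai/sdk` streaming event shape (`content_block_delta` events carrying `text_delta` payloads); the model id is illustrative, and only the provider call and loop body differ from the OpenAI version above:

```typescript
// app/api/chat/route.ts — Anthropic variant (sketch)
import Anthropic from '@anthropic-ai/sdk';

export const runtime = 'edge';

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

  const stream = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-latest', // assumption: any streaming-capable model id
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const event of stream) {
          // Text arrives as content_block_delta events carrying a text_delta payload
          if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
            controller.enqueue(
              encoder.encode(`data: ${JSON.stringify({ content: event.delta.text })}\n\n`)
            );
          }
        }
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```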

---

Production Checklist

```
░ AbortController for cancellation
░ Error states with retry capability
░ Typing indicator during generation
░ Cleanup on component unmount
░ Rate limiting on API route
░ Token usage tracking
░ Streaming fallback (if API fails)
░ Accessibility (screen reader announces updates)
░ Mobile-friendly (touch targets for stop button)
░ Network error recovery (auto-retry on disconnect)
░ Max response length enforcement
░ Cost estimation before generation
```
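
Several of these items (retry capability, network error recovery) reduce to wrapping the stream call. A minimal sketch of exponential-backoff retry, assuming a `run` callback such as the `stream(prompt)` function from the hook above; the function name, attempt count, and delays are illustrative:

```typescript
// Retries a streaming call with exponential backoff (1s, 2s, 4s, ...).
// User-initiated aborts are not retried. Names and limits are illustrative.
async function streamWithRetry(run: () => Promise<void>, maxAttempts = 3): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await run();
      return;
    } catch (err: any) {
      if (err?.name === 'AbortError' || attempt === maxAttempts) throw err;
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}

// Usage (hypothetical): await streamWithRetry(() => stream(prompt));
```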

---

When to Use vs Avoid

| Scenario | Use Streaming? |
|----------|----------------|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No - regular fetch |
| Background jobs | ❌ No - use job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |

---

Technology Comparison

| Feature | SSE | WebSockets | Long Polling |
|---------|-----|-----------|--------------|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |

---

References

  • /references/sse-protocol.md - Server-Sent Events specification details
  • /references/vercel-ai-sdk.md - Vercel AI SDK integration patterns
  • /references/error-recovery.md - Stream error handling strategies

Scripts

  • scripts/stream_tester.ts - Test SSE endpoints locally
  • scripts/token_counter.ts - Estimate costs before generation

---

This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display
