openai-api
Enables comprehensive interaction with OpenAI's APIs, supporting chat completions, embeddings, images, audio, and moderation with advanced features like streaming and function calling.
Installation
```bash
git clone https://github.com/ovachiever/droid-tings.git
```
# OpenAI API - Complete Guide
Version: Production Ready ✅
Package: openai@6.7.0
Last Updated: 2025-10-25
---
Status
✅ Production Ready:
- ✅ Chat Completions API (GPT-5, GPT-4o, GPT-4 Turbo)
- ✅ Embeddings API (text-embedding-3-small, text-embedding-3-large)
- ✅ Images API (DALL-E 3 generation + GPT-Image-1 editing)
- ✅ Audio API (Whisper transcription + TTS with 11 voices)
- ✅ Moderation API (11 safety categories)
- ✅ Streaming patterns (SSE)
- ✅ Function calling / Tools
- ✅ Structured outputs (JSON schemas)
- ✅ Vision (GPT-4o)
- ✅ Both Node.js SDK and fetch approaches
---
Table of Contents
- [Quick Start](#quick-start)
- [Chat Completions API](#chat-completions-api)
- [GPT-5 Series Models](#gpt-5-series-models)
- [Streaming Patterns](#streaming-patterns)
- [Function Calling](#function-calling)
- [Structured Outputs](#structured-outputs)
- [Vision (GPT-4o)](#vision-gpt-4o)
- [Embeddings API](#embeddings-api)
- [Images API](#images-api)
- [Audio API](#audio-api)
- [Moderation API](#moderation-api)
- [Error Handling](#error-handling)
- [Rate Limits](#rate-limits)
- [Production Best Practices](#production-best-practices)
- [Relationship to openai-responses](#relationship-to-openai-responses)
---
Quick Start
Installation
```bash
npm install openai@6.7.0
```
Environment Setup
```bash
export OPENAI_API_KEY="sk-..."
```
Or create .env file:
```
OPENAI_API_KEY=sk-...
```
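Note that Node.js does not read `.env` automatically; a minimal sketch, assuming the `dotenv` package is installed:
```typescript
// Load .env into process.env before constructing the client
import 'dotenv/config';
import OpenAI from 'openai';

const openai = new OpenAI(); // picks up OPENAI_API_KEY from process.env by default
```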
First Chat Completion (Node.js SDK)
```typescript
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'user', content: 'What are the three laws of robotics?' }
],
});
console.log(completion.choices[0].message.content);
```
First Chat Completion (Fetch - Cloudflare Workers)
```typescript
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-5',
messages: [
{ role: 'user', content: 'What are the three laws of robotics?' }
],
}),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```
---
Chat Completions API
Endpoint: POST /v1/chat/completions
The Chat Completions API is the core interface for interacting with OpenAI's language models. It supports conversational AI, text generation, function calling, structured outputs, and vision capabilities.
Supported Models
#### GPT-5 Series (Released August 2025)
- gpt-5: Full-featured reasoning model with advanced capabilities
- gpt-5-mini: Cost-effective alternative with good performance
- gpt-5-nano: Smallest/fastest variant for simple tasks
#### GPT-4o Series
- gpt-4o: Multimodal model with vision capabilities
- gpt-4-turbo: Fast GPT-4 variant
#### GPT-4 Series
- gpt-4: Original GPT-4 model
Basic Request Structure
```typescript
{
model: string, // Model to use (e.g., "gpt-5")
messages: Message[], // Conversation history
reasoning_effort?: string, // GPT-5 only: "minimal" | "low" | "medium" | "high"
verbosity?: string, // GPT-5 only: "low" | "medium" | "high"
temperature?: number, // NOT supported by GPT-5
max_tokens?: number, // Max tokens to generate
stream?: boolean, // Enable streaming
tools?: Tool[], // Function calling tools
}
```
Response Structure
```typescript
{
id: string, // Unique completion ID
object: "chat.completion",
created: number, // Unix timestamp
model: string, // Model used
choices: [{
index: number,
message: {
role: "assistant",
content: string, // Generated text
tool_calls?: ToolCall[] // If function calling
},
finish_reason: string // "stop" | "length" | "tool_calls"
}],
usage: {
prompt_tokens: number,
completion_tokens: number,
total_tokens: number
}
}
```
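It is worth checking `finish_reason` before trusting the output; a minimal sketch using the `completion` object from the Quick Start example:
```typescript
// Detect truncation and tool-call requests via finish_reason
const choice = completion.choices[0];
if (choice.finish_reason === 'length') {
  // Hit max_tokens; the text is cut off mid-thought
  console.warn(`Truncated after ${completion.usage.completion_tokens} tokens`);
} else if (choice.finish_reason === 'tool_calls') {
  // The model wants a function executed (see Function Calling below)
}
```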
Message Roles
OpenAI supports three message roles:
- system: Set behavior and context (newer reasoning models use the name developer for this role)
- user: User input
- assistant: Model responses
```typescript
const messages = [
{
role: 'system',
content: 'You are a helpful assistant that explains complex topics simply.'
},
{
role: 'user',
content: 'Explain quantum computing to a 10-year-old.'
}
];
```
Multi-turn Conversations
Build conversation history by appending messages:
```typescript
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is TypeScript?' },
{ role: 'assistant', content: 'TypeScript is a superset of JavaScript...' },
{ role: 'user', content: 'How do I install it?' }
];
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: messages,
});
```
Important: Chat Completions API is stateless. You must send full conversation history with each request. For stateful conversations, use the openai-responses skill.
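A minimal sketch of managing that history yourself: a thin helper that appends each turn before and after the call (the variable and function names are illustrative).
```typescript
// Stateless-conversation helper: the caller owns the full message history
const history: { role: 'system' | 'user' | 'assistant'; content: string }[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
];

async function ask(userInput: string): Promise<string> {
  history.push({ role: 'user', content: userInput });
  const completion = await openai.chat.completions.create({
    model: 'gpt-5',
    messages: history,
  });
  const reply = completion.choices[0].message.content ?? '';
  history.push({ role: 'assistant', content: reply }); // persist the turn locally
  return reply;
}
```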
---
GPT-5 Series Models
GPT-5 models (released August 2025) introduce new parameters and capabilities:
Unique GPT-5 Parameters
#### reasoning_effort
Controls the depth of reasoning:
- "minimal": Quick responses, less reasoning
- "low": Basic reasoning
- "medium": Balanced reasoning (default)
- "high": Deep reasoning for complex problems
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: 'Solve this complex math problem...' }],
reasoning_effort: 'high', // Deep reasoning
});
```
#### verbosity
Controls output length and detail:
- "low": Concise responses
- "medium": Balanced detail (default)
- "high": Verbose, detailed responses
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: 'Explain quantum mechanics' }],
verbosity: 'high', // Detailed explanation
});
```
GPT-5 Limitations
NOT Supported with GPT-5:
- ❌ `temperature` parameter
- ❌ `top_p` parameter
- ❌ `logprobs` parameter
- ❌ Chain of Thought (CoT) persistence between turns

If you need these features:
- Use GPT-4o or GPT-4 Turbo for temperature/top_p/logprobs (see the sketch below)
- Use the openai-responses skill for stateful CoT preservation
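For example, a request that needs sampling control targets GPT-4o instead; a minimal sketch:
```typescript
// temperature/top_p are accepted by GPT-4o but, per the list above, not by GPT-5
const completion = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Brainstorm five taglines for a coffee shop' }],
  temperature: 1.2, // higher values sample more diversely
  top_p: 0.95,
});
```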
GPT-5 vs GPT-4o Comparison
| Feature | GPT-5 | GPT-4o |
|---------|-------|--------|
| Reasoning control | ✅ reasoning_effort | ❌ |
| Verbosity control | ✅ verbosity | ❌ |
| Temperature | ❌ | ✅ |
| Top-p | ❌ | ✅ |
| Vision | ❌ | ✅ |
| Function calling | ✅ | ✅ |
| Streaming | ✅ | ✅ |
When to use GPT-5: Complex reasoning tasks, mathematical problems, logic puzzles, code generation
When to use GPT-4o: Vision tasks, when you need temperature control, multimodal inputs
---
Streaming Patterns
Streaming allows real-time token-by-token delivery, improving perceived latency for long responses.
Enable Streaming
Set stream: true:
```typescript
const stream = await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true,
});
```
Streaming with Node.js SDK
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();
const stream = await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: 'Write a poem about coding' }],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
process.stdout.write(content);
}
```
Streaming with Fetch (Cloudflare Workers)
```typescript
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-5',
messages: [{ role: 'user', content: 'Write a poem' }],
stream: true,
}),
});
const reader = response.body?.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader!.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n').filter(line => line.trim() !== '');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') break;
try {
const json = JSON.parse(data);
const content = json.choices[0]?.delta?.content || '';
console.log(content);
} catch (e) {
// Skip invalid JSON
}
}
}
}
```
Server-Sent Events (SSE) Format
Streaming uses Server-Sent Events:
```
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":" world"}}]}
data: {"id":"chatcmpl-xyz","choices":[{"finish_reason":"stop"}]}
data: [DONE]
```
Streaming Best Practices
✅ Always handle:
- Incomplete chunks (buffer partial data, as sketched below)
- The `[DONE]` signal
- Network errors and retries
- Invalid JSON (skip gracefully)

✅ Performance:
- Use streaming for responses >100 tokens
- Don't stream if you need the full response before processing

❌ Don't:
- Assume chunks are always complete JSON
- Forget to close the stream on errors
- Buffer entire response in memory (defeats streaming purpose)
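A minimal sketch of the buffering advice above: keep the trailing partial line across reads instead of assuming each network chunk ends on an event boundary (the function name is illustrative).
```typescript
// Buffered SSE parsing: a chunk may end mid-line, so carry the tail to the next read
async function consumeSSE(response: Response, onToken: (t: string) => void) {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // last element may be an incomplete line
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6).trim();
      if (data === '[DONE]') return;
      try {
        onToken(JSON.parse(data).choices[0]?.delta?.content ?? '');
      } catch {
        // ignore partial or invalid JSON
      }
    }
  }
}
```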
---
Function Calling
Function calling (also called "tool calling") allows models to invoke external functions/tools based on conversation context.
Basic Tool Definition
```typescript
const tools = [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get the current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name, e.g., San Francisco'
},
unit: {
type: 'string',
enum: ['celsius', 'fahrenheit'],
description: 'Temperature unit'
}
},
required: ['location']
}
}
}
];
```
Making a Request with Tools
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'user', content: 'What is the weather in San Francisco?' }
],
tools: tools,
});
```
Handling Tool Calls
```typescript
const message = completion.choices[0].message;
if (message.tool_calls) {
// Model wants to call a function
for (const toolCall of message.tool_calls) {
if (toolCall.function.name === 'get_weather') {
const args = JSON.parse(toolCall.function.arguments);
// Execute your function
const weatherData = await getWeather(args.location, args.unit);
// Send result back to model
const followUp = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
...messages,
message, // Assistant's tool call
{
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(weatherData)
}
],
tools: tools,
});
}
}
}
```
Complete Function Calling Flow
```typescript
async function chatWithTools(userMessage: string) {
let messages = [
{ role: 'user', content: userMessage }
];
while (true) {
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: messages,
tools: tools,
});
const message = completion.choices[0].message;
messages.push(message);
// If no tool calls, we're done
if (!message.tool_calls) {
return message.content;
}
// Execute all tool calls
for (const toolCall of message.tool_calls) {
const result = await executeFunction(toolCall.function.name, toolCall.function.arguments);
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result)
});
}
}
}
```
Multiple Tools
You can define multiple tools:
```typescript
const tools = [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get weather for a location',
parameters: { /* schema */ }
}
},
{
type: 'function',
function: {
name: 'search_web',
description: 'Search the web',
parameters: { /* schema */ }
}
},
{
type: 'function',
function: {
name: 'calculate',
description: 'Perform calculations',
parameters: { /* schema */ }
}
}
];
```
The model will choose which tool(s) to call based on the conversation.
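You can also steer this choice with the `tool_choice` parameter; a minimal sketch:
```typescript
// Force a specific tool instead of letting the model decide
const completion = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [{ role: 'user', content: 'Weather in Tokyo?' }],
  tools: tools,
  tool_choice: { type: 'function', function: { name: 'get_weather' } },
});
// tool_choice also accepts 'auto' (default), 'none', or 'required'
```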
---
Structured Outputs
Structured outputs allow you to enforce JSON schema validation on model responses.
Using JSON Schema
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-4o', // Note: Structured outputs best supported on GPT-4o
messages: [
{ role: 'user', content: 'Generate a person profile' }
],
response_format: {
type: 'json_schema',
json_schema: {
name: 'person_profile',
strict: true,
schema: {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
skills: {
type: 'array',
items: { type: 'string' }
}
},
required: ['name', 'age', 'skills'],
additionalProperties: false
}
}
}
});
const person = JSON.parse(completion.choices[0].message.content);
// { name: "Alice", age: 28, skills: ["TypeScript", "React"] }
```
JSON Mode (Simple)
For simpler use cases without strict schema validation:
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'user', content: 'List 3 programming languages as JSON' }
],
response_format: { type: 'json_object' }
});
const data = JSON.parse(completion.choices[0].message.content);
```
Important: When using response_format, include "JSON" in your prompt to guide the model.
---
Vision (GPT-4o)
GPT-4o supports image understanding alongside text.
Image via URL
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{
type: 'image_url',
image_url: {
url: 'https://example.com/image.jpg'
}
}
]
}
]
});
```
Image via Base64
```typescript
import fs from 'fs';
const imageBuffer = fs.readFileSync('./image.jpg');
const base64Image = imageBuffer.toString('base64');
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Describe this image in detail' },
{
type: 'image_url',
image_url: {
url: `data:image/jpeg;base64,${base64Image}`
}
}
]
}
]
});
```
Multiple Images
```typescript
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'Compare these two images' },
{ type: 'image_url', image_url: { url: 'https://example.com/image1.jpg' } },
{ type: 'image_url', image_url: { url: 'https://example.com/image2.jpg' } }
]
}
]
});
```
---
Embeddings API
Endpoint: POST /v1/embeddings
Embeddings convert text into high-dimensional vectors for semantic search, clustering, recommendations, and retrieval-augmented generation (RAG).
Supported Models
#### text-embedding-3-large
- Default dimensions: 3072
- Custom dimensions: 256-3072
- Best for: Highest quality semantic understanding
- Use case: Production RAG, advanced semantic search
#### text-embedding-3-small
- Default dimensions: 1536
- Custom dimensions: 256-1536
- Best for: Cost-effective embeddings
- Use case: Most applications, high-volume processing
#### text-embedding-ada-002 (Legacy)
- Dimensions: 1536 (fixed)
- Status: Still supported, use v3 models for new projects
Basic Request (Node.js SDK)
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();
const embedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'The food was delicious and the waiter was friendly.',
});
console.log(embedding.data[0].embedding);
// [0.0023064255, -0.009327292, ..., -0.0028842222]
```
Basic Request (Fetch - Cloudflare Workers)
```typescript
const response = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'text-embedding-3-small',
input: 'The food was delicious and the waiter was friendly.',
}),
});
const data = await response.json();
const embedding = data.data[0].embedding;
```
Response Structure
```typescript
{
object: "list",
data: [
{
object: "embedding",
embedding: [0.0023064255, -0.009327292, ...], // Array of floats
index: 0
}
],
model: "text-embedding-3-small",
usage: {
prompt_tokens: 8,
total_tokens: 8
}
}
```
Custom Dimensions
Control embedding dimensions to reduce storage/processing:
```typescript
const embedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Sample text',
dimensions: 256, // Reduced from 1536 default
});
```
Supported ranges:
- `text-embedding-3-large`: 256-3072
- `text-embedding-3-small`: 256-1536
Benefits:
- Smaller storage (4x-12x reduction)
- Faster similarity search
- Lower memory usage
- Minimal quality loss for many use cases
Batch Processing
Process multiple texts in a single request:
```typescript
const embeddings = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: [
'First document text',
'Second document text',
'Third document text',
],
});
// Access individual embeddings
embeddings.data.forEach((item, index) => {
console.log(`Embedding ${index}:`, item.embedding);
});
```
Limits:
- Max tokens per input: 8192
- Max summed tokens across all inputs: 300,000
- Array dimension max: 2048
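A minimal sketch of batching within those limits (the helper name and batch size are illustrative; it does not count tokens, so shrink the batch size for long documents):
```typescript
// Embed a large corpus in slices that respect the 2048-input array limit
async function embedAll(texts: string[], batchSize = 2048): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const res = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts.slice(i, i + batchSize),
    });
    vectors.push(...res.data.map(d => d.embedding)); // results return in input order
  }
  return vectors;
}
```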
Dimension Reduction Pattern
Post-generation truncation (alternative to dimensions parameter):
```typescript
// Get full embedding
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Testing 123',
});
// Truncate to desired dimensions
const fullEmbedding = response.data[0].embedding;
const truncated = fullEmbedding.slice(0, 256);
// Normalize (L2)
function normalizeL2(vector: number[]): number[] {
const magnitude = Math.sqrt(vector.reduce((sum, val) => sum + val * val, 0));
return vector.map(val => val / magnitude);
}
const normalized = normalizeL2(truncated);
```
RAG Integration Pattern
Complete retrieval-augmented generation workflow:
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();
// 1. Generate embeddings for knowledge base
async function embedKnowledgeBase(documents: string[]) {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: documents,
});
return response.data.map(item => item.embedding);
}
// 2. Embed user query
async function embedQuery(query: string) {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
return response.data[0].embedding;
}
// 3. Cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
// 4. Find most similar documents
async function findSimilar(query: string, knowledgeBase: { text: string, embedding: number[] }[]) {
const queryEmbedding = await embedQuery(query);
const results = knowledgeBase.map(doc => ({
text: doc.text,
similarity: cosineSimilarity(queryEmbedding, doc.embedding),
}));
return results.sort((a, b) => b.similarity - a.similarity);
}
// 5. RAG: Retrieve + Generate
async function rag(query: string, knowledgeBase: { text: string, embedding: number[] }[]) {
const similarDocs = await findSimilar(query, knowledgeBase);
const context = similarDocs.slice(0, 3).map(d => d.text).join('\n\n');
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{
role: 'system',
content: `Answer questions using the following context:\n\n${context}`
},
{
role: 'user',
content: query
}
],
});
return completion.choices[0].message.content;
}
```
Embeddings Best Practices
✅ Model Selection:
- Use `text-embedding-3-small` for most applications (1536 dims, cost-effective)
- Use `text-embedding-3-large` for highest quality (3072 dims)

✅ Performance:
- Batch embed up to 2048 documents per request
- Use custom dimensions (256-512) for storage/speed optimization
- Cache embeddings (they're deterministic for same input; see the sketch below)

✅ Accuracy:
- Normalize embeddings before storing (L2 normalization)
- Use cosine similarity for comparison
- Preprocess text consistently (lowercasing, removing special chars)

❌ Don't:
- Exceed 8192 tokens per input (will error)
- Sum >300k tokens across batch (will error)
- Mix models (incompatible dimensions)
- Forget to normalize when using truncated embeddings
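A minimal sketch of the caching point above, keyed on model plus input (swap the Map for Redis/KV in production):
```typescript
// In-memory embedding cache; safe because identical input yields identical vectors
const embeddingCache = new Map<string, number[]>();

async function cachedEmbedding(text: string, model = 'text-embedding-3-small'): Promise<number[]> {
  const key = `${model}:${text}`;
  const cached = embeddingCache.get(key);
  if (cached) return cached;
  const res = await openai.embeddings.create({ model, input: text });
  embeddingCache.set(key, res.data[0].embedding);
  return res.data[0].embedding;
}
```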
---
Images API
OpenAI's Images API supports image generation with DALL-E 3 and image editing with GPT-Image-1.
Image Generation (DALL-E 3)
Endpoint: POST /v1/images/generations
Generate images from text prompts using DALL-E 3.
#### Basic Request (Node.js SDK)
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();
const image = await openai.images.generate({
model: 'dall-e-3',
prompt: 'A white siamese cat with striking blue eyes',
size: '1024x1024',
quality: 'standard',
style: 'vivid',
n: 1,
});
console.log(image.data[0].url);
console.log(image.data[0].revised_prompt);
```
#### Basic Request (Fetch - Cloudflare Workers)
```typescript
const response = await fetch('https://api.openai.com/v1/images/generations', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'dall-e-3',
prompt: 'A white siamese cat with striking blue eyes',
size: '1024x1024',
quality: 'standard',
style: 'vivid',
}),
});
const data = await response.json();
const imageUrl = data.data[0].url;
```
#### Parameters
size - Image dimensions:
- `"1024x1024"` (square)
- `"1024x1536"` (portrait)
- `"1536x1024"` (landscape)
- `"1024x1792"` (tall portrait)
- `"1792x1024"` (wide landscape)

quality - Rendering quality:
- `"standard"`: Normal quality, faster, cheaper
- `"hd"`: High definition with finer details, costs more

style - Visual style:
- `"vivid"`: Hyper-real, dramatic, high-contrast images
- `"natural"`: More natural, less dramatic styling

response_format - Output format:
- `"url"`: Returns temporary URL (expires in 1 hour)
- `"b64_json"`: Returns base64-encoded image data

n - Number of images:
- DALL-E 3 only supports `n: 1`
- DALL-E 2 supports `n: 1-10`
#### Response Structure
```typescript
{
created: 1700000000,
data: [
{
url: "https://oaidalleapiprodscus.blob.core.windows.net/...",
revised_prompt: "A pristine white Siamese cat with striking blue eyes, sitting elegantly..."
}
]
}
```
Note: DALL-E 3 may revise your prompt for safety/quality. The revised_prompt field shows what was actually used.
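Because `url` responses expire after about an hour (see `response_format` above), download the bytes if you need the image later; a minimal Node.js sketch:
```typescript
// Persist a generated image before its temporary URL expires
import fs from 'fs';

async function saveImage(url: string, path: string): Promise<void> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  fs.writeFileSync(path, Buffer.from(await res.arrayBuffer()));
}

await saveImage(image.data[0].url, 'cat.png'); // `image` from the generation example above
```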
#### Quality Comparison
```typescript
// Standard quality (faster, cheaper)
const standardImage = await openai.images.generate({
model: 'dall-e-3',
prompt: 'A futuristic city at sunset',
quality: 'standard',
});
// HD quality (finer details, costs more)
const hdImage = await openai.images.generate({
model: 'dall-e-3',
prompt: 'A futuristic city at sunset',
quality: 'hd',
});
```
#### Style Comparison
```typescript
// Vivid style (hyper-real, dramatic)
const vividImage = await openai.images.generate({
model: 'dall-e-3',
prompt: 'A mountain landscape',
style: 'vivid',
});
// Natural style (more realistic, less dramatic)
const naturalImage = await openai.images.generate({
model: 'dall-e-3',
prompt: 'A mountain landscape',
style: 'natural',
});
```
#### Base64 Output
```typescript
const image = await openai.images.generate({
model: 'dall-e-3',
prompt: 'A cyberpunk street scene',
response_format: 'b64_json',
});
const base64Data = image.data[0].b64_json;
// Convert to buffer and save
import fs from 'fs';
const buffer = Buffer.from(base64Data, 'base64');
fs.writeFileSync('image.png', buffer);
```
Image Editing (GPT-Image-1)
Endpoint: POST /v1/images/edits
Edit or composite images using AI.
Important: This endpoint uses multipart/form-data, not JSON.
#### Basic Edit Request
```typescript
import fs from 'fs';
import FormData from 'form-data';
const formData = new FormData();
formData.append('model', 'gpt-image-1');
formData.append('image', fs.createReadStream('./woman.jpg'));
formData.append('image_2', fs.createReadStream('./logo.png'));
formData.append('prompt', 'Add the logo to the woman\'s top, as if stamped into the fabric.');
formData.append('input_fidelity', 'high');
formData.append('size', '1024x1024');
formData.append('quality', 'auto');
const response = await fetch('https://api.openai.com/v1/images/edits', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
...formData.getHeaders(),
},
body: formData,
});
const data = await response.json();
const editedImageUrl = data.data[0].url;
```
#### Edit Parameters
model: "gpt-image-1" (required)
image: Primary image file (PNG, JPEG, WebP)
image_2: Secondary image for compositing (optional)
prompt: Text description of desired edits
input_fidelity:
- `"low"`: More creative freedom
- `"medium"`: Balance
- `"high"`: Stay closer to original

size: Same options as generation

quality:
- `"auto"`: Automatic quality selection
- `"standard"`: Normal quality
- `"high"`: Higher quality

format - Output format:
- `"png"`: PNG (supports transparency)
- `"jpeg"`: JPEG (no transparency)
- `"webp"`: WebP (smaller file size)

background - Background handling:
- `"transparent"`: Transparent background (PNG/WebP only)
- `"white"`: White background
- `"black"`: Black background

output_compression: JPEG/WebP compression (0-100)
- 0: Maximum compression (smallest file)
- 100: Minimum compression (highest quality)
#### Transparent Background Example
```typescript
const formData = new FormData();
formData.append('model', 'gpt-image-1');
formData.append('image', fs.createReadStream('./product.jpg'));
formData.append('prompt', 'Remove the background, keeping only the product.');
formData.append('format', 'png');
formData.append('background', 'transparent');
const response = await fetch('https://api.openai.com/v1/images/edits', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
...formData.getHeaders(),
},
body: formData,
});
```
Images Best Practices
✅ Prompting:
- Be specific about details (colors, composition, style)
- Include artistic style references ("oil painting", "photograph", "3D render")
- Specify lighting ("golden hour", "studio lighting", "dramatic shadows")
- DALL-E 3 may revise prompts; check `revised_prompt`

✅ Performance:
- Use `"standard"` quality unless HD details are critical
- Use `"natural"` style for realistic images
- Use `"vivid"` style for marketing/artistic images
- Cache generated images (they're non-deterministic)

✅ Cost Optimization:
- Standard quality is cheaper than HD
- Smaller sizes cost less
- Use appropriate size for your use case (don't generate 1792x1024 if you need 512x512)

❌ Don't:
- Request multiple images with DALL-E 3 (n=1 only)
- Expect deterministic output (same prompt = different images)
- Use URLs that expire (save images if needed long-term)
- Forget to handle revised prompts (DALL-E 3 modifies for safety)
---
Audio API
OpenAI's Audio API provides speech-to-text (Whisper) and text-to-speech (TTS) capabilities.
Whisper Transcription
Endpoint: POST /v1/audio/transcriptions
Convert audio to text using Whisper.
#### Supported Audio Formats
- mp3
- mp4
- mpeg
- mpga
- m4a
- wav
- webm
#### Basic Transcription (Node.js SDK)
```typescript
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI();
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream('./audio.mp3'),
model: 'whisper-1',
});
console.log(transcription.text);
```
#### Basic Transcription (Fetch)
```typescript
import fs from 'fs';
import FormData from 'form-data';
const formData = new FormData();
formData.append('file', fs.createReadStream('./audio.mp3'));
formData.append('model', 'whisper-1');
const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
...formData.getHeaders(),
},
body: formData,
});
const data = await response.json();
console.log(data.text);
```
#### Response Structure
```typescript
{
text: "Hello, this is a transcription of the audio file."
}
```
Text-to-Speech (TTS)
Endpoint: POST /v1/audio/speech
Convert text to natural-sounding speech.
#### Supported Models
tts-1
- Standard quality
- Optimized for real-time streaming
- Lowest latency
tts-1-hd
- High definition quality
- Better audio fidelity
- Slightly higher latency
gpt-4o-mini-tts
- Latest model (November 2024)
- Supports voice instructions
- Best quality and control
#### Available Voices (11 total)
- alloy: Neutral, balanced voice
- ash: Clear, professional voice
- ballad: Warm, storytelling voice
- coral: Soft, friendly voice
- echo: Calm, measured voice
- fable: Expressive, narrative voice
- onyx: Deep, authoritative voice
- nova: Bright, energetic voice
- sage: Wise, thoughtful voice
- shimmer: Gentle, soothing voice
- verse: Poetic, rhythmic voice
#### Basic TTS (Node.js SDK)
```typescript
import OpenAI from 'openai';
import fs from 'fs';
const openai = new OpenAI();
const mp3 = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'The quick brown fox jumped over the lazy dog.',
});
const buffer = Buffer.from(await mp3.arrayBuffer());
fs.writeFileSync('speech.mp3', buffer);
```
#### Basic TTS (Fetch)
```typescript
const response = await fetch('https://api.openai.com/v1/audio/speech', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'tts-1',
voice: 'alloy',
input: 'The quick brown fox jumped over the lazy dog.',
}),
});
const audioBuffer = await response.arrayBuffer();
// Save or stream the audio
```
#### TTS Parameters
input: Text to convert to speech (max 4096 characters)
voice: One of 11 voices (alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse)
model: "tts-1" | "tts-1-hd" | "gpt-4o-mini-tts"
instructions: Voice control instructions (gpt-4o-mini-tts only)
- Not supported by tts-1 or tts-1-hd
- Examples: "Speak in a calm, soothing tone", "Use a professional business voice"
response_format: Output audio format
- "mp3" (default)
- "opus"
- "aac"
- "flac"
- "wav"
- "pcm"
speed: Playback speed (0.25 to 4.0, default 1.0)
- 0.25 = quarter speed (very slow)
- 1.0 = normal speed
- 2.0 = double speed
- 4.0 = quadruple speed (very fast)
#### Voice Instructions (gpt-4o-mini-tts)
```typescript
const speech = await openai.audio.speech.create({
model: 'gpt-4o-mini-tts',
voice: 'nova',
input: 'Welcome to our customer support line.',
instructions: 'Speak in a calm, professional, and friendly tone suitable for customer service.',
});
```
Instruction Examples:
- "Speak slowly and clearly for educational content"
- "Use an enthusiastic, energetic tone for marketing"
- "Adopt a calm, soothing voice for meditation guidance"
- "Sound authoritative and confident for presentations"
#### Speed Control
```typescript
// Slow speech (0.5x speed)
const slowSpeech = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'This will be spoken slowly.',
speed: 0.5,
});
// Fast speech (1.5x speed)
const fastSpeech = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'This will be spoken quickly.',
speed: 1.5,
});
```
#### Different Audio Formats
```typescript
// MP3 (most compatible, default)
const mp3 = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'Hello',
response_format: 'mp3',
});
// Opus (best for web streaming)
const opus = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'Hello',
response_format: 'opus',
});
// WAV (uncompressed, highest quality)
const wav = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'Hello',
response_format: 'wav',
});
```
#### Streaming TTS (Server-Sent Events)
```typescript
const response = await fetch('https://api.openai.com/v1/audio/speech', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'gpt-4o-mini-tts',
voice: 'nova',
input: 'Long text to be streamed as audio chunks...',
stream_format: 'sse', // Server-Sent Events
}),
});
// Stream audio chunks
const reader = response.body?.getReader();
while (true) {
const { done, value } = await reader!.read();
if (done) break;
// Process audio chunk
processAudioChunk(value);
}
```
Note: SSE streaming (stream_format: "sse") is only supported by gpt-4o-mini-tts. tts-1 and tts-1-hd do not support streaming.
Audio Best Practices
✅ Transcription:
- Use supported formats (mp3, wav, m4a)
- Ensure clear audio quality
- Whisper handles multiple languages automatically
- Works best with clean audio (minimal background noise)

✅ Text-to-Speech:
- Use `tts-1` for real-time, low-latency playback
- Use `tts-1-hd` for higher quality offline audio
- Use `gpt-4o-mini-tts` for voice instructions and SSE streaming
- Choose voice based on use case (alloy for neutral, onyx for authoritative, etc.)
- Test different voices to find best fit
- Use instructions (gpt-4o-mini-tts) for fine-grained control

✅ Performance:
- Cache generated audio (deterministic for same input)
- Use opus format for web streaming (smaller file size)
- Use mp3 for maximum compatibility
- Stream audio with `stream_format: "sse"` for real-time playback

❌ Don't:
- Exceed 4096 characters for TTS input
- Use instructions with tts-1 or tts-1-hd (not supported)
- Use SSE streaming with tts-1/tts-1-hd (use gpt-4o-mini-tts)
- Assume transcription is perfect (always review important content)
---
Moderation API
Endpoint: POST /v1/moderations
Check content for policy violations across 11 safety categories.
Basic Moderation (Node.js SDK)
```typescript
import OpenAI from 'openai';
const openai = new OpenAI();
const moderation = await openai.moderations.create({
model: 'omni-moderation-latest',
input: 'I want to hurt someone.',
});
console.log(moderation.results[0].flagged);
console.log(moderation.results[0].categories);
console.log(moderation.results[0].category_scores);
```
Basic Moderation (Fetch)
```typescript
const response = await fetch('https://api.openai.com/v1/moderations', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.OPENAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'omni-moderation-latest',
input: 'I want to hurt someone.',
}),
});
const data = await response.json();
const isFlagged = data.results[0].flagged;
```
Response Structure
```typescript
{
id: "modr-ABC123",
model: "omni-moderation-latest",
results: [
{
flagged: true,
categories: {
sexual: false,
hate: false,
harassment: true,
"self-harm": false,
"sexual/minors": false,
"hate/threatening": false,
"violence/graphic": false,
"self-harm/intent": false,
"self-harm/instructions": false,
"harassment/threatening": true,
violence: true
},
category_scores: {
sexual: 0.000011726,
hate: 0.2270666,
harassment: 0.5215635,
"self-harm": 0.0000123,
"sexual/minors": 0.0000001,
"hate/threatening": 0.0123456,
"violence/graphic": 0.0123456,
"self-harm/intent": 0.0000123,
"self-harm/instructions": 0.0000123,
"harassment/threatening": 0.4123456,
violence: 0.9971135
}
}
]
}
```
Safety Categories (11 total)
sexual: Sexual content
- Erotic or pornographic material
- Sexual services
hate: Hateful content
- Content promoting hate based on identity
- Dehumanizing language
harassment: Harassing content
- Bullying or intimidation
- Personal attacks
self-harm: Self-harm content
- Promoting or encouraging self-harm
- Suicide-related content
sexual/minors: Sexual content involving minors
- Any sexualization of children
- Child abuse material (CSAM)
hate/threatening: Hateful + threatening
- Violent threats based on identity
- Calls for violence against protected groups
violence/graphic: Graphic violence
- Extreme gore or violence
- Graphic injury descriptions
self-harm/intent: Self-harm intent
- Active expressions of suicidal ideation
- Plans to self-harm
self-harm/instructions: Self-harm instructions
- How-to guides for self-harm
- Methods for suicide
harassment/threatening: Harassment + threats
- Violent threats toward individuals
- Credible harm threats
violence: Violent content
- Threats of violence
- Glorification of violence
- Instructions for violence
Category Scores
Scores range from 0 to 1:
- 0.0: Very low confidence
- 0.5: Medium confidence
- 1.0: Very high confidence
Recommended Thresholds
```typescript
const thresholds: Record<string, number> = {
  sexual: 0.5,
  hate: 0.4,
  harassment: 0.5,
  'self-harm': 0.3,
  'sexual/minors': 0.1, // Lower threshold for child safety
  'hate/threatening': 0.3,
  'violence/graphic': 0.5,
  'self-harm/intent': 0.2,
  'self-harm/instructions': 0.2,
  'harassment/threatening': 0.3,
  violence: 0.5,
};

function isFlagged(result: { category_scores: Record<string, number> }): boolean {
  return Object.entries(result.category_scores).some(
    ([category, score]) => score > thresholds[category]
  );
}
```
Batch Moderation
Moderate multiple inputs in a single request:
```typescript
const moderation = await openai.moderations.create({
model: 'omni-moderation-latest',
input: [
'First text to moderate',
'Second text to moderate',
'Third text to moderate',
],
});
moderation.results.forEach((result, index) => {
console.log(`Input ${index}: ${result.flagged ? 'FLAGGED' : 'OK'}`);
if (result.flagged) {
console.log('Categories:', Object.keys(result.categories).filter(
cat => result.categories[cat]
));
}
});
```
Filtering by Category
```typescript
async function moderateContent(text: string) {
const moderation = await openai.moderations.create({
model: 'omni-moderation-latest',
input: text,
});
const result = moderation.results[0];
// Check specific categories
if (result.categories['sexual/minors']) {
throw new Error('Content violates child safety policy');
}
if (result.categories.violence && result.category_scores.violence > 0.7) {
throw new Error('Content contains high-confidence violence');
}
if (result.categories['self-harm/intent']) {
// Flag for human review
await flagForReview(text, 'self-harm-intent');
}
return result.flagged;
}
```
Production Pattern
```typescript
async function moderateUserContent(userInput: string) {
try {
const moderation = await openai.moderations.create({
model: 'omni-moderation-latest',
input: userInput,
});
const result = moderation.results[0];
// Immediate block for severe categories
const severeCategories = [
'sexual/minors',
'self-harm/intent',
'hate/threatening',
'harassment/threatening',
];
for (const category of severeCategories) {
if (result.categories[category]) {
return {
allowed: false,
reason: `Content flagged for: ${category}`,
severity: 'high',
};
}
}
// Custom threshold check
if (result.category_scores.violence > 0.8) {
return {
allowed: false,
reason: 'High-confidence violence detected',
severity: 'medium',
};
}
// Allow content
return {
allowed: true,
scores: result.category_scores,
};
} catch (error) {
console.error('Moderation error:', error);
// Fail closed: block on error
return {
allowed: false,
reason: 'Moderation service unavailable',
severity: 'error',
};
}
}
```
Moderation Best Practices
✅ Safety:
- Always moderate user-generated content before storing/displaying
- Use lower thresholds for child safety (`sexual/minors`)
- Block immediately on severe categories
- Log all flagged content for review

✅ User Experience:
- Provide clear feedback when content is flagged
- Allow users to edit and resubmit
- Explain which policy was violated (without revealing detection details)
- Implement appeals process for false positives

✅ Performance:
- Batch moderate multiple inputs (up to array limit)
- Cache moderation results for identical content
- Moderate before expensive operations (AI generation, storage)
- Use async moderation for non-critical flows

✅ Compliance:
- Keep audit logs of all moderation decisions
- Implement human review for borderline cases
- Update thresholds based on your community standards
- Comply with local content regulations

❌ Don't:
- Skip moderation on "trusted" users (all UGC should be checked)
- Rely solely on the `flagged` boolean (check specific categories)
- Ignore category scores (they provide nuance)
- Use moderation as sole content policy enforcement (combine with human review)
---
Error Handling
Common HTTP Status Codes
- 200: Success
- 400: Bad Request (invalid parameters)
- 401: Unauthorized (invalid API key)
- 429: Rate Limit Exceeded
- 500: Server Error
- 503: Service Unavailable
Rate Limit Error (429)
```typescript
try {
const completion = await openai.chat.completions.create({ /* ... */ });
} catch (error) {
if (error.status === 429) {
// Rate limit exceeded - implement exponential backoff
console.error('Rate limit exceeded. Retry after delay.');
}
}
```
Invalid API Key (401)
```typescript
try {
const completion = await openai.chat.completions.create({ /* ... */ });
} catch (error) {
if (error.status === 401) {
console.error('Invalid API key. Check OPENAI_API_KEY environment variable.');
}
}
```
Exponential Backoff Pattern
```typescript
async function completionWithRetry(params, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
return await openai.chat.completions.create(params);
} catch (error) {
if (error.status === 429 && i < maxRetries - 1) {
const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
```
---
Rate Limits
Understanding Rate Limits
OpenAI enforces rate limits based on:
- RPM: Requests Per Minute
- TPM: Tokens Per Minute
- IPM: Images Per Minute (for DALL-E)
Limits vary by:
- Usage tier (Free, Tier 1-5)
- Model (GPT-5 has different limits than GPT-4)
- Organization settings
Checking Rate Limit Headers
```typescript
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ /* ... */ }),
});
console.log(response.headers.get('x-ratelimit-limit-requests'));
console.log(response.headers.get('x-ratelimit-remaining-requests'));
console.log(response.headers.get('x-ratelimit-reset-requests'));
```
Best Practices
✅ Implement exponential backoff for 429 errors
✅ Monitor rate limit headers to avoid hitting limits (see the sketch below)
✅ Batch requests when possible (e.g., embeddings)
✅ Use appropriate models (don't use GPT-5 for simple tasks)
✅ Cache responses when appropriate
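A hedged sketch of the header-monitoring point above: a fetch wrapper that pauses when the remaining-request budget runs low (the threshold and delay are illustrative, not official guidance).
```typescript
// Proactive throttle: inspect rate-limit headers after each call and pause when low
async function rateAwareFetch(url: string, init: RequestInit): Promise<Response> {
  const res = await fetch(url, init);
  const remaining = Number(res.headers.get('x-ratelimit-remaining-requests') ?? '1');
  if (res.status !== 429 && remaining < 5) {
    // The 1s pause is an arbitrary cushion; tune it to your tier's limits
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  return res;
}
```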
---
Production Best Practices
Security
✅ Never expose API keys in client-side code
```typescript
// ❌ Bad - API key in browser
const apiKey = 'sk-...'; // Visible to users!
// ✅ Good - Server-side proxy
// Client calls your backend, which calls OpenAI
```
✅ Use environment variables
```bash
export OPENAI_API_KEY="sk-..."
```
✅ Implement server-side proxy for browser apps
```typescript
// Your backend endpoint
app.post('/api/chat', async (req, res) => {
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: req.body.messages,
});
res.json(completion);
});
```
Performance
✅ Use streaming for long-form content (>100 tokens)
✅ Set appropriate max_tokens to control costs and latency
✅ Cache responses when queries are repeated
✅ Choose appropriate models:
- GPT-5-nano for simple tasks
- GPT-5 for complex reasoning
- GPT-4o for vision tasks
Cost Optimization
✅ Select right model:
- gpt-5-nano: Cheapest, fastest
- gpt-5-mini: Balance of cost/quality
- gpt-5: Best quality, most expensive
✅ Limit max_tokens:
```typescript
{
max_tokens: 500, // Don't generate more than needed
}
```
✅ Use caching:
```typescript
const cache = new Map();
async function getCachedCompletion(prompt) {
if (cache.has(prompt)) {
return cache.get(prompt);
}
const completion = await openai.chat.completions.create({
model: 'gpt-5',
messages: [{ role: 'user', content: prompt }],
});
cache.set(prompt, completion);
return completion;
}
```
Error Handling
✅ Wrap all API calls in try-catch
✅ Provide user-friendly error messages
✅ Log errors for debugging
✅ Implement retries for transient failures
```typescript
try {
const completion = await openai.chat.completions.create({ /* ... */ });
} catch (error) {
console.error('OpenAI API error:', error);
// User-friendly message
return {
error: 'Sorry, I encountered an issue. Please try again.',
};
}
```
---
Relationship to openai-responses
openai-api (This Skill)
Traditional/stateless API for:
- ✅ Simple chat completions
- ✅ Embeddings for RAG/search
- ✅ Images (DALL-E 3)
- ✅ Audio (Whisper/TTS)
- ✅ Content moderation
- ✅ One-off text generation
- ✅ Cloudflare Workers / edge deployment
Characteristics:
- Stateless (you manage conversation history)
- No built-in tools
- Maximum flexibility
- Works everywhere (Node.js, browsers, Workers, etc.)
openai-responses Skill
Stateful/agentic API for:
- ✅ Automatic conversation state management
- ✅ Preserved reasoning (Chain of Thought) across turns
- ✅ Built-in tools (Code Interpreter, File Search, Web Search, Image Generation)
- ✅ MCP server integration
- ✅ Background mode for long tasks
- ✅ Polymorphic outputs
Characteristics:
- Stateful (OpenAI manages conversation)
- Built-in tools included
- Better for agentic workflows
- Higher-level abstraction
When to Use Which?
| Use Case | Use openai-api | Use openai-responses |
|----------|----------------|---------------------|
| Simple chat | ✅ | ❌ |
| RAG/embeddings | ✅ | ❌ |
| Image generation | ✅ | ❌ |
| Audio processing | ✅ | ❌ |
| Agentic workflows | ❌ | ✅ |
| Multi-turn reasoning | ❌ | ✅ |
| Background tasks | ❌ | ✅ |
| Custom tools only | ✅ | ❌ |
| Built-in + custom tools | ❌ | ✅ |
Use both: Many apps use openai-api for embeddings/images/audio and openai-responses for stateful, agentic conversations.