google-gemini-api (skill from ovachiever/droid-tings)
Seamlessly integrates the Google Gemini API with advanced multimodal AI capabilities, supporting text generation, function calling, and thinking mode across various models.
Installation: `git clone https://github.com/ovachiever/droid-tings.git`
Overview
# Google Gemini API - Complete Guide
Version: Phase 2 Complete + Gemini 3 Preview
Package: @google/genai@1.27.0 (⚠️ NOT @google/generative-ai)
Last Updated: 2025-11-19 (Gemini 3 preview release)
---
⚠️ CRITICAL SDK MIGRATION WARNING
DEPRECATED SDK: @google/generative-ai (sunset November 30, 2025)
CURRENT SDK: @google/genai v1.27+
If you see code using @google/generative-ai, it's outdated!
This skill uses the correct current SDK and provides a complete migration guide.
---
Status
✅ Phase 1 Complete:
- ✅ Text Generation (basic + streaming)
- ✅ Multimodal Inputs (images, video, audio, PDFs)
- ✅ Function Calling (basic + parallel execution)
- ✅ System Instructions & Multi-turn Chat
- ✅ Thinking Mode Configuration
- ✅ Generation Parameters (temperature, top-p, top-k, stop sequences)
- ✅ Both Node.js SDK (@google/genai) and fetch approaches
✅ Phase 2 Complete:
- ✅ Context Caching (cost optimization with TTL-based caching)
- ✅ Code Execution (built-in Python interpreter and sandbox)
- ✅ Grounding with Google Search (real-time web information + citations)
Separate Skills:
- Embeddings: See the google-gemini-embeddings skill for text-embedding-004
---
Table of Contents
Phase 1 - Core Features:
- [Quick Start](#quick-start)
- [Current Models (2025)](#current-models-2025)
- [SDK vs Fetch Approaches](#sdk-vs-fetch-approaches)
- [Text Generation](#text-generation)
- [Streaming](#streaming)
- [Multimodal Inputs](#multimodal-inputs)
- [Function Calling](#function-calling)
- [System Instructions](#system-instructions)
- [Multi-turn Chat](#multi-turn-chat)
- [Thinking Mode](#thinking-mode)
- [Generation Configuration](#generation-configuration)
Phase 2 - Advanced Features:
- [Context Caching](#context-caching)
- [Code Execution](#code-execution)
- [Grounding with Google Search](#grounding-with-google-search)
Common Reference:
- [Error Handling](#error-handling)
- [Rate Limits](#rate-limits)
- [SDK Migration Guide](#sdk-migration-guide)
- [Production Best Practices](#production-best-practices)
---
Quick Start
Installation
CORRECT SDK:
```bash
npm install @google/genai@1.27.0
```
❌ WRONG (DEPRECATED):
```bash
npm install @google/generative-ai # DO NOT USE!
```
Environment Setup
```bash
export GEMINI_API_KEY="..."
```
Or create .env file:
```
GEMINI_API_KEY=...
```
First Text Generation (Node.js SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Explain quantum computing in simple terms'
});
console.log(response.text);
```
First Text Generation (Fetch - Cloudflare Workers)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [{ parts: [{ text: 'Explain quantum computing in simple terms' }] }]
}),
}
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
---
Current Models (2025)
Gemini 3 Series (Preview - November 2025)
#### gemini-3-pro-preview
- Context: TBD (documentation pending)
- Status: Preview release (November 18, 2025)
- Description: Google's newest and most intelligent AI model with state-of-the-art reasoning
- Best for: Most complex reasoning tasks, advanced multimodal understanding, benchmark-critical applications
- Features: Enhanced multimodal (text, image, video, audio, PDF), function calling, streaming
- Benchmark Performance: Outperforms Gemini 2.5 Pro on every major AI benchmark
- ⚠️ Preview: Use for evaluation. Consider gemini-2.5-pro for production until stable release
Gemini 2.5 Series (General Availability - Stable)
#### gemini-2.5-pro
- Context: 1,048,576 input tokens / 65,536 output tokens
- Description: State-of-the-art thinking model for complex reasoning
- Best for: Code, math, STEM, complex problem-solving
- Features: Thinking mode (default on), function calling, multimodal, streaming
- Knowledge cutoff: January 2025
#### gemini-2.5-flash
- Context: 1,048,576 input tokens / 65,536 output tokens
- Description: Best price-performance workhorse model
- Best for: Large-scale processing, low-latency, high-volume, agentic use cases
- Features: Thinking mode (default on), function calling, multimodal, streaming
- Knowledge cutoff: January 2025
#### gemini-2.5-flash-lite
- Context: 1,048,576 input tokens / 65,536 output tokens
- Description: Cost-optimized, fastest 2.5 model
- Best for: High throughput, cost-sensitive applications
- Features: Thinking mode (default on), function calling, multimodal, streaming
- Knowledge cutoff: January 2025
Model Feature Matrix
| Feature | 3-Pro (Preview) | 2.5-Pro | 2.5-Flash | 2.5-Flash-Lite |
|---------|-----------------|---------|-----------|----------------|
| Thinking Mode | TBD | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| Function Calling | ✅ | ✅ | ✅ | ✅ |
| Multimodal | ✅ Enhanced | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| System Instructions | ✅ | ✅ | ✅ | ✅ |
| Context Window | TBD | 1,048,576 in | 1,048,576 in | 1,048,576 in |
| Output Tokens | TBD | 65,536 max | 65,536 max | 65,536 max |
| Status | Preview | Stable | Stable | Stable |
⚠️ Context Window Correction
ACCURATE (Gemini 2.5): Gemini 2.5 models support 1,048,576 input tokens (NOT 2M!)
OUTDATED: Only Gemini 1.5 Pro (previous generation) had 2M token context window
GEMINI 3: Context window specifications pending official documentation
Common mistake: Claiming Gemini 2.5 has 2M tokens. It doesn't. This skill prevents this error.
---
SDK vs Fetch Approaches
Node.js SDK (@google/genai)
Pros:
- Type-safe with TypeScript
- Easier API (simpler syntax)
- Built-in chat helpers
- Automatic SSE parsing for streaming
- Better error handling
Cons:
- Requires Node.js or compatible runtime
- Larger bundle size
- May not work in all edge runtimes
Use when: Building Node.js apps, Next.js Server Actions/Components, or any environment with Node.js compatibility
Fetch-based (Direct REST API)
Pros:
- Works in any JavaScript environment (Cloudflare Workers, Deno, Bun, browsers)
- Minimal dependencies
- Smaller bundle size
- Full control over requests
Cons:
- More verbose syntax
- Manual SSE parsing for streaming
- No built-in chat helpers
- Manual error handling
Use when: Deploying to Cloudflare Workers, browser clients, or lightweight edge runtimes
---
Text Generation
Basic Text Generation (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Write a haiku about artificial intelligence'
});
console.log(response.text);
```
Basic Text Generation (Fetch)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [
{
parts: [
{ text: 'Write a haiku about artificial intelligence' }
]
}
]
}),
}
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
Response Structure
```typescript
{
text: string, // Convenience accessor for text content
candidates: [
{
content: {
parts: [
{ text: string } // Generated text
],
role: string // "model"
},
finishReason: string, // "STOP" | "MAX_TOKENS" | "SAFETY" | "OTHER"
index: number
}
],
usageMetadata: {
promptTokenCount: number,
candidatesTokenCount: number,
totalTokenCount: number
}
}
```
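When consuming this structure defensively, check `finishReason` before trusting the output, since a `MAX_TOKENS` or `SAFETY` stop can leave you with truncated text. A minimal sketch (the `extractText` helper is our own illustration, not an SDK API; `ai` is the client created above):
```typescript
// Hypothetical helper: join all text parts and warn on early stops.
function extractText(response: any): string {
  const candidate = response.candidates?.[0];
  if (!candidate) throw new Error('No candidates returned');
  if (candidate.finishReason && candidate.finishReason !== 'STOP') {
    console.warn(`Generation stopped early: ${candidate.finishReason}`);
  }
  return candidate.content?.parts?.map((p: any) => p.text ?? '').join('') ?? '';
}

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain quantum computing in simple terms'
});
console.log(extractText(response));
console.log('Tokens used:', response.usageMetadata?.totalTokenCount);
```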
---
Streaming
Streaming with SDK (Async Iteration)
```typescript
const response = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: 'Write a 200-word story about time travel'
});
for await (const chunk of response) {
process.stdout.write(chunk.text);
}
```
Streaming with Fetch (SSE Parsing)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [{ parts: [{ text: 'Write a 200-word story about time travel' }] }]
}),
}
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (line.trim() === '' || line.startsWith('data: [DONE]')) continue;
if (!line.startsWith('data: ')) continue;
try {
const data = JSON.parse(line.slice(6));
const text = data.candidates[0]?.content?.parts[0]?.text;
if (text) {
process.stdout.write(text);
}
} catch (e) {
// Skip invalid JSON
}
}
}
```
Key Points:
- Use the `streamGenerateContent` endpoint (not `generateContent`) with `?alt=sse` to receive Server-Sent Events
- Parse the SSE format: `data: {json}\n\n`
- Handle incomplete chunks in the buffer
- Skip empty lines and `[DONE]` markers
---
Multimodal Inputs
Gemini 2.5 models support text + images + video + audio + PDFs in the same request.
Images (Vision)
#### SDK Approach
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'fs';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// From file
const imageData = fs.readFileSync('/path/to/image.jpg');
const base64Image = imageData.toString('base64');
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{
parts: [
{ text: 'What is in this image?' },
{
inlineData: {
data: base64Image,
mimeType: 'image/jpeg'
}
}
]
}
]
});
console.log(response.text);
```
#### Fetch Approach
```typescript
const imageData = fs.readFileSync('/path/to/image.jpg');
const base64Image = imageData.toString('base64');
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [
{
parts: [
{ text: 'What is in this image?' },
{
inlineData: {
data: base64Image,
mimeType: 'image/jpeg'
}
}
]
}
]
}),
}
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
Supported Image Formats:
- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- WebP (`.webp`)
- HEIC (`.heic`)
- HEIF (`.heif`)
Max Image Size: 20MB per image
Video
```typescript
// Video must be < 2 minutes for inline data
const videoData = fs.readFileSync('/path/to/video.mp4');
const base64Video = videoData.toString('base64');
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{
parts: [
{ text: 'Describe what happens in this video' },
{
inlineData: {
data: base64Video,
mimeType: 'video/mp4'
}
}
]
}
]
});
console.log(response.text);
```
Supported Video Formats:
- MP4 (`.mp4`)
- MPEG (`.mpeg`)
- MOV (`.mov`)
- AVI (`.avi`)
- FLV (`.flv`)
- MPG (`.mpg`)
- WebM (`.webm`)
- WMV (`.wmv`)
Max Video Length (inline): 2 minutes
Max Video Size: 2GB (use File API for larger files - Phase 2)
Audio
```typescript
const audioData = fs.readFileSync('/path/to/audio.mp3');
const base64Audio = audioData.toString('base64');
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{
parts: [
{ text: 'Transcribe and summarize this audio' },
{
inlineData: {
data: base64Audio,
mimeType: 'audio/mp3'
}
}
]
}
]
});
console.log(response.text);
```
Supported Audio Formats:
- MP3 (`.mp3`)
- WAV (`.wav`)
- FLAC (`.flac`)
- AAC (`.aac`)
- OGG (`.ogg`)
- OPUS (`.opus`)
Max Audio Size: 20MB
PDFs
```typescript
const pdfData = fs.readFileSync('/path/to/document.pdf');
const base64Pdf = pdfData.toString('base64');
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{
parts: [
{ text: 'Summarize the key points in this PDF' },
{
inlineData: {
data: base64Pdf,
mimeType: 'application/pdf'
}
}
]
}
]
});
console.log(response.text);
```
Max PDF Size: 30MB
PDF Limitations: Text-based PDFs work best; scanned images may have lower accuracy
Multiple Inputs
You can combine multiple modalities in one request:
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
{
parts: [
{ text: 'Compare these two images and describe the differences:' },
{ inlineData: { data: base64Image1, mimeType: 'image/jpeg' } },
{ inlineData: { data: base64Image2, mimeType: 'image/jpeg' } }
]
}
]
});
```
---
Function Calling
Gemini supports function calling (tool use) to connect models with external APIs and systems.
Basic Function Calling (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// Define function declarations
const getCurrentWeather = {
name: 'get_current_weather',
description: 'Get the current weather for a location',
parametersJsonSchema: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name, e.g. San Francisco'
},
unit: {
type: 'string',
enum: ['celsius', 'fahrenheit']
}
},
required: ['location']
}
};
// Make request with tools
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What\'s the weather in Tokyo?',
config: {
tools: [
{ functionDeclarations: [getCurrentWeather] }
]
}
});
// Check if model wants to call a function
const functionCall = response.candidates[0].content.parts[0].functionCall;
if (functionCall) {
console.log('Function to call:', functionCall.name);
console.log('Arguments:', functionCall.args);
// Execute the function (your implementation)
const weatherData = await fetchWeather(functionCall.args.location);
// Send function result back to model
const finalResponse = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: [
'What\'s the weather in Tokyo?',
response.candidates[0].content, // Original assistant response with function call
{
parts: [
{
functionResponse: {
name: functionCall.name,
response: weatherData
}
}
]
}
],
config: {
tools: [
{ functionDeclarations: [getCurrentWeather] }
]
}
});
console.log(finalResponse.text);
}
```
Function Calling (Fetch)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [
{ parts: [{ text: 'What\'s the weather in Tokyo?' }] }
],
tools: [
{
functionDeclarations: [
{
name: 'get_current_weather',
description: 'Get the current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name'
}
},
required: ['location']
}
}
]
}
]
}),
}
);
const data = await response.json();
const functionCall = data.candidates[0]?.content?.parts[0]?.functionCall;
if (functionCall) {
// Execute function and send result back (same flow as SDK)
}
```
Parallel Function Calling
Gemini can call multiple independent functions simultaneously:
```typescript
const tools = [
{
functionDeclarations: [
{
name: 'get_weather',
description: 'Get weather for a location',
parametersJsonSchema: {
type: 'object',
properties: {
location: { type: 'string' }
},
required: ['location']
}
},
{
name: 'get_population',
description: 'Get population of a city',
parametersJsonSchema: {
type: 'object',
properties: {
city: { type: 'string' }
},
required: ['city']
}
}
]
}
];
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What is the weather and population of Tokyo?',
config: { tools }
});
// Model may return MULTIPLE function calls in parallel
const functionCalls = response.candidates[0].content.parts.filter(
part => part.functionCall
);
console.log(`Model wants to call ${functionCalls.length} functions in parallel`);
```
Function Calling Modes
```typescript
import { FunctionCallingConfigMode } from '@google/genai';
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What\'s the weather?',
config: {
tools: [{ functionDeclarations: [getCurrentWeather] }],
toolConfig: {
functionCallingConfig: {
mode: FunctionCallingConfigMode.ANY, // Force function call
// mode: FunctionCallingConfigMode.AUTO, // Model decides (default)
// mode: FunctionCallingConfigMode.NONE, // Never call functions
allowedFunctionNames: ['get_current_weather'] // Optional: restrict to specific functions
}
}
}
});
```
Modes:
- `AUTO` (default): Model decides whether to call functions
- `ANY`: Force model to call at least one function
- `NONE`: Disable function calling for this request
---
System Instructions
System instructions guide the model's behavior and set context. They are separate from the conversation messages.
SDK Approach
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  config: {
    systemInstruction: 'You are a helpful AI assistant that always responds in the style of a pirate. Use nautical terminology and end sentences with "arrr".'
  },
  contents: 'Explain what a database is'
});
console.log(response.text);
// Output: "Ahoy there! A database be like a treasure chest..."
```
Fetch Approach
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
systemInstruction: {
parts: [
{ text: 'You are a helpful AI assistant that always responds in the style of a pirate.' }
]
},
contents: [
{ parts: [{ text: 'Explain what a database is' }] }
]
}),
}
);
```
Key Points:
- System instructions are NOT part of the `contents` array
- They are set once per request (in `config` for the SDK, or as a top-level `systemInstruction` field for REST)
- They persist for the entire conversation (when using multi-turn chat)
- They don't count as user or model messages
---
Multi-turn Chat
For conversations with history, use the SDK's chat helpers or manually manage conversation state.
SDK Chat Helpers (Recommended)
```typescript
const chat = await ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    systemInstruction: 'You are a helpful coding assistant.'
  },
  history: [] // Start empty or with previous messages
});
// Send first message
const response1 = await chat.sendMessage({ message: 'What is TypeScript?' });
console.log('Assistant:', response1.text);
// Send follow-up (context is automatically maintained)
const response2 = await chat.sendMessage({ message: 'How do I install it?' });
console.log('Assistant:', response2.text);
// Get full chat history
const history = chat.getHistory();
console.log('Full conversation:', history);
```
Manual Chat Management (Fetch)
```typescript
const conversationHistory = [];
// First turn
const response1 = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [
{
role: 'user',
parts: [{ text: 'What is TypeScript?' }]
}
]
}),
}
);
const data1 = await response1.json();
const assistantReply1 = data1.candidates[0].content.parts[0].text;
// Add to history
conversationHistory.push(
{ role: 'user', parts: [{ text: 'What is TypeScript?' }] },
{ role: 'model', parts: [{ text: assistantReply1 }] }
);
// Second turn (include full history)
const response2 = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [
...conversationHistory,
{ role: 'user', parts: [{ text: 'How do I install it?' }] }
]
}),
}
);
```
Message Roles:
- `user`: User messages
- `model`: Assistant responses
⚠️ Important: Chat helpers are SDK-only. With fetch, you must manually manage conversation history. A history-trimming sketch follows.
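When you manage history manually, long conversations can eventually exceed the context window. A minimal sliding-window sketch; the `MAX_TURNS` threshold and `trimHistory` helper are our own illustration, not an SDK API:
```typescript
type Content = { role: 'user' | 'model'; parts: { text: string }[] };

// Hypothetical helper: keep only the most recent turns so the request
// stays under the model's input-token limit. A production version would
// count tokens (e.g. via the countTokens endpoint) instead of turns.
const MAX_TURNS = 20; // user+model pairs to retain (assumption)

function trimHistory(history: Content[]): Content[] {
  const maxMessages = MAX_TURNS * 2;
  if (history.length <= maxMessages) return history;
  const trimmed = history.slice(-maxMessages);
  // Never start history with a 'model' message; drop it if we cut mid-pair
  return trimmed[0].role === 'model' ? trimmed.slice(1) : trimmed;
}
```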
---
Thinking Mode
Gemini 2.5 models have thinking mode enabled by default for enhanced quality. You can configure the thinking budget.
Configure Thinking Budget (SDK)
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Solve this complex math problem: ...',
config: {
thinkingConfig: {
thinkingBudget: 8192 // Max tokens for thinking (default: model-dependent)
}
}
});
```
Configure Thinking Budget (Fetch)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [{ parts: [{ text: 'Solve this complex math problem: ...' }] }],
generationConfig: {
thinkingConfig: {
thinkingBudget: 8192
}
}
}),
}
);
```
Key Points:
- Thinking is enabled by default on all Gemini 2.5 models
- On 2.5 Flash and Flash-Lite, thinking can be disabled by setting thinkingBudget: 0; 2.5 Pro cannot fully disable thinking (see the sketch below)
- Higher thinking budgets allow more internal reasoning (may increase latency)
- Default budget varies by model (usually sufficient for most tasks)
- Only increase the budget for very complex reasoning tasks
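For latency-sensitive Flash workloads, you can try skipping thinking entirely. A minimal sketch, assuming your model revision accepts a zero budget:
```typescript
// Assumption: gemini-2.5-flash accepts thinkingBudget: 0 to skip thinking.
// (2.5 Pro enforces a minimum budget and will reject this.)
const fastResponse = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Classify this ticket as BUG, FEATURE, or QUESTION: "App crashes on login"',
  config: {
    thinkingConfig: { thinkingBudget: 0 } // no internal reasoning tokens
  }
});
console.log(fastResponse.text);
```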
---
Generation Configuration
Customize model behavior with generation parameters.
All Configuration Options (SDK)
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Write a creative story',
config: {
temperature: 0.9, // Randomness (0.0-2.0, default: 1.0)
topP: 0.95, // Nucleus sampling (0.0-1.0)
topK: 40, // Top-k sampling
maxOutputTokens: 2048, // Max tokens to generate
stopSequences: ['END'], // Stop generation if these appear
responseMimeType: 'text/plain', // Or 'application/json' for JSON mode
candidateCount: 1 // Number of response candidates (usually 1)
}
});
```
All Configuration Options (Fetch)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [{ parts: [{ text: 'Write a creative story' }] }],
generationConfig: {
temperature: 0.9,
topP: 0.95,
topK: 40,
maxOutputTokens: 2048,
stopSequences: ['END'],
responseMimeType: 'text/plain',
candidateCount: 1
}
}),
}
);
```
Parameter Guidelines
| Parameter | Range | Default | Use Case |
|-----------|-------|---------|----------|
| temperature | 0.0-2.0 | 1.0 | Lower = more focused, higher = more creative |
| topP | 0.0-1.0 | 0.95 | Nucleus sampling threshold |
| topK | 1-100+ | 40 | Limit to top K tokens |
| maxOutputTokens | 1-65536 | Model max | Control response length |
| stopSequences | Array | None | Stop generation at specific strings |
Tips:
- For factual tasks: Use low temperature (0.0-0.3)
- For creative tasks: Use high temperature (0.7-1.5)
- topP and topK both control randomness; use one or the other (not both)
- Always set maxOutputTokens to prevent excessive generation
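The `responseMimeType: 'application/json'` option shown above enables JSON mode. A minimal sketch combining it with a low temperature for structured extraction; we assume the `responseSchema` config field with an OpenAPI-style schema, as in the REST API:
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Extract the name and age from: "Alice is 34 years old."',
  config: {
    temperature: 0.1, // low temperature for deterministic extraction
    responseMimeType: 'application/json',
    responseSchema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        age: { type: 'integer' }
      },
      required: ['name', 'age']
    }
  }
});
console.log(JSON.parse(response.text ?? '{}')); // e.g. { name: 'Alice', age: 34 }
```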
---
Context Caching
Context caching allows you to cache frequently used content (like system instructions, large documents, or video files) to reduce costs by up to 90% and improve latency.
How It Works
- Create a cache with your repeated content
- Reference the cache in subsequent requests
- Save tokens - cached tokens cost significantly less
- TTL management - caches expire after specified time
Benefits
- Cost savings: Up to 90% reduction on cached tokens
- Reduced latency: Faster responses by reusing processed content
- Consistent context: Same large context across multiple requests
Cache Creation (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'fs';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// Create a cache for a large document
const documentText = fs.readFileSync('./large-document.txt', 'utf-8');
const cache = await ai.caches.create({
model: 'gemini-2.5-flash',
config: {
displayName: 'large-doc-cache', // Identifier for the cache
systemInstruction: 'You are an expert at analyzing legal documents.',
contents: documentText,
ttl: '3600s', // Cache for 1 hour
}
});
console.log('Cache created:', cache.name);
console.log('Expires at:', cache.expireTime);
```
Cache Creation (Fetch)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/cachedContents',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
model: 'models/gemini-2.5-flash',
displayName: 'large-doc-cache',
systemInstruction: {
parts: [{ text: 'You are an expert at analyzing legal documents.' }]
},
contents: [
{ parts: [{ text: documentText }] }
],
ttl: '3600s'
}),
}
);
const cache = await response.json();
console.log('Cache created:', cache.name);
```
Using a Cache (SDK)
```typescript
// Generate content using the cache
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash', // same model the cache was created with
  contents: 'Summarize the key points in the document',
  config: { cachedContent: cache.name } // reference the cache by name
});
console.log(response.text);
```
Using a Cache (Fetch)
```typescript
const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      cachedContent: cache.name, // e.g. "cachedContents/abc123"
      contents: [
        { parts: [{ text: 'Summarize the key points in the document' }] }
      ]
    }),
  }
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
Update Cache TTL (SDK)
```typescript
await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s' // Extend to 2 hours
  }
});
```
Update Cache with Expiration Time (SDK)
```typescript
// Set a specific expiration time instead of a TTL
const in10Minutes = new Date(Date.now() + 10 * 60 * 1000);
await ai.caches.update({
  name: cache.name,
  config: {
    expireTime: in10Minutes.toISOString() // ISO 8601 timestamp string
}
});
```
List and Delete Caches (SDK)
```typescript
// List all caches (the SDK returns an async pager)
const pager = await ai.caches.list();
for await (const cache of pager) {
  console.log(cache.name, cache.displayName);
}
// Delete a specific cache
await ai.caches.delete({ name: cache.name });
```
Caching with Video Files
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'fs';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// Upload video file (pass a path; the SDK reads the file)
let videoFile = await ai.files.upload({
  file: './video.mp4',
  config: { mimeType: 'video/mp4' }
});
// Wait for processing
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}
// Create cache with the uploaded video referenced by URI
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash',
  config: {
    displayName: 'video-analysis-cache',
    systemInstruction: 'You are an expert video analyzer.',
    contents: [
      { parts: [{ fileData: { fileUri: videoFile.uri, mimeType: videoFile.mimeType } }] }
    ],
    ttl: '300s' // 5 minutes
  }
});
// Use cache for multiple queries
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name }
});
const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Describe the main characters',
  config: { cachedContent: cache.name }
});
```
Key Points
When to Use Caching:
- Large system instructions used repeatedly
- Long documents analyzed multiple times
- Video/audio files queried with different prompts
- Consistent context across conversation sessions
TTL Guidelines:
- Short sessions: 300s (5 min) to 3600s (1 hour)
- Long sessions: 3600s (1 hour) to 86400s (24 hours)
- Maximum: 7 days
Cost Savings:
- Cached input tokens: ~90% cheaper than regular tokens
- Output tokens: Same price (not cached)
Important:
- You must use explicit model version suffixes (e.g., `gemini-2.5-flash-001`, NOT just `gemini-2.5-flash`)
- Caches are automatically deleted after TTL expires
- Update TTL before expiration to extend cache lifetime (a cache-hit verification sketch follows)
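To confirm a cache is actually being hit, inspect `usageMetadata` on the response; the `cachedContentTokenCount` field reports how many input tokens were served from the cache. A minimal sketch:
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'List the defined terms in section 2',
  config: { cachedContent: cache.name }
});

// cachedContentTokenCount should be non-zero when the cache was used
const usage = response.usageMetadata;
console.log('Prompt tokens:', usage?.promptTokenCount);
console.log('Served from cache:', usage?.cachedContentTokenCount ?? 0);
```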
---
Code Execution
Gemini models can generate and execute Python code to solve problems requiring computation, data analysis, or visualization.
How It Works
- Model generates executable Python code
- Code runs in secure sandbox
- Results are returned to the model
- Model incorporates results into response
Supported Operations
- Mathematical calculations
- Data analysis and statistics
- File processing (CSV, JSON, etc.)
- Chart and graph generation
- Algorithm implementation
- Data transformations
Available Python Packages
Standard Library: `math`, `statistics`, `random`, `datetime`, `json`, `csv`, `re`, `collections`, `itertools`, `functools`
Data Science: `numpy`, `pandas`, `scipy`
Visualization: `matplotlib`, `seaborn`
Note: Limited package availability compared to a full Python environment
Basic Code Execution (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What is the sum of the first 50 prime numbers? Generate and run code for the calculation.',
config: {
tools: [{ codeExecution: {} }]
}
});
// Parse response parts
for (const part of response.candidates[0].content.parts) {
if (part.text) {
console.log('Text:', part.text);
}
if (part.executableCode) {
console.log('Generated Code:', part.executableCode.code);
}
if (part.codeExecutionResult) {
console.log('Execution Output:', part.codeExecutionResult.output);
}
}
```
Basic Code Execution (Fetch)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
tools: [{ code_execution: {} }],
contents: [
{
parts: [
{ text: 'What is the sum of the first 50 prime numbers? Generate and run code.' }
]
}
]
}),
}
);
const data = await response.json();
for (const part of data.candidates[0].content.parts) {
if (part.text) {
console.log('Text:', part.text);
}
if (part.executableCode) {
console.log('Code:', part.executableCode.code);
}
if (part.codeExecutionResult) {
console.log('Result:', part.codeExecutionResult.output);
}
}
```
Chat with Code Execution (SDK)
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: {
tools: [{ codeExecution: {} }]
}
});
let response = await chat.sendMessage({ message: 'I have a math question for you.' });
console.log(response.text);
response = await chat.sendMessage({
  message: 'Calculate the Fibonacci sequence up to the 20th number and sum them.'
});
// Model will generate and execute code, then provide answer
for (const part of response.candidates[0].content.parts) {
if (part.text) console.log(part.text);
if (part.executableCode) console.log('Code:', part.executableCode.code);
if (part.codeExecutionResult) console.log('Output:', part.codeExecutionResult.output);
}
```
Data Analysis Example
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `
Analyze this sales data and calculate:
1. Total revenue
2. Average sale price
3. Best-selling month
Data (CSV format):
month,sales,revenue
Jan,150,45000
Feb,200,62000
Mar,175,53000
Apr,220,68000
`,
config: {
tools: [{ codeExecution: {} }]
}
});
// Model will generate pandas/numpy code to analyze data
for (const part of response.candidates[0].content.parts) {
if (part.text) console.log(part.text);
if (part.executableCode) console.log('Analysis Code:', part.executableCode.code);
if (part.codeExecutionResult) console.log('Results:', part.codeExecutionResult.output);
}
```
Visualization Example
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Create a bar chart showing the distribution of prime numbers under 100 by their last digit. Generate the chart and describe the pattern.',
config: {
tools: [{ codeExecution: {} }]
}
});
// Model generates matplotlib code, executes it, and describes results
for (const part of response.candidates[0].content.parts) {
if (part.text) console.log(part.text);
if (part.executableCode) console.log('Chart Code:', part.executableCode.code);
if (part.codeExecutionResult) {
// Note: Chart image data would be in output
console.log('Execution completed');
}
}
```
Response Structure
```typescript
{
candidates: [
{
content: {
parts: [
{ text: "I'll calculate that for you." },
{
executableCode: {
language: "PYTHON",
code: "def is_prime(n):\n if n <= 1:\n return False\n ..."
}
},
{
codeExecutionResult: {
outcome: "OUTCOME_OK", // or "OUTCOME_FAILED"
output: "5117\n"
}
},
{ text: "The sum of the first 50 prime numbers is 5117." }
]
}
}
]
}
```
Error Handling
```typescript
for (const part of response.candidates[0].content.parts) {
if (part.codeExecutionResult) {
if (part.codeExecutionResult.outcome === 'OUTCOME_FAILED') {
console.error('Code execution failed:', part.codeExecutionResult.output);
} else {
console.log('Success:', part.codeExecutionResult.output);
}
}
}
```
Key Points
When to Use Code Execution:
- Complex mathematical calculations
- Data analysis and statistics
- Algorithm implementations
- File parsing and processing
- Chart generation
- Computational problems
Limitations:
- Sandbox environment (limited file system access)
- Limited Python package availability
- Execution timeout limits
- No network access from code
- No persistent state between executions
Best Practices:
- Specify what calculation or analysis you need clearly
- Request code generation explicitly ("Generate and run code...")
- Check the `outcome` field for errors
- Use for deterministic computations, not for general programming
Important:
- Available on all Gemini 2.5 models (Pro, Flash, Flash-Lite)
- Code runs in isolated sandbox for security
- Supports Python with standard library and common data science packages
---
Grounding with Google Search
Grounding connects the model to real-time web information, reducing hallucinations and providing up-to-date, fact-checked responses with citations.
How It Works
- Model determines if it needs current information
- Automatically performs Google Search
- Processes search results
- Incorporates findings into response
- Provides citations and source URLs
Benefits
- Real-time information: Access to current events and data
- Reduced hallucinations: Answers grounded in web sources
- Verifiable: Citations allow fact-checking
- Up-to-date: Not limited to model's training cutoff
Two Grounding APIs
#### 1. Google Search (googleSearch) - Recommended for Gemini 2.5
```typescript
const groundingTool = {
googleSearch: {}
};
```
Features:
- Simple configuration
- Automatic search when needed
- Available on all Gemini 2.5 models
#### 2. Google Search Retrieval (googleSearchRetrieval) - Legacy (Gemini 1.5)
```typescript
const retrievalTool = {
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: 'MODE_DYNAMIC',
dynamicThreshold: 0.7 // Only search if confidence < 70%
}
}
};
```
Features:
- Dynamic threshold control
- Used with Gemini 1.5 models
- More configuration options
Basic Grounding (SDK) - Gemini 2.5
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{ googleSearch: {} }]
}
});
console.log(response.text);
// Check if grounding was used
if (response.candidates[0].groundingMetadata) {
console.log('Search was performed!');
console.log('Sources:', response.candidates[0].groundingMetadata);
}
```
Basic Grounding (Fetch) - Gemini 2.5
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [
{ parts: [{ text: 'Who won the euro 2024?' }] }
],
tools: [
{ google_search: {} }
]
}),
}
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
if (data.candidates[0].groundingMetadata) {
console.log('Grounding metadata:', data.candidates[0].groundingMetadata);
}
```
Dynamic Retrieval (SDK) - Gemini 1.5
```typescript
import { GoogleGenAI, DynamicRetrievalConfigMode } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-1.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [
{
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
dynamicThreshold: 0.7 // Search only if confidence < 70%
}
}
}
]
}
});
console.log(response.text);
if (!response.candidates[0].groundingMetadata) {
console.log('Model answered from its own knowledge (high confidence)');
}
```
Grounding Metadata Structure
```typescript
{
  groundingMetadata: {
    // (abridged; see the official docs for the full shape)
    webSearchQueries: [
      "euro 2024 winner"
    ],
    searchEntryPoint: {
      renderedContent: "<html for Google Search Suggestions>"
    },
    groundingChunks: [
      {
        web: {
          uri: "https://example.com/euro-2024-results",
          title: "UEFA Euro 2024 Final Results"
        }
      }
    ],
    groundingSupports: [
      {
        segment: {
          startIndex: 42,
          endIndex: 47,
          text: "Spain"
        },
        groundingChunkIndices: [0]
      }
    ]
  }
}
```
Chat with Grounding (SDK)
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: {
tools: [{ googleSearch: {} }]
}
});
let response = await chat.sendMessage({ message: 'What are the latest developments in quantum computing?' });
console.log(response.text);
// Check grounding sources
if (response.candidates[0].groundingMetadata) {
  const chunks = response.candidates[0].groundingMetadata.groundingChunks || [];
  console.log(`Sources used: ${chunks.length}`);
  chunks.forEach(chunk => {
    console.log(`- ${chunk.web?.title}: ${chunk.web?.uri}`);
  });
}
// Follow-up still has grounding enabled
response = await chat.sendMessage({ message: 'Which company made the biggest breakthrough?' });
console.log(response.text);
```
Combining Grounding with Function Calling
```typescript
const weatherFunction = {
name: 'get_current_weather',
description: 'Get current weather for a location',
parametersJsonSchema: {
type: 'object',
properties: {
location: { type: 'string', description: 'City name' }
},
required: ['location']
}
};
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What is the weather like in the city that won Euro 2024?',
config: {
tools: [
{ googleSearch: {} },
{ functionDeclarations: [weatherFunction] }
]
}
});
// Model will:
// 1. Use Google Search to find Euro 2024 winner
// 2. Call get_current_weather function with the city
// 3. Combine both results in response
```
Checking if Grounding was Used
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What is 2+2?', // Model knows this without search
config: {
tools: [{ googleSearch: {} }]
}
});
if (!response.candidates[0].groundingMetadata) {
console.log('Model answered from its own knowledge (no search needed)');
} else {
console.log('Search was performed');
}
```
Key Points
When to Use Grounding:
- Current events and news
- Real-time data (stock prices, sports scores, weather)
- Fact-checking and verification
- Questions about recent developments
- Information beyond model's training cutoff
When NOT to Use:
- General knowledge questions
- Mathematical calculations
- Code generation
- Creative writing
- Tasks requiring internal reasoning only
Cost Considerations:
- Grounding adds latency (search takes time)
- Additional token costs for retrieved content
- Use `dynamicThreshold` to control when searches happen (Gemini 1.5)
Important Notes:
- Grounding requires Google Cloud project (not just API key)
- Search results quality depends on query phrasing
- Citations may not cover all facts in response
- Search is performed automatically based on confidence
Gemini 2.5 vs 1.5:
- Gemini 2.5: Use `googleSearch` (simple, recommended)
- Gemini 1.5: Use `googleSearchRetrieval` with `dynamicThreshold`
Best Practices:
- Always check `groundingMetadata` to see if search was used
- Display citations to users for transparency (see the sketch after this list)
- Use specific, well-phrased questions for better search results
- Combine with function calling for hybrid workflows
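One way to surface citations is to append source links after the spans they support, using `groundingSupports` and `groundingChunks`. A minimal sketch under the metadata shape shown above; the `addCitations` helper is our own illustration, and `response` is the grounded response from the previous example:
```typescript
// Hypothetical helper: append [n](url) markers after each supported span.
// Assumes the groundingMetadata shape shown earlier in this guide.
function addCitations(text: string, metadata: any): string {
  const supports = metadata?.groundingSupports ?? [];
  const chunks = metadata?.groundingChunks ?? [];
  // Insert from the end of the text so earlier indices stay valid
  const sorted = [...supports].sort(
    (a: any, b: any) => (b.segment?.endIndex ?? 0) - (a.segment?.endIndex ?? 0)
  );
  let result = text;
  for (const support of sorted) {
    const end = support.segment?.endIndex;
    if (end == null) continue;
    const links = (support.groundingChunkIndices ?? [])
      .map((i: number) => `[${i + 1}](${chunks[i]?.web?.uri})`)
      .join(', ');
    if (links) result = result.slice(0, end) + ` ${links}` + result.slice(end);
  }
  return result;
}

const md = response.candidates[0].groundingMetadata;
console.log(md ? addCitations(response.text, md) : response.text);
```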
---
Error Handling
Common Errors
#### 1. Invalid API Key (401)
```typescript
{
error: {
code: 401,
message: 'API key not valid. Please pass a valid API key.',
status: 'UNAUTHENTICATED'
}
}
```
Solution: Verify GEMINI_API_KEY environment variable is set correctly.
#### 2. Rate Limit Exceeded (429)
```typescript
{
error: {
code: 429,
message: 'Resource has been exhausted (e.g. check quota).',
status: 'RESOURCE_EXHAUSTED'
}
}
```
Solution: Implement exponential backoff retry strategy.
#### 3. Model Not Found (404)
```typescript
{
error: {
code: 404,
message: 'models/gemini-3.0-flash is not found',
status: 'NOT_FOUND'
}
}
```
Solution: Use correct model names: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
#### 4. Context Length Exceeded (400)
```typescript
{
error: {
code: 400,
message: 'Request payload size exceeds the limit',
status: 'INVALID_ARGUMENT'
}
}
```
Solution: Reduce input size. Gemini 2.5 models support 1,048,576 input tokens max.
Exponential Backoff Pattern
```typescript
async function generateWithRetry(request, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
return await ai.models.generateContent(request);
} catch (error) {
if (error.status === 429 && i < maxRetries - 1) {
const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
```
---
Rate Limits
Free Tier (Gemini API)
Rate limits vary by model:
Gemini 2.5 Pro:
- Requests per minute: 5 RPM
- Tokens per minute: 125,000 TPM
- Requests per day: 100 RPD
Gemini 2.5 Flash:
- Requests per minute: 10 RPM
- Tokens per minute: 250,000 TPM
- Requests per day: 250 RPD
Gemini 2.5 Flash-Lite:
- Requests per minute: 15 RPM
- Tokens per minute: 250,000 TPM
- Requests per day: 1,000 RPD
Paid Tier (Tier 1)
Requires billing account linked to your Google Cloud project.
Gemini 2.5 Pro:
- Requests per minute: 150 RPM
- Tokens per minute: 2,000,000 TPM
- Requests per day: 10,000 RPD
Gemini 2.5 Flash:
- Requests per minute: 1,000 RPM
- Tokens per minute: 1,000,000 TPM
- Requests per day: 10,000 RPD
Gemini 2.5 Flash-Lite:
- Requests per minute: 4,000 RPM
- Tokens per minute: 4,000,000 TPM
- Requests per day: Not specified
Higher Tiers (Tier 2 & 3)
Tier 2 (requires $250+ spending and 30-day wait):
- Even higher limits available
Tier 3 (requires $1,000+ spending and 30-day wait):
- Maximum limits available
Tips:
- Implement rate limit handling with exponential backoff (a client-side limiter sketch follows this list)
- Use batch processing for high-volume tasks
- Monitor usage in Google AI Studio
- Choose the right model based on your rate limit needs
- Official rate limits: https://ai.google.dev/gemini-api/docs/rate-limits
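Beyond retrying on 429s, you can stay under RPM limits proactively with a simple client-side limiter. A minimal sketch; the `RateLimiter` class is our own illustration, not an SDK feature:
```typescript
// Hypothetical sliding-window limiter: allow at most `rpm` calls per minute.
class RateLimiter {
  private timestamps: number[] = [];
  constructor(private rpm: number) {}

  async acquire(): Promise<void> {
    const now = Date.now();
    // Drop timestamps older than the 60-second window
    this.timestamps = this.timestamps.filter(t => now - t < 60_000);
    if (this.timestamps.length >= this.rpm) {
      const waitMs = 60_000 - (now - this.timestamps[0]);
      await new Promise(resolve => setTimeout(resolve, waitMs));
    }
    this.timestamps.push(Date.now());
  }
}

// Usage: 10 RPM matches the free-tier limit for gemini-2.5-flash
const limiter = new RateLimiter(10);
await limiter.acquire();
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Hello'
});
```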
---
SDK Migration Guide
From @google/generative-ai to @google/genai
#### 1. Update Package
```bash
# Remove deprecated SDK
npm uninstall @google/generative-ai
# Install current SDK
npm install @google/genai@1.27.0
```
#### 2. Update Imports
Old (DEPRECATED):
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
```
New (CURRENT):
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey });
// Use ai.models.generateContent() directly
```
#### 3. Update API Calls
Old:
```typescript
const result = await model.generateContent('Explain quantum computing in simple terms');
const response = result.response;
console.log(response.text());
```
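New (a sketch of the equivalent call in @google/genai, matching the examples earlier in this guide):
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain quantum computing in simple terms'
});
console.log(response.text); // `text` is a property in the new SDK, not a method
```
Key differences: `generateContent` moves from a per-model object to `ai.models`, the model name is passed with each request, and `text` is a property rather than a method call.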