venice-ai-api
Skill from jrajasekera/claude-skills
Enables privacy-first AI interactions using Venice.ai's uncensored, OpenAI-compatible API with decentralized inference across multiple advanced language models.
Installation
npx skills add jrajasekera/claude-skills --skill venice-ai-api
Skill Details
Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.
Overview
# Venice.ai API Skill
Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.
Quick Reference
Base URL: https://api.venice.ai/api/v1
Auth: Authorization: Bearer VENICE_API_KEY
SDK: Use OpenAI SDK with custom base URL
API Key Types: ADMIN (full access) or INFERENCE (inference only)
Setup
```python
import os

from openai import OpenAI

# Point the OpenAI SDK at Venice's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1'
});
```
Account Tiers
| Tier | Qualification | Rate Limits | Use Case |
|------|--------------|-------------|----------|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |
API Capabilities
1. Chat Completions
Text inference with multimodal support (text, images, audio, video).
```python
completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"}
    ]
)
```
Popular Models:
- llama-3.3-70b - Balanced performance (Tier M, 128K context)
- zai-org-glm-4.7 - Complex tasks, deep reasoning (Tier L, 128K context)
- mistral-31-24b - Vision + function calling (Tier S, 131K context)
- venice-uncensored - No content filtering (Tier S, 32K context)
- deepseek-ai-DeepSeek-R1 - Advanced reasoning, math, coding (Tier L, 64K context)
- qwen3-235b - Massive MoE reasoning (Tier L)
- qwen3-4b - Fast, lightweight (Tier XS, 40K context)
Venice Parameters (via extra_body in Python, direct in JS):
- enable_web_search: "off" | "on" | "auto"
- enable_web_scraping: boolean
- enable_web_citations: boolean - adds ^index^ citation format
- include_venice_system_prompt: boolean (default: true)
- strip_thinking_response: boolean
- disable_thinking: boolean
- character_slug: string
- prompt_cache_key: string - routing hint for cache hits
- prompt_cache_retention: "default" | "extended" | "24h"
See [references/chat-completions.md](references/chat-completions.md) for full parameter reference.
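As a minimal sketch of how the Venice parameters above might be packaged for the Python SDK, the helper below nests them under a `venice_parameters` key inside `extra_body`, the same shape the AI Characters example in this document uses. The helper name `venice_extra_body` and its validation of `enable_web_search` are illustrative assumptions, not part of any SDK.

```python
# Hypothetical helper: packages Venice-specific options for the OpenAI SDK's extra_body argument.
ALLOWED_WEB_SEARCH = {"off", "on", "auto"}


def venice_extra_body(**params):
    """Return an extra_body dict with the given options nested under venice_parameters."""
    mode = params.get("enable_web_search")
    if mode is not None and mode not in ALLOWED_WEB_SEARCH:
        raise ValueError(f"enable_web_search must be one of {sorted(ALLOWED_WEB_SEARCH)}")
    # Drop unset options so only explicit choices are sent
    cleaned = {key: value for key, value in params.items() if value is not None}
    return {"venice_parameters": cleaned}


extra_body = venice_extra_body(enable_web_search="auto", strip_thinking_response=True)
```

The result could then be passed as `client.chat.completions.create(..., extra_body=extra_body)`.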
2. Image Generation
Generate images from text prompts.
```python
import os

import requests

response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in images array
```
Image Models:
| Model | Best For | Pricing |
|-------|----------|---------|
| qwen-image | Highest quality, editing | Variable |
| venice-sd35 | General purpose (default) | ~$0.01/image |
| hidream | Fast generation | ~$0.01/image |
| flux-2-pro | Professional quality | ~$0.04/image |
| flux-2-max | High-quality output | ~$0.02/image |
| nano-banana-pro | Photorealism, 2K/4K support | $0.18-$0.35 |
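The generation example above notes that the response carries base64 images in an `images` array. A minimal sketch of decoding that array follows, assuming each entry is a plain base64 string without a data-URL prefix. The function names and the `.png` extension are hypothetical choices for illustration.

```python
import base64


def decode_images(payload):
    """Decode every base64 string in payload["images"] into raw bytes."""
    # Assumption: each entry is a plain base64 string without a data-URL prefix
    return [base64.b64decode(item) for item in payload.get("images", [])]


def save_images(payload, prefix="image"):
    """Write decoded images to prefix_0.png, prefix_1.png, ... and return the paths."""
    paths = []
    for index, data in enumerate(decode_images(payload)):
        path = f"{prefix}_{index}.png"  # Extension is an assumption about the output format
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths


# Stand-in payload; a real one would come from response.json()
sample = {"images": [base64.b64encode(b"fake-bytes").decode("utf-8")]}
decoded = decode_images(sample)
```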
3. Image Upscaling
Enhance image resolution 2x or 4x.
```python
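import base64
import os

import requests

api_key = os.getenv("VENICE_API_KEY")

# Read the source image and base64-encode it for the JSON payload
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)
response.raise_for_status()  # Avoid writing an error body to disk

# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)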
```
Pricing: $0.02 (2x), $0.08 (4x)
4. Image Editing (Inpainting)
Modify existing images with AI-powered instructions.
```python
import base64
import os

import requests

api_key = os.getenv("VENICE_API_KEY")

# Read the source image and base64-encode it for the JSON payload
with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or URL starting with http/https
    }
)
response.raise_for_status()  # Avoid writing an error body to disk

# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)
```
Model: Uses Qwen-Image. Pricing: ~$0.04/edit.
See [references/image-api.md](references/image-api.md) for all parameters and style presets.
5. Video Generation
Async queue-based video generation. Always call /video/quote first for pricing.
Full Workflow:
```python
import base64
import os
import time

import requests

api_key = os.getenv("VENICE_API_KEY")
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Step 1: Get price quote
quote = requests.post(
    "https://api.venice.ai/api/v1/video/quote",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
print(f"Estimated cost: ${quote.json()['quote']}")

# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "prompt": "A serene forest with sunlight filtering through trees",
        "negative_prompt": "low quality, blurry",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
queue_id = queue_resp.json()["queueid"]

# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    # A finished job returns the MP4 bytes; an unfinished one returns JSON status
    if (status_resp.status_code == 200
            and status_resp.headers.get("Content-Type") == "video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)

# Step 4: Cleanup (optional, deletes from Venice storage)
requests.post(
    "https://api.venice.ai/api/v1/video/complete",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "queueid": queue_id
    }
)
```
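The polling loop in the workflow above runs until the job finishes, with no upper bound. One possible refinement, sketched here under the assumption that a caller would rather fail than wait indefinitely, is a generic poller with a timeout. `poll_until` and its parameters are hypothetical names, and the `"PROCESSING"`/`"DONE"` strings are placeholders for the demonstration, not Venice status values.

```python
import time


def poll_until(fetch, is_done, interval=10.0, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next wait would run past the deadline
        if clock() + interval > deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)


# Demonstration with a fake job that finishes on the third check
_responses = iter(["PROCESSING", "PROCESSING", "DONE"])
final = poll_until(lambda: next(_responses), lambda r: r == "DONE",
                   interval=0.0, timeout=5.0)
```

In the workflow above, `fetch` would wrap the `/video/retrieve` POST and `is_done` would check the status code and `Content-Type` of the response.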
Image-to-Video:
```python
# Reuses base64, requests, and headers from the full workflow above
with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "wan-2.5-preview-image-to-video",
        "prompt": "Animate this scene with gentle motion",
        "image_url": f"data:image/png;base64,{img_b64}",
        "duration": "5s",
        "resolution": "720p"
    }
)
```
Video Models:
| Model | Type | Features |
|-------|------|----------|
| kling-2.5-turbo-pro | Text/Image-to-Video | Fast, high quality |
| wan-2.5-preview | Image-to-Video | Animation specialist |
| ltx-2-full | Text/Image-to-Video | Full quality |
| veo3-fast | Text/Image-to-Video | Speed-optimized |
| sora-2 | Image-to-Video | High-end quality |
See [references/video-api.md](references/video-api.md) for full parameter reference.
6. Text-to-Speech
Convert text to audio with 60+ voices.
```python
response = requests.post(
    "https://api.venice.ai/api/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "input": "Hello, welcome to Venice.",
        "model": "tts-kokoro",
        "voice": "af_sky",
        "speed": 1.0,  # 0.25 to 4.0
        "response_format": "mp3"  # mp3, opus, aac, flac, wav, pcm
    }
)
# The response body is the encoded audio
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```
Voices: af_sky, af_nova, am_liam, bf_emma, zf_xiaobei, jm_kumo, and 50+ more.
Pricing: $3.50 per 1M characters.
7. Speech-to-Text
Transcribe audio files.
```python
# Upload the audio as multipart form data; the file must stay open during the request
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.venice.ai/api/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "response_format": "json",  # json or text
            "timestamps": "true"
        }
    )
```
Formats: WAV, FLAC, MP3, M4A, AAC, MP4.
Pricing: $0.0001 per audio second.
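To make the audio prices quoted in this document concrete ($3.50 per 1M characters for text-to-speech, $0.0001 per audio second for speech-to-text), here is a small estimator. It is only a sketch: the function names are hypothetical, it assumes billing scales linearly with no minimum charge, and it assumes `len(text)` matches how characters are counted for billing.

```python
from decimal import Decimal

# Prices quoted in this document; they may change
TTS_USD_PER_MILLION_CHARS = Decimal("3.50")
STT_USD_PER_SECOND = Decimal("0.0001")


def tts_cost(text):
    """Estimated USD cost of synthesizing text, assuming per-character billing."""
    return TTS_USD_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)


def stt_cost(audio_seconds):
    """Estimated USD cost of transcribing audio_seconds of audio."""
    return STT_USD_PER_SECOND * Decimal(str(audio_seconds))


one_hour_transcription = stt_cost(3600)
```

`Decimal` is used instead of `float` so that small per-unit prices do not pick up rounding noise when multiplied.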
8. Embeddings
Generate vector embeddings for RAG and semantic search.
```python
response = requests.post(
    "https://api.venice.ai/api/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "text-embedding-bge-m3",
        "input": "Privacy-first AI infrastructure",
        "encoding_format": "float"  # or "base64"
    }
)
```
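Semantic search over embeddings usually comes down to comparing vectors. The sketch below ranks documents by cosine similarity in plain Python. It assumes the vectors have already been read out of the API response (for an OpenAI-compatible payload that would typically be `data[i]["embedding"]`, which is an assumption here), and both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy two-dimensional vectors; real embeddings have many more dimensions
ranking = rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]])
```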
9. Vision (Multimodal)
Analyze images with vision-capable models.
```python
response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
```
10. Function Calling
Define tools for the model to call.
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Weather in SF?"}],
    tools=tools
)
```
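Defining tools is only half of function calling: the application still has to run whatever the model asks for. The sketch below shows one way to dispatch a requested call to a local function. It assumes the tool name and a JSON-encoded arguments string have already been read from the response (in the OpenAI format these live at `response.choices[0].message.tool_calls[i].function`, and whether Venice mirrors that exactly is an assumption). `run_tool_call`, `TOOL_REGISTRY`, and the placeholder `get_weather` body are hypothetical.

```python
import json


def get_weather(location):
    """Placeholder implementation; a real one would call a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps tool names declared in `tools` to local Python callables
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Parse the JSON arguments string and invoke the matching local function."""
    if name not in registry:
        raise KeyError(f"model requested unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return registry[name](**arguments)


result = run_tool_call("get_weather", '{"location": "San Francisco"}')
```

Rejecting unknown tool names up front keeps a malformed or unexpected model response from calling arbitrary code.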
11. Structured Outputs
Get guaranteed JSON schema responses.
```python
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False
            }
        }
    }
)
```
Requirements: strict: true, additionalProperties: false, all fields in required.
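The three requirements above are easy to miss when a schema is assembled by hand. As a rough pre-flight check, the sketch below inspects a `response_format` dict and lists any of the three rules it breaks. The function name is hypothetical, and it only examines the top-level object, not nested schemas.

```python
def strict_schema_problems(response_format):
    """Return a list of ways response_format breaks the three stated requirements."""
    problems = []
    spec = response_format.get("json_schema", {})
    schema = spec.get("schema", {})
    if spec.get("strict") is not True:
        problems.append("strict must be True")
    if schema.get("additionalProperties") is not False:
        problems.append("additionalProperties must be False")
    # Every declared property has to appear in the required list
    missing = sorted(set(schema.get("properties", {})) - set(schema.get("required", [])))
    if missing:
        problems.append(f"fields missing from required: {missing}")
    return problems


# Same response_format as the example above
good_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "my_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
            "additionalProperties": False,
        },
    },
}
problems = strict_schema_problems(good_format)
```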
12. AI Characters
Interact with predefined AI personas.
```python
# List characters
characters = requests.get(
    "https://api.venice.ai/api/v1/characters",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"categories": "philosophy", "limit": 50}
).json()

# Chat with a character
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    extra_body={
        "venice_parameters": {"character_slug": "alan-watts"}
    }
)
```
13. Model Discovery
Query available models and capabilities programmatically.
```python
# List models by type
models = requests.get(
    "https://api.venice.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}  # text, image, audio, video, embedding
).json()

# Get model traits for auto-selection
traits = requests.get(
    "https://api.venice.ai/api/v1/models/traits",
    params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}

# Use trait as model ID for automatic routing
response = client.chat.completions.create(
    model="fastest",  # Venice routes to the current fastest model
    messages=[...]
)
```
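Passing a trait name as the model ID lets Venice pick the model, but an application may want to know, log, or pin the concrete model ID instead. The sketch below resolves a trait locally from a traits mapping shaped like the example comment above. `resolve_model` and the `"vision"` trait used in the fallback case are hypothetical.

```python
def resolve_model(traits, wanted, fallback_trait="default"):
    """Return the model ID for wanted, falling back to fallback_trait when it is absent."""
    if wanted in traits:
        return traits[wanted]
    if fallback_trait in traits:
        return traits[fallback_trait]
    raise LookupError(f"neither {wanted!r} nor {fallback_trait!r} is a known trait")


# Example mapping copied from the comment in the block above
traits = {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}
model_id = resolve_model(traits, "fastest")
```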
Error Handling
Error Codes
| Status | Error Code | Meaning | Action |
|--------|------------|---------|--------|
| 400 | INVALID_REQUEST | Bad parameters | Check payload schema |
| 401 | AUTHENTICATION_FAILED | Invalid API key | Verify key and balance |
| 402 | - | Insufficient balance | Add USD or stake VVV |
| 403 | - | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | - | Payload too large | Reduce request size |
| 415 | - | Invalid content type | Use application/json |
| 422 | - | Content policy violation | Modify prompt |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests | Backoff, wait for reset |
| 500 | INFERENCE_FAILED | Model error | Retry with backoff |
| 503 | - | Model at capacity | Retry later or switch model |
| 504 | - | Timeout | Use streaming for long responses |
Abuse Protection
Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.
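Besides backing off after individual errors, a client can track its own recent failures and stop sending before it reaches the threshold described above. The class below is a minimal sliding-window counter along those lines. `FailureWindow` is a hypothetical name, and pausing at exactly 20 failures is a deliberately conservative assumption, since the block is described as triggering above 20.

```python
import time
from collections import deque


class FailureWindow:
    """Tracks recent failed requests so a client can pause before reaching the limit."""

    def __init__(self, max_failures=20, window_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = deque()

    def _prune(self):
        # Forget failures that have aged out of the window
        cutoff = self.clock() - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()

    def record_failure(self):
        """Note that a request just failed."""
        self._prune()
        self._failures.append(self.clock())

    def should_pause(self):
        """True once the number of failures in the window reaches max_failures."""
        self._prune()
        return len(self._failures) >= self.max_failures


window = FailureWindow()
```

A caller would invoke `record_failure()` after each non-2xx response and check `should_pause()` before sending the next request.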
Retry with Exponential Backoff (Python)
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # exponential delays, roughly 1s, 2s, 4s (exact timing varies by urllib3 version)
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]  # POST is not retried by default
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session


session = create_venice_session()
# url, payload, and headers are supplied by the caller
response = session.post(url, json=payload, headers=headers)
```
Retry with Exponential Backoff (JavaScript)
```javascript
async function veniceRequest(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.ok) return response;
    // Retry only on rate limiting and transient server errors
    if ([429, 500, 502, 503, 504].includes(response.status)) {
      if (attempt < maxRetries) {
        const delay = Math.pow(2, attempt) * 1000;
        console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
        await new Promise(r => setTimeout(r, delay));
        continue;
      }
    }
    throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
  }
}
```
Rate Limit-Aware Client (Python)
```python
import time

import requests


class VeniceClient:
    """Wrapper that respects rate limits using response headers."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        # create_venice_session() is the retry-enabled factory from the Python retry example
        self.session = create_venice_session()
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        # When the request budget is nearly spent, sleep until the window resets
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                # Assumes the reset header is a Unix timestamp in seconds
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp
```
Response Headers
Monitor these headers for production:
- x-ratelimit-remaining-requests - Requests left in window
- x-ratelimit-remaining-tokens - Tokens left in window
- x-ratelimit-reset-requests - Timestamp when request count resets
- x-venice-balance-usd - USD balance
- x-venice-balance-diem - DIEM balance
- x-venice-is-blurred - Image was blurred (safe mode)
- x-venice-is-content-violation - Content policy violation
- x-venice-model-deprecation-warning - Deprecation notice
- x-venice-model-deprecation-date - Sunset date
- CF-RAY - Request ID for support
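For monitoring, it can help to collapse these headers into one structured record per response. The sketch below does that for a handful of them. It assumes the rate-limit and balance values arrive as numeric strings, which this document does not confirm, and `summarize_headers` is a hypothetical name.

```python
def summarize_headers(headers):
    """Pull rate-limit and balance headers into a dict of parsed values."""

    def as_number(name, cast):
        raw = headers.get(name)
        try:
            return cast(raw) if raw is not None else None
        except ValueError:
            return None  # Header present but not in the assumed numeric format

    return {
        "remaining_requests": as_number("x-ratelimit-remaining-requests", int),
        "remaining_tokens": as_number("x-ratelimit-remaining-tokens", int),
        "balance_usd": as_number("x-venice-balance-usd", float),
        "deprecation_warning": headers.get("x-venice-model-deprecation-warning"),
        "request_id": headers.get("CF-RAY"),
    }


# Stand-in headers; with requests this would be response.headers
sample_headers = {
    "x-ratelimit-remaining-requests": "12",
    "x-venice-balance-usd": "4.5",
    "CF-RAY": "abc123",
}
summary = summarize_headers(sample_headers)
```

With `requests`, `response.headers` is case-insensitive, so the lowercase names match regardless of how the server capitalizes them. A plain dict, as in the sample, needs exact keys.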
Rate Limits by Model Tier
Text Models:
| Tier | RPM | TPM | Example Models |
|------|-----|-----|----------------|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |
Other Endpoints:
| Endpoint | RPM |
|----------|-----|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |
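An RPM limit translates directly into a minimum spacing between requests. The sketch below works through that arithmetic for the text-model tiers in the table above. It assumes the limit is spread evenly over each minute and ignores the TPM limit, and the function names are hypothetical.

```python
# Text-model RPM limits copied from the table above
TEXT_TIER_RPM = {"XS": 500, "S": 75, "M": 50, "L": 20}


def min_interval_seconds(rpm):
    """Smallest gap between requests that keeps a steady stream under rpm."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


def seconds_for_batch(request_count, rpm):
    """Lower bound on wall-clock time to send request_count evenly paced requests."""
    # The first request goes out immediately, so only the gaps between requests count
    return max(0, request_count - 1) * min_interval_seconds(rpm)


# A Tier L model at 20 RPM allows one request every 3 seconds
gap_for_large_tier = min_interval_seconds(TEXT_TIER_RPM["L"])
```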
API Key Management
```bash
# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'

# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"

# List keys
curl https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY"

# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
  -H "Authorization: Bearer $VENICE_API_KEY"
```
Reference Files
- [references/chat-completions.md](references/chat-completions.md) - Full chat API parameters
- [references/image-api.md](references/image-api.md) - Image generation, editing, upscaling details
- [references/video-api.md](references/video-api.md) - Video generation workflow and parameters
- [references/models.md](references/models.md) - Available models, tiers, pricing, and capabilities