venice-ai-api

Skill from jrajasekera/claude-skills (7 items). Added Feb 4, 2026.

What it does

Enables privacy-first AI interactions using Venice.ai's uncensored, OpenAI-compatible API with decentralized inference across multiple advanced language models.

Installation

Install with npx (default command from the index; check the GitHub repository for actual instructions):

```bash
npx skills add jrajasekera/claude-skills --skill venice-ai-api
```

Skill Details

SKILL.md

Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.

Overview

# Venice.ai API Skill

Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.

Quick Reference

Base URL: https://api.venice.ai/api/v1

Auth: Authorization: Bearer VENICE_API_KEY

SDK: Use OpenAI SDK with custom base URL

API Key Types: ADMIN (full access) or INFERENCE (inference only)

Setup

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: process.env.VENICE_API_KEY,
    baseURL: 'https://api.venice.ai/api/v1'
});
```

Account Tiers

| Tier | Qualification | Rate Limits | Use Case |
|------|--------------|-------------|----------|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |

API Capabilities

1. Chat Completions

Text inference with multimodal support (text, images, audio, video).

```python
completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"}
    ]
)
```

Popular Models:

  • llama-3.3-70b - Balanced performance (Tier M, 128K context)
  • zai-org-glm-4.7 - Complex tasks, deep reasoning (Tier L, 128K context)
  • mistral-31-24b - Vision + function calling (Tier S, 131K context)
  • venice-uncensored - No content filtering (Tier S, 32K context)
  • deepseek-ai-DeepSeek-R1 - Advanced reasoning, math, coding (Tier L, 64K context)
  • qwen3-235b - Massive MoE reasoning (Tier L)
  • qwen3-4b - Fast, lightweight (Tier XS, 40K context)

Venice Parameters (via extra_body in Python, direct in JS):

  • enable_web_search: "off" | "on" | "auto"
  • enable_web_scraping: boolean
  • enable_web_citations: boolean - adds ^index^ citation format
  • include_venice_system_prompt: boolean (default: true)
  • strip_thinking_response: boolean
  • disable_thinking: boolean
  • character_slug: string
  • prompt_cache_key: string - routing hint for cache hits
  • prompt_cache_retention: "default" | "extended" | "24h"

See [references/chat-completions.md](references/chat-completions.md) for full parameter reference.
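These parameters nest under a venice_parameters object inside extra_body (the AI Characters example later in this document uses the same shape). A small helper like the hypothetical venice_extra_body below keeps call sites tidy; it is a sketch for illustration, not part of any Venice SDK:

```python
def venice_extra_body(**params):
    """Build an extra_body payload carrying Venice-specific parameters.

    Only explicitly set (non-None) values are included, so server-side
    defaults stay in effect for everything else.
    """
    return {"venice_parameters": {k: v for k, v in params.items() if v is not None}}

body = venice_extra_body(enable_web_search="auto", enable_web_citations=True)
# body == {"venice_parameters": {"enable_web_search": "auto",
#                                "enable_web_citations": True}}
```

Pass the result as `extra_body=body` to `client.chat.completions.create(...)` when using the OpenAI Python SDK.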

2. Image Generation

Generate images from text prompts.

```python
import os

import requests

response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in the "images" array
```

Image Models:

| Model | Best For | Pricing |
|-------|----------|---------|
| qwen-image | Highest quality, editing | Variable |
| venice-sd35 | General purpose (default) | ~$0.01/image |
| hidream | Fast generation | ~$0.01/image |
| flux-2-pro | Professional quality | ~$0.04/image |
| flux-2-max | High-quality output | ~$0.02/image |
| nano-banana-pro | Photorealism, 2K/4K support | $0.18-$0.35 |

3. Image Upscaling

Enhance image resolution 2x or 4x.

```python
import base64
import os

import requests

api_key = os.getenv("VENICE_API_KEY")

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)

# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)
```

Pricing: $0.02 (2x), $0.08 (4x)

4. Image Editing (Inpainting)

Modify existing images with AI-powered instructions.

```python
import base64
import os

import requests

api_key = os.getenv("VENICE_API_KEY")

with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or a URL starting with http/https
    }
)

# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)
```

Model: Uses Qwen-Image. Pricing: ~$0.04/edit.

See [references/image-api.md](references/image-api.md) for all parameters and style presets.

5. Video Generation

Async queue-based video generation. Always call /video/quote first for pricing.

Full Workflow:

```python
import base64
import os
import time

import requests

api_key = os.getenv("VENICE_API_KEY")
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Step 1: Get price quote
quote = requests.post(
    "https://api.venice.ai/api/v1/video/quote",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
print(f"Estimated cost: ${quote.json()['quote']}")

# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "prompt": "A serene forest with sunlight filtering through trees",
        "negative_prompt": "low quality, blurry",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
queue_id = queue_resp.json()["queueid"]

# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    if (status_resp.status_code == 200
            and status_resp.headers.get("Content-Type") == "video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)

# Step 4: Cleanup (optional; deletes the video from Venice storage)
requests.post(
    "https://api.venice.ai/api/v1/video/complete",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "queueid": queue_id
    }
)
```
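The completion check in step 3 hinges on a detail of the retrieve endpoint: it returns the raw MP4 bytes once the job finishes, and a JSON status body until then. Factoring that check into a small predicate (a sketch, not part of any SDK) makes the polling loop easier to read and test:

```python
from typing import Optional

def video_is_ready(status_code: int, content_type: Optional[str]) -> bool:
    """True once /video/retrieve returns the finished MP4 rather than
    a JSON status payload."""
    return status_code == 200 and content_type == "video/mp4"

video_is_ready(200, "video/mp4")         # ready: save the body as .mp4
video_is_ready(200, "application/json")  # not ready: parse status, keep polling
```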

Image-to-Video:

```python
with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "wan-2.5-preview-image-to-video",
        "prompt": "Animate this scene with gentle motion",
        "image_url": f"data:image/png;base64,{img_b64}",
        "duration": "5s",
        "resolution": "720p"
    }
)
```
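The image_url field above takes a base64 data URL. A small helper (hypothetical, shown for illustration) wraps the encoding and the MIME-type lookup so the snippet works for PNG, JPEG, and other formats alike:

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL suitable for image_url."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```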

Video Models:

| Model | Type | Features |
|-------|------|----------|
| kling-2.5-turbo-pro | Text/Image-to-Video | Fast, high quality |
| wan-2.5-preview | Image-to-Video | Animation specialist |
| ltx-2-full | Text/Image-to-Video | Full quality |
| veo3-fast | Text/Image-to-Video | Speed-optimized |
| sora-2 | Image-to-Video | High-end quality |

See [references/video-api.md](references/video-api.md) for full parameter reference.

6. Text-to-Speech

Convert text to audio with 60+ voices.

```python
response = requests.post(
    "https://api.venice.ai/api/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "input": "Hello, welcome to Venice.",
        "model": "tts-kokoro",
        "voice": "af_sky",
        "speed": 1.0,  # 0.25 to 4.0
        "response_format": "mp3"  # mp3, opus, aac, flac, wav, pcm
    }
)

with open("speech.mp3", "wb") as f:
    f.write(response.content)
```

Voices: af_sky, af_nova, am_liam, bf_emma, zf_xiaobei, jm_kumo, and 50+ more.

Pricing: $3.50 per 1M characters.
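At $3.50 per 1M characters, cost estimation is simple arithmetic on input length. The helper below illustrates that pricing; it is not an official calculator, and assumes the listed rate:

```python
def tts_cost_usd(text: str, rate_per_million_chars: float = 3.50) -> float:
    """Estimate TTS cost in USD from input length at the listed rate."""
    return len(text) * rate_per_million_chars / 1_000_000

tts_cost_usd("Hello, welcome to Venice.")  # 25 chars -> $0.0000875
```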

7. Speech-to-Text

Transcribe audio files.

```python
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.venice.ai/api/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "response_format": "json",  # json or text
            "timestamps": "true"
        }
    )
```

Formats: WAV, FLAC, MP3, M4A, AAC, MP4.

Pricing: $0.0001 per audio second.

8. Embeddings

Generate vector embeddings for RAG and semantic search.

```python
response = requests.post(
    "https://api.venice.ai/api/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "text-embedding-bge-m3",
        "input": "Privacy-first AI infrastructure",
        "encoding_format": "float"  # or "base64"
    }
)
```

9. Vision (Multimodal)

Analyze images with vision-capable models.

```python
response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
```

10. Function Calling

Define tools for the model to call.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Weather in SF?"}],
    tools=tools
)
```

11. Structured Outputs

Get guaranteed JSON schema responses.

```python
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False
            }
        }
    }
)
```

Requirements: strict: true, additionalProperties: false, all fields in required.
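Those three requirements can be enforced mechanically. The hypothetical helper below builds a compliant response_format from a property map, guaranteeing strict mode, a closed object, and a complete required list; it is a sketch, not part of any SDK:

```python
def strict_json_schema(name: str, properties: dict) -> dict:
    """Build a response_format payload meeting the strict-mode rules:
    strict=True, additionalProperties=False, every field in required."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }
```

Usage: `response_format=strict_json_schema("my_response", {"answer": {"type": "string"}})`.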

12. AI Characters

Interact with predefined AI personas.

```python
# List characters
characters = requests.get(
    "https://api.venice.ai/api/v1/characters",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"categories": "philosophy", "limit": 50}
).json()

# Chat with a character
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    extra_body={
        "venice_parameters": {"character_slug": "alan-watts"}
    }
)
```

13. Model Discovery

Query available models and capabilities programmatically.

```python
# List models by type
models = requests.get(
    "https://api.venice.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}  # text, image, audio, video, embedding
).json()

# Get model traits for auto-selection
traits = requests.get(
    "https://api.venice.ai/api/v1/models/traits",
    params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}

# Use a trait as the model ID for automatic routing
response = client.chat.completions.create(
    model="fastest",  # Venice routes to the current fastest model
    messages=[...]
)
```

Error Handling

Error Codes

| Status | Error Code | Meaning | Action |
|--------|------------|---------|--------|
| 400 | INVALID_REQUEST | Bad parameters | Check payload schema |
| 401 | AUTHENTICATION_FAILED | Invalid API key | Verify key and balance |
| 402 | - | Insufficient balance | Add USD or stake VVV |
| 403 | - | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | - | Payload too large | Reduce request size |
| 415 | - | Invalid content type | Use application/json |
| 422 | - | Content policy violation | Modify prompt |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests | Backoff, wait for reset |
| 500 | INFERENCE_FAILED | Model error | Retry with backoff |
| 503 | - | Model at capacity | Retry later or switch model |
| 504 | - | Timeout | Use streaming for long responses |

Abuse Protection

Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.

Retry with Exponential Backoff (Python)

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # 1s, 2s, 4s
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session

session = create_venice_session()
response = session.post(url, json=payload, headers=headers)
```

Retry with Exponential Backoff (JavaScript)

```javascript
async function veniceRequest(url, options, maxRetries = 3) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await fetch(url, options);
        if (response.ok) return response;
        if ([429, 500, 502, 503, 504].includes(response.status)
                && attempt < maxRetries) {
            const delay = Math.pow(2, attempt) * 1000;
            console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
            await new Promise(r => setTimeout(r, delay));
            continue;
        }
        throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
    }
}
```

Rate Limit-Aware Client (Python)

```python
import time

import requests

class VeniceClient:
    """Wrapper that respects rate limits using response headers."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        self.session = create_venice_session()  # from the retry example above
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp
```

Response Headers

Monitor these headers for production:

  • x-ratelimit-remaining-requests - Requests left in window
  • x-ratelimit-remaining-tokens - Tokens left in window
  • x-ratelimit-reset-requests - Timestamp when request count resets
  • x-venice-balance-usd - USD balance
  • x-venice-balance-diem - DIEM balance
  • x-venice-is-blurred - Image was blurred (safe mode)
  • x-venice-is-content-violation - Content policy violation
  • x-venice-model-deprecation-warning - Deprecation notice
  • x-venice-model-deprecation-date - Sunset date
  • CF-RAY - Request ID for support
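For monitoring, it can help to pull these fields out as typed values. The sketch below (not an official client) parses the numeric headers from a response's header mapping, returning None for any header that is absent:

```python
def parse_venice_headers(headers: dict) -> dict:
    """Extract rate-limit and balance headers as floats (None if missing)."""
    def _num(key):
        value = headers.get(key)
        return float(value) if value is not None else None

    return {
        "remaining_requests": _num("x-ratelimit-remaining-requests"),
        "remaining_tokens": _num("x-ratelimit-remaining-tokens"),
        "reset_requests": _num("x-ratelimit-reset-requests"),
        "balance_usd": _num("x-venice-balance-usd"),
        "balance_diem": _num("x-venice-balance-diem"),
    }
```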

Rate Limits by Model Tier

Text Models:

| Tier | RPM | TPM | Example Models |
|------|-----|-----|----------------|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |

Other Endpoints:

| Endpoint | RPM |
|----------|-----|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |
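One simple way to respect these caps client-side is to pace a single worker so it never exceeds the endpoint's RPM; the minimum spacing is just 60 divided by the cap. This tiny helper illustrates the arithmetic (an illustration, not an official throttle):

```python
def min_interval_seconds(rpm: int) -> float:
    """Minimum delay between requests for one worker to stay under an RPM cap."""
    return 60.0 / rpm

min_interval_seconds(20)   # 3.0 seconds between image generations
min_interval_seconds(500)  # 0.12 seconds between embedding calls
```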

API Key Management

```bash
# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'

# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"

# List keys
curl https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY"

# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
  -H "Authorization: Bearer $VENICE_API_KEY"
```

Reference Files

  • [references/chat-completions.md](references/chat-completions.md) - Full chat API parameters
  • [references/image-api.md](references/image-api.md) - Image generation, editing, upscaling details
  • [references/video-api.md](references/video-api.md) - Video generation workflow and parameters
  • [references/models.md](references/models.md) - Available models, tiers, pricing, and capabilities