elevenlabs

Skill from digitalsamba/claude-code-video-toolkit
What it does

Generates AI voiceovers and audio content using ElevenLabs APIs, supporting text-to-speech, voice cloning, and custom audio synthesis across multiple languages and styles.

Part of digitalsamba/claude-code-video-toolkit (6 items)

Installation

Python: run the voiceover tool
python tools/voiceover.py --scene-dir public/audio/scenes --json

Skill Details

SKILL.md

Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.

Overview

# ElevenLabs Audio Generation

Requires ELEVENLABS_API_KEY in .env.
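
A note on the key: os.getenv reads the process environment, so the .env file has to be loaded first. A minimal sketch, assuming the python-dotenv package is installed:

```python
import os

from dotenv import load_dotenv

# Read .env from the working directory into the process environment
load_dotenv()

assert os.getenv("ELEVENLABS_API_KEY"), "ELEVENLABS_API_KEY is not set"
```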

Text-to-Speech

```python
from elevenlabs.client import ElevenLabs
from elevenlabs import save, VoiceSettings
import os

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = client.text_to_speech.convert(
    text="Welcome to my video!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        speed=1.0,
    ),
)

save(audio, "voiceover.mp3")
```

Models

| Model | Quality | SSML Support | Notes |
|-------|---------|--------------|-------|
| eleven_multilingual_v2 | Highest consistency | None | Stable, production-ready, 29 languages |
| eleven_flash_v2_5 | Good | `<break>`, `<phoneme>` | Fast, supports pause/pronunciation tags |
| eleven_turbo_v2_5 | Good | `<break>`, `<phoneme>` | Fastest latency |
| eleven_v3 | Most expressive | None | Alpha: unreliable, needs prompt engineering |

Choose: multilingual_v2 for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).

Voice Settings by Style

| Style | stability | similarity_boost | style | speed |
|-------|-----------|------------------|-------|-------|
| Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
| Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
| Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
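
The rows map directly onto VoiceSettings; a minimal sketch of the Conversational row (the constant name and the mid-range values chosen are illustrative):

```python
from elevenlabs import VoiceSettings

# "Conversational" row from the table above, picking mid-range values
CONVERSATIONAL = VoiceSettings(
    stability=0.55,
    similarity_boost=0.85,
    style=0.35,
    speed=0.95,
)

# Then: client.text_to_speech.convert(..., voice_settings=CONVERSATIONAL)
```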

Pauses Between Sections

With flash/turbo models: Use SSML break tags inline:

```
...end of section. <break time="1.0s" /> Start of next...
```

Max 3 seconds per break. Excessive breaks can cause speed artifacts.

With multilingual_v2 / v3: No SSML support. Options:

  • Paragraph breaks (blank lines) create a natural ~0.3-0.5s pause
  • Post-process with ffmpeg: split the audio and insert silence

WARNING: ... (ellipsis) is NOT a reliable pause. It can be vocalized as a word or sound; do not use ellipsis as a pause mechanism.
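
For the paragraph-break option, the pause is encoded in the input text itself; a minimal sketch (the narration text is illustrative):

```python
# A blank line between paragraphs yields a ~0.3-0.5s pause on multilingual_v2
text = (
    "First section of the narration.\n"
    "\n"
    "Second section, after a natural pause."
)
```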

Pronunciation Control

Phonetic spelling (any model): Write words as you want them pronounced:

  • Janus → Jan-us
  • nginx → engine-x
  • Use dashes, capitals, apostrophes to guide pronunciation

SSML phoneme tags (flash/turbo only):

```
<phoneme alphabet="cmu-arpabet" ph="JH EY1 N AH0 S">Janus</phoneme>
```

Iterative Workflow

  1. Generate → listen → identify pronunciation/pacing issues
  2. Adjust: phonetic spellings, break tags, voice settings
  3. Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine (see the sketch below).
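
A minimal post-processing sketch for step 3, assuming ffmpeg is on PATH (the file names and the one-second gap are illustrative):

```python
import subprocess

# Join two generated clips with 1s of silence between them:
# apad pads silence onto the end of the first clip, concat joins the streams.
subprocess.run([
    "ffmpeg", "-y",
    "-i", "part1.mp3", "-i", "part2.mp3",
    "-filter_complex",
    "[0:a]apad=pad_dur=1[a0];[a0][1:a]concat=n=2:v=0:a=1[out]",
    "-map", "[out]",
    "joined.mp3",
], check=True)
```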

Voice Cloning

Instant Voice Clone

```python
# Create an instant voice clone from one or more recorded samples
with open("sample.mp3", "rb") as f:
    voice = client.voices.ivc.create(
        name="My Voice",
        files=[f],
        remove_background_noise=True,
    )

print(f"Voice ID: {voice.voice_id}")
```

  • Use client.voices.ivc.create() (not client.voices.clone())
  • Pass file handles in binary mode ("rb"), not paths
  • Convert m4a first: ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3
  • Multiple samples (2-3 clips) improve accuracy
  • Save the voice ID for reuse (see the sketch below)
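
A minimal reuse sketch, continuing with the client from the Text-to-Speech section (the narration text is illustrative):

```python
from elevenlabs import save

# Reuse the clone by its ID in any text-to-speech call
audio = client.text_to_speech.convert(
    text="Narrated in the cloned voice.",
    voice_id=voice.voice_id,  # or a voice ID saved from an earlier run
    model_id="eleven_multilingual_v2",
)
save(audio, "cloned-voiceover.mp3")
```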

Professional Voice Clone: Requires Creator plan+, 30+ min audio. See [reference.md](reference.md).

Sound Effects

Max 22 seconds per generation.

```python
result = client.text_to_sound_effects.convert(
    text="Thunder rumbling followed by heavy rain",
    duration_seconds=10,
    prompt_influence=0.3,
)

# The result is a chunk iterator; stream it to a file
with open("thunder.mp3", "wb") as f:
    for chunk in result:
        f.write(chunk)
```

Prompt tips: be specific, e.g. "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"

Music Generation

10 seconds to 5 minutes. Use client.music.compose() (not .generate()).

```python
result = client.music.compose(
    prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
    music_length_ms=60000,  # 60 seconds
    force_instrumental=True,
)

with open("music.mp3", "wb") as f:
    for chunk in result:
        f.write(chunk)
```

Prompt structure: Genre, mood, instruments, tempo, use case. Add "no vocals" or use force_instrumental=True for background music.

Remotion Integration

Complete Workflow: Script to Synchronized Scene

```
VOICEOVER-SCRIPT.md  →  voiceover.py  →  public/audio/   →  Remotion composition
  Scene narration       Generate MP3     Audio files         Timing synced
                        per scene        with durations      to scenes
```

Step 1: Generate Per-Scene Audio

Use the toolkit's voiceover tool to generate audio for each scene:

```bash
# Generate voiceover files for each scene
python tools/voiceover.py --scene-dir public/audio/scenes --json

# Output:
# public/audio/scenes/
# ├── scene-01-title.mp3
# ├── scene-02-problem.mp3
# ├── scene-03-solution.mp3
# └── manifest.json (durations for each file)
```

The manifest.json contains timing info:

```json
{
  "scenes": [
    { "file": "scene-01-title.mp3", "duration": 4.2 },
    { "file": "scene-02-problem.mp3", "duration": 12.8 },
    { "file": "scene-03-solution.mp3", "duration": 15.3 }
  ],
  "totalDuration": 32.3
}
```

Step 2: Use Audio in Remotion Composition

```tsx
// src/Composition.tsx
import React from 'react';
import { Audio, staticFile, Series } from 'remotion';

// Import scene components
import { TitleSlide } from './scenes/TitleSlide';
import { ProblemSlide } from './scenes/ProblemSlide';
import { SolutionSlide } from './scenes/SolutionSlide';

// Scene durations (from manifest.json, converted to frames at 30fps)
const SCENE_DURATIONS = {
  title: Math.ceil(4.2 * 30), // 126 frames
  problem: Math.ceil(12.8 * 30), // 384 frames
  solution: Math.ceil(15.3 * 30), // 459 frames
};

export const MainComposition: React.FC = () => {
  return (
    <>
      {/* Scene sequence */}
      <Series>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
          <TitleSlide />
        </Series.Sequence>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
          <ProblemSlide />
        </Series.Sequence>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
          <SolutionSlide />
        </Series.Sequence>
      </Series>
      {/* Audio track - plays continuously across all scenes (path illustrative) */}
      <Audio src={staticFile('audio/voiceover.mp3')} />
      {/* Optional: Background music at lower volume */}
      <Audio src={staticFile('audio/music.mp3')} volume={0.2} />
    </>
  );
};
```

Step 3: Per-Scene Audio (Alternative)

For more control, add audio to each scene individually:

```tsx
// src/scenes/ProblemSlide.tsx
import React from 'react';
import { Audio, staticFile, useCurrentFrame } from 'remotion';

export const ProblemSlide: React.FC = () => {
  const frame = useCurrentFrame(); // available for frame-based animation

  return (
    <div style={{ /* slide styles */ }}>
      <h1>The Problem</h1>
      {/* Scene content */}
      {/* Audio starts when this scene starts (frame 0 of this sequence) */}
      <Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
    </div>
  );
};
```

Syncing Visuals to Voiceover

Calculate scene duration from audio, not the other way around:

```tsx
// src/config/timing.ts
import manifest from '../../public/audio/scenes/manifest.json';

const FPS = 30;

// Convert audio durations to frame counts, keyed by scene name
// ("scene-02-problem.mp3" -> "problem")
export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
  const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
  acc[name] = Math.ceil(scene.duration * FPS);
  return acc;
}, {} as Record<string, number>);

// Usage in a composition:
// <Series.Sequence durationInFrames={sceneDurations.problem}>
```

Audio Timing Patterns

```tsx
import React from 'react';
import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';

// Fade in audio
export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
  src,
  fadeFrames = 30,
}) => {
  const frame = useCurrentFrame();
  const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
    extrapolateRight: 'clamp',
  });
  return <Audio src={src} volume={volume} />;
};

// Delayed audio start
export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
  src,
  delayFrames,
}) => (
  <Sequence from={delayFrames}>
    <Audio src={src} />
  </Sequence>
);

// Usage:
// <FadeInAudio src={staticFile('audio/voiceover.mp3')} />
// <DelayedAudio src={staticFile('audio/sting.mp3')} delayFrames={45} />
```

Voiceover + Demo Video Sync

When a scene has both voiceover and demo video:

```tsx
import React from 'react';
import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';

export const DemoScene: React.FC = () => {
  const { durationInFrames, fps } = useVideoConfig();

  // Calculate playback rate to fit demo into voiceover duration
  const demoDuration = 45; // seconds (original demo length)
  const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
  const playbackRate = demoDuration / sceneDuration;

  return (
    <>
      <OffthreadVideo
        src={staticFile('demos/feature-demo.mp4')}
        playbackRate={playbackRate}
        muted
      />
      {/* Voiceover for this scene (path illustrative) */}
      <Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
    </>
  );
};
```

Error Handling

```tsx
import React, { useEffect, useState } from 'react';
import { Audio, delayRender, continueRender } from 'remotion';

export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
  const [handle] = useState(() => delayRender());
  const [audioReady, setAudioReady] = useState(false);

  useEffect(() => {
    const audio = new window.Audio(src);
    audio.oncanplaythrough = () => {
      setAudioReady(true);
      continueRender(handle);
    };
    audio.onerror = () => {
      console.error(`Failed to load audio: ${src}`);
      continueRender(handle); // Continue without audio rather than hang
    };
  }, [src, handle]);

  if (!audioReady) return null;
  return <Audio src={src} />;
};
```

Toolkit Command: /generate-voiceover

The /generate-voiceover command handles the full workflow:

  1. Reads VOICEOVER-SCRIPT.md
  2. Extracts narration for each scene
  3. Generates audio via ElevenLabs API
  4. Saves to public/audio/scenes/
  5. Creates manifest.json with durations
  6. Updates project.json with timing info

Popular Voices

  • George: JBFqnCBsd6RMkjVDRZzb (warm narrator)
  • Rachel: 21m00Tcm4TlvDq8ikWAM (clear female)
  • Adam: pNInz6obpgDQGcFmaJgB (professional male)

List all: client.voices.get_all()
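
A minimal listing sketch, continuing with the client from the Text-to-Speech section:

```python
# Print each available voice's name and ID
for voice in client.voices.get_all().voices:
    print(voice.name, voice.voice_id)
```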

For full API docs, see [reference.md](reference.md).