ElevenLabs

ElevenLabs provides premium voice synthesis with natural, expressive voices.

Features

TTS: Industry-leading voice quality
STT: Real-time transcription (Scribe)
Voice Cloning: Create custom voices
Streaming: Ultra-low latency streaming
Languages: 29+ languages

Configuration

import (
    "log"
    "os"

    "github.com/plexusone/omnivoice"
    // Blank import: loaded for its side effects (provider registration)
    _ "github.com/plexusone/omnivoice/providers/elevenlabs"
)

// TTS Provider
tts, err := omnivoice.GetTTSProvider("elevenlabs",
    omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")),
)
if err != nil {
    log.Fatal(err)
}

// STT Provider
stt, err := omnivoice.GetSTTProvider("elevenlabs",
    omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")),
)
if err != nil {
    log.Fatal(err)
}
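
An unset ELEVENLABS_API_KEY is easier to diagnose at startup than at the first API call. The sketch below is a hypothetical helper, not part of omnivoice: it takes the lookup function as a parameter so it can be exercised without touching the real environment.

```go
package main

import (
    "errors"
    "strings"
)

// apiKey is a hypothetical helper (not an omnivoice function). It looks up
// ELEVENLABS_API_KEY through getenv and rejects empty or whitespace-only
// values so a missing key fails before any provider is created.
func apiKey(getenv func(string) string) (string, error) {
    key := strings.TrimSpace(getenv("ELEVENLABS_API_KEY"))
    if key == "" {
        return "", errors.New("ELEVENLABS_API_KEY is not set or empty")
    }
    return key, nil
}
```

In application code, call `apiKey(os.Getenv)` and pass the result to `omnivoice.WithAPIKey`.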

Text-to-Speech

Popular Voices

Voice ID	Name	Style
`pNInz6obpgDQGcFmaJgB`	Adam	Deep, narrative
`EXAVITQu4vr4xnSDxMaL`	Sarah	Soft, young
`onwK4e9ZLuTAKqWW03F9`	Daniel	British, authoritative
`XB0fDUnXU5powFXDhCwa`	Charlotte	Swedish, calm

Basic Usage

result, err := tts.Synthesize(ctx, "Hello, world!", omnivoice.SynthesisConfig{
    VoiceID: "pNInz6obpgDQGcFmaJgB",
})
if err != nil {
    log.Fatal(err)
}

if err := os.WriteFile("output.mp3", result.Audio, 0600); err != nil {
    log.Fatal(err)
}

Models

Model	Latency	Quality	Use Case
`eleven_turbo_v2_5`	Lowest	Good	Real-time voice agents
`eleven_multilingual_v2`	Low	Excellent	Multi-language apps
`eleven_monolingual_v1`	Medium	Excellent	English-only

result, err := tts.Synthesize(ctx, text, omnivoice.SynthesisConfig{
    VoiceID: "pNInz6obpgDQGcFmaJgB",
    Extensions: map[string]any{
        "model_id": "eleven_turbo_v2_5",
    },
})

Voice Settings

result, err := tts.Synthesize(ctx, text, omnivoice.SynthesisConfig{
    VoiceID: "pNInz6obpgDQGcFmaJgB",
    Extensions: map[string]any{
        "stability":         0.5,  // 0-1, higher = more consistent
        "similarity_boost":  0.75, // 0-1, higher = closer to original voice
        "style":             0.3,  // 0-1, style exaggeration
        "use_speaker_boost": true,
    },
})
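
Since the three numeric settings share the 0-1 range noted in the comments, it can help to validate them before sending a request. The following is a minimal sketch; `voiceSettings` is a hypothetical helper, not an omnivoice function, and it only reuses the key names from the example above.

```go
package main

import "fmt"

// voiceSettings is a hypothetical helper (not part of omnivoice). It checks
// that each numeric setting lies in [0, 1] and returns a map with the same
// keys as the Extensions example above.
func voiceSettings(stability, similarity, style float64, speakerBoost bool) (map[string]any, error) {
    checks := []struct {
        name  string
        value float64
    }{
        {"stability", stability},
        {"similarity_boost", similarity},
        {"style", style},
    }
    for _, c := range checks {
        // Written as a negated range test so that NaN is rejected too.
        if !(c.value >= 0 && c.value <= 1) {
            return nil, fmt.Errorf("%s must be between 0 and 1, got %v", c.name, c.value)
        }
    }
    return map[string]any{
        "stability":         stability,
        "similarity_boost":  similarity,
        "style":             style,
        "use_speaker_boost": speakerBoost,
    }, nil
}
```

The returned map can be assigned to the `Extensions` field directly.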

Output Formats

Format	Use Case
`mp3_44100_128`	General purpose
`mp3_44100_192`	Higher quality
`pcm_16000`	Real-time, low latency
`pcm_44100`	Real-time, high quality
`ulaw_8000`	Telephony (Twilio)

config := omnivoice.SynthesisConfig{
    VoiceID:      "pNInz6obpgDQGcFmaJgB",
    OutputFormat: "pcm_16000", // For real-time streaming
}
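
The format names in the table follow a codec_samplerate[_bitrate] pattern, which a player or resampler may need to decode. Here is a rough sketch of a parser for that pattern; `audioFormat` and `parseOutputFormat` are hypothetical names introduced for illustration, and the pattern is inferred from the names listed above rather than taken from the omnivoice API.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// audioFormat is a hypothetical type describing a parsed format string.
type audioFormat struct {
    Codec       string // e.g. "mp3", "pcm", "ulaw"
    SampleRate  int    // in Hz
    BitrateKbps int    // 0 when the name carries no bitrate
}

// parseOutputFormat splits names such as "mp3_44100_128" or "pcm_16000"
// into their parts. It is an illustrative helper, not an omnivoice function.
func parseOutputFormat(s string) (audioFormat, error) {
    parts := strings.Split(s, "_")
    if len(parts) < 2 || len(parts) > 3 || parts[0] == "" {
        return audioFormat{}, fmt.Errorf("unrecognized output format %q", s)
    }
    rate, err := strconv.Atoi(parts[1])
    if err != nil || rate <= 0 {
        return audioFormat{}, fmt.Errorf("invalid sample rate in %q", s)
    }
    f := audioFormat{Codec: parts[0], SampleRate: rate}
    if len(parts) == 3 {
        kbps, err := strconv.Atoi(parts[2])
        if err != nil || kbps <= 0 {
            return audioFormat{}, fmt.Errorf("invalid bitrate in %q", s)
        }
        f.BitrateKbps = kbps
    }
    return f, nil
}
```

For example, `parseOutputFormat("ulaw_8000")` yields codec `ulaw` at 8000 Hz, which matches what telephony pipelines typically expect.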

Streaming

stream, err := tts.SynthesizeStream(ctx, text, omnivoice.SynthesisConfig{
    VoiceID: "pNInz6obpgDQGcFmaJgB",
    Extensions: map[string]any{
        "model_id":                   "eleven_turbo_v2_5",
        "optimize_streaming_latency": 3, // 0-4, higher = lower latency
    },
})
if err != nil {
    log.Fatal(err)
}

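for chunk := range stream {
    if chunk.Error != nil {
        // Report the failure instead of silently dropping it
        log.Printf("stream error: %v", chunk.Error)
        break
    }
    // Play immediately for low latency
    // (playAudio is an application-defined playback function)
    playAudio(chunk.Audio)
}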
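
When the audio should be saved or post-processed rather than played live, the same loop can accumulate chunks into a buffer. This is a sketch under assumptions: `chunk` is a local stand-in that mirrors only the `Audio` and `Error` fields used above, and `collectAudio` and `errNoAudio` are hypothetical names, not omnivoice identifiers.

```go
package main

import (
    "bytes"
    "errors"
)

// chunk is a stand-in for the stream element type used above; only the
// Audio and Error fields from that loop are mirrored here.
type chunk struct {
    Audio []byte
    Error error
}

// errNoAudio is returned when the stream closes without producing any audio.
var errNoAudio = errors.New("stream produced no audio")

// collectAudio drains the stream and concatenates the audio. On the first
// chunk error it stops and returns what was gathered so far along with that
// error; the caller should then cancel the context so the producer stops.
func collectAudio(stream <-chan chunk) ([]byte, error) {
    var buf bytes.Buffer
    for c := range stream {
        if c.Error != nil {
            return buf.Bytes(), c.Error
        }
        buf.Write(c.Audio)
    }
    if buf.Len() == 0 {
        return nil, errNoAudio
    }
    return buf.Bytes(), nil
}
```

Returning the partial audio with the error lets the caller decide whether a truncated clip is still useful.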

Speech-to-Text

Basic Transcription

result, err := stt.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
    Language: "en",
})
if err != nil {
    log.Fatal(err)
}

fmt.Println(result.Text)

Streaming Transcription

stream, err := stt.TranscribeStream(ctx, omnivoice.TranscriptionConfig{
    Language: "en",
    Extensions: map[string]any{
        "interim_results": true,
    },
})
if err != nil {
    log.Fatal(err)
}

// Send audio
go func() {
    for audio := range audioSource {
        stream.Write(audio)
    }
    stream.Close()
}()

// Receive transcriptions
for result := range stream.Results() {
    if result.IsFinal {
        fmt.Printf("Final: %s\n", result.Text)
    } else {
        fmt.Printf("Interim: %s\r", result.Text)
    }
}
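
Interim results are typically superseded by a later final result, so a stored transcript should be built from final results only. A minimal sketch follows; the `result` type mirrors just the `Text` and `IsFinal` fields used in the loop above, and `finalTranscript` is a hypothetical helper.

```go
package main

import "strings"

// result is a stand-in that mirrors only the fields used in the loop above.
type result struct {
    Text    string
    IsFinal bool
}

// finalTranscript reads results until the channel closes and joins the text
// of the final ones with spaces. Interim results and empty text are skipped.
func finalTranscript(results <-chan result) string {
    var parts []string
    for r := range results {
        if !r.IsFinal {
            continue
        }
        if t := strings.TrimSpace(r.Text); t != "" {
            parts = append(parts, t)
        }
    }
    return strings.Join(parts, " ")
}
```

The interim text remains useful for live captions; only the persisted transcript ignores it.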

Speaker Diarization

result, err := stt.TranscribeFile(ctx, "meeting.mp3", omnivoice.TranscriptionConfig{
    EnableSpeakerDiarization: true,
    Extensions: map[string]any{
        "num_speakers": 2,
    },
})

for _, segment := range result.Segments {
    fmt.Printf("[Speaker %d] %s\n", segment.Speaker, segment.Text)
}
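
Diarized output often contains several consecutive segments from the same speaker, and merging them into turns reads better. The sketch below assumes `Speaker` is an integer, as the `%d` verb above suggests; `segment` and `mergeTurns` are illustrative names, not omnivoice types.

```go
package main

// segment is a stand-in that mirrors the two fields printed above.
// Speaker is assumed to be an int because the example formats it with %d.
type segment struct {
    Speaker int
    Text    string
}

// mergeTurns combines consecutive segments from the same speaker into a
// single turn, preserving the original order of speakers.
func mergeTurns(segs []segment) []segment {
    var turns []segment
    for _, s := range segs {
        if n := len(turns); n > 0 && turns[n-1].Speaker == s.Speaker {
            turns[n-1].Text += " " + s.Text
            continue
        }
        turns = append(turns, s)
    }
    return turns
}
```

The input slice is left unmodified because each segment is copied into `turns` before any text is appended.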

Voice Cloning

Create custom voices in your ElevenLabs account (an account is required), then pass the cloned voice's ID as VoiceID:

// Use a cloned voice
result, err := tts.Synthesize(ctx, text, omnivoice.SynthesisConfig{
    VoiceID: "your-cloned-voice-id",
})

Latency Optimization

For voice agents requiring minimal latency:

tts, err := omnivoice.GetTTSProvider("elevenlabs",
    omnivoice.WithAPIKey(apiKey),
)
if err != nil {
    log.Fatal(err)
}

stream, err := tts.SynthesizeStream(ctx, response, omnivoice.SynthesisConfig{
    VoiceID:      "pNInz6obpgDQGcFmaJgB",
    OutputFormat: "pcm_16000",
    Extensions: map[string]any{
        "model_id":                   "eleven_turbo_v2_5",
        "optimize_streaming_latency": 4, // Maximum optimization
    },
})
if err != nil {
    log.Fatal(err)
}
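
To confirm that these settings actually help in your environment, measure the time until the first chunk arrives. The helper below is a hypothetical benchmarking sketch: it works on a plain `<-chan []byte`, so the stream from the example above would need a small adapter, and it discards the first chunk, which makes it unsuitable for production playback.

```go
package main

import "time"

// timeToFirstChunk is a hypothetical benchmarking helper. open should start
// the stream (for example by calling SynthesizeStream) and return a channel
// of audio chunks. The clock starts before open is called, so request setup
// time is included. The second return value is false if the channel closed
// without delivering a chunk.
func timeToFirstChunk(open func() <-chan []byte) (time.Duration, bool) {
    begin := time.Now()
    stream := open()
    _, ok := <-stream // the first chunk is consumed and discarded
    return time.Since(begin), ok
}
```

Comparing the result across `optimize_streaming_latency` values and output formats shows which combination gives the lowest time to first audio for a given network path.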

Error Handling

result, err := tts.Synthesize(ctx, text, config)
if err != nil {
    switch {
    case strings.Contains(err.Error(), "quota_exceeded"):
        log.Println("Monthly quota exceeded")
    case strings.Contains(err.Error(), "voice_not_found"):
        log.Println("Invalid voice ID")
    case strings.Contains(err.Error(), "invalid_api_key"):
        log.Println("Check ELEVENLABS_API_KEY")
    default:
        log.Printf("Error: %v", err)
    }
}
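
If the same checks are needed in several places, the substring matching can be centralized in one function. This sketch reuses only the three markers shown above; `errorKind` and `classifyMessage` are hypothetical names, and matching on message text remains fragile if the provider changes its wording.

```go
package main

import "strings"

// errorKind is a hypothetical category for provider errors.
type errorKind string

const (
    kindQuota errorKind = "quota"
    kindVoice errorKind = "voice"
    kindAuth  errorKind = "auth"
    kindOther errorKind = "other"
)

// errorMarkers maps the substrings from the example above to a category.
var errorMarkers = []struct {
    marker string
    kind   errorKind
}{
    {"quota_exceeded", kindQuota},
    {"voice_not_found", kindVoice},
    {"invalid_api_key", kindAuth},
}

// classifyMessage returns the category of an error message, or kindOther
// when none of the known markers is present.
func classifyMessage(msg string) errorKind {
    for _, m := range errorMarkers {
        if strings.Contains(msg, m.marker) {
            return m.kind
        }
    }
    return kindOther
}
```

Call it as `classifyMessage(err.Error())` after checking that `err` is not nil.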

Best Practices

Use Turbo for real-time - `eleven_turbo_v2_5` has the lowest latency
Cache common phrases - Pre-generate greetings and confirmations
Use PCM for streaming - No encoding overhead
Set `optimize_streaming_latency` - Higher values reduce time-to-first-byte
Monitor quota - ElevenLabs plans have monthly character quotas
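
The phrase-caching practice above can be sketched as a small in-memory cache keyed by voice and text. Everything here is an assumption for illustration: `synthFunc` stands in for a call such as `tts.Synthesize`, and `phraseCache` is not part of omnivoice.

```go
package main

import "sync"

// synthFunc is a hypothetical stand-in for a synthesis call such as
// tts.Synthesize, reduced to the inputs that determine the audio.
type synthFunc func(voiceID, text string) ([]byte, error)

// phraseCache stores synthesized audio in memory so repeated phrases
// (greetings, confirmations) are generated only once per voice.
type phraseCache struct {
    mu    sync.Mutex
    audio map[string][]byte
    synth synthFunc
}

func newPhraseCache(synth synthFunc) *phraseCache {
    return &phraseCache{audio: make(map[string][]byte), synth: synth}
}

// Get returns cached audio for (voiceID, text), synthesizing on first use.
// Failed syntheses are not cached, so a later call retries them.
func (c *phraseCache) Get(voiceID, text string) ([]byte, error) {
    key := voiceID + "\x00" + text // NUL separator avoids key collisions
    c.mu.Lock()
    defer c.mu.Unlock()
    if a, ok := c.audio[key]; ok {
        return a, nil
    }
    a, err := c.synth(voiceID, text)
    if err != nil {
        return nil, err
    }
    c.audio[key] = a
    return a, nil
}
```

Holding the lock during synthesis keeps the sketch simple but serializes cache misses. It also saves quota, since cached phrases consume no additional characters.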

Pricing

Plan	Characters/Month	Price
Free	10,000	$0
Starter	30,000	$5
Creator	100,000	$22
Pro	500,000	$99

Check ElevenLabs Pricing for current rates.
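
To estimate which plan covers an expected workload, count the characters to be synthesized and compare the total against the table. The sketch below hard-codes the figures listed above, which may be out of date, and assumes usage is counted per Unicode character; both are assumptions, and `plan`, `monthlyChars`, and `smallestPlan` are hypothetical names.

```go
package main

import "unicode/utf8"

// plan holds one row of the pricing table above. The numbers are copied
// from that table and may not reflect current ElevenLabs pricing.
type plan struct {
    Name     string
    Chars    int // characters per month
    PriceUSD int
}

var plans = []plan{
    {"Free", 10000, 0},
    {"Starter", 30000, 5},
    {"Creator", 100000, 22},
    {"Pro", 500000, 99},
}

// monthlyChars sums the character count of the given texts. Counting runes
// rather than bytes is an assumption about how usage is measured.
func monthlyChars(texts []string) int {
    total := 0
    for _, t := range texts {
        total += utf8.RuneCountInString(t)
    }
    return total
}

// smallestPlan returns the cheapest listed plan whose quota covers chars.
// The second value is false when no listed plan is large enough.
func smallestPlan(chars int) (plan, bool) {
    for _, p := range plans {
        if chars <= p.Chars {
            return p, true
        }
    }
    return plan{}, false
}
```

For instance, a workload of 25,000 characters per month exceeds the Free quota in the table and fits within Starter.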

Next Steps

OpenAI - Alternative TTS/STT
Deepgram - Real-time STT
Voice Agents - Build conversational agents