# Text-to-Speech (TTS)
Convert text to natural-sounding speech using multiple providers.
## Quick Start

```go
provider, _ := omnivoice.GetTTSProvider("elevenlabs",
    omnivoice.WithAPIKey(apiKey))

result, _ := provider.Synthesize(ctx, "Hello, world!", omnivoice.SynthesisConfig{
    VoiceID: "pNInz6obpgDQGcFmaJgB",
})
// result.Audio contains the audio bytes
```
## Available Providers

| Provider | Registry Name | Quality | Latency | Best For |
|---|---|---|---|---|
| ElevenLabs | `"elevenlabs"` | Excellent | Medium | Natural voices, cloning |
| OpenAI | `"openai"` | Very Good | Low | General purpose |
| Deepgram | `"deepgram"` | Good | Very Low | Real-time, low latency |
| Twilio | `"twilio"` | Good | Low | Phone calls (TwiML) |
## Basic Synthesis

```go
package main

import (
    "context"
    "log"
    "os"

    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all"
)

func main() {
    ctx := context.Background()

    provider, err := omnivoice.GetTTSProvider("elevenlabs",
        omnivoice.WithAPIKey(os.Getenv("ELEVENLABS_API_KEY")))
    if err != nil {
        log.Fatal(err)
    }

    result, err := provider.Synthesize(ctx, "Welcome to OmniVoice!", omnivoice.SynthesisConfig{
        VoiceID:      "pNInz6obpgDQGcFmaJgB", // Adam
        OutputFormat: "mp3_44100_128",
        SampleRate:   44100,
    })
    if err != nil {
        log.Fatal(err)
    }

    if err := os.WriteFile("output.mp3", result.Audio, 0644); err != nil {
        log.Fatal(err)
    }
}
```
## Streaming TTS

For real-time applications, use streaming synthesis:

```go
stream, err := provider.SynthesizeStream(ctx, "This is a longer text that will be streamed...", omnivoice.SynthesisConfig{
    VoiceID: "pNInz6obpgDQGcFmaJgB",
})
if err != nil {
    panic(err)
}

// Read audio chunks as they arrive
for chunk := range stream {
    // Process chunk.Audio in real time
    playAudio(chunk.Audio)
}
```
## Configuration Options

```go
config := omnivoice.SynthesisConfig{
    // Voice selection
    VoiceID: "voice-id", // Provider-specific voice ID

    // Audio format
    OutputFormat: "mp3_44100_128", // Format string (provider-specific)
    SampleRate:   44100,           // Sample rate in Hz

    // Voice settings (ElevenLabs)
    Stability:       0.5,  // 0.0-1.0; lower = more expressive
    SimilarityBoost: 0.75, // 0.0-1.0; higher = closer to the original voice

    // Provider-specific extensions
    Extensions: map[string]any{
        "model_id": "eleven_multilingual_v2",
    },
}
```
## Provider-Specific Examples

### ElevenLabs

```go
provider, _ := omnivoice.GetTTSProvider("elevenlabs",
    omnivoice.WithAPIKey(apiKey))

result, _ := provider.Synthesize(ctx, text, omnivoice.SynthesisConfig{
    VoiceID:      "pNInz6obpgDQGcFmaJgB",
    OutputFormat: "mp3_44100_128",
    Extensions: map[string]any{
        "model_id": "eleven_multilingual_v2",
    },
})
```
### OpenAI

```go
provider, _ := omnivoice.GetTTSProvider("openai",
    omnivoice.WithAPIKey(apiKey))

result, _ := provider.Synthesize(ctx, text, omnivoice.SynthesisConfig{
    VoiceID: "alloy", // alloy, echo, fable, onyx, nova, shimmer
    Extensions: map[string]any{
        "model": "tts-1-hd", // tts-1 or tts-1-hd
        "speed": 1.0,        // 0.25 to 4.0
    },
})
```
### Deepgram

```go
provider, _ := omnivoice.GetTTSProvider("deepgram",
    omnivoice.WithAPIKey(apiKey))

result, _ := provider.Synthesize(ctx, text, omnivoice.SynthesisConfig{
    VoiceID: "aura-asteria-en", // Aura voices
    Extensions: map[string]any{
        "encoding": "mp3",
    },
})
```
## SSML Support

Some providers support SSML for fine-grained control:

```go
ssml := `<speak>
    Hello! <break time="500ms"/>
    This is <emphasis level="strong">important</emphasis>.
    <prosody rate="slow">Speaking slowly now.</prosody>
</speak>`

result, _ := provider.Synthesize(ctx, ssml, omnivoice.SynthesisConfig{
    VoiceID: voiceID,
    Extensions: map[string]any{
        "use_ssml": true,
    },
})
```
## Voice Selection

### ElevenLabs Voices

| Voice ID | Name | Description |
|---|---|---|
| `pNInz6obpgDQGcFmaJgB` | Adam | Deep, narrative |
| `EXAVITQu4vr4xnSDxMaL` | Sarah | Soft, conversational |
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Calm, professional |
### OpenAI Voices

| Voice ID | Description |
|---|---|
| `alloy` | Neutral, balanced |
| `echo` | Warm, conversational |
| `fable` | Expressive, British |
| `onyx` | Deep, authoritative |
| `nova` | Friendly, upbeat |
| `shimmer` | Clear, professional |
## Error Handling

```go
result, err := provider.Synthesize(ctx, text, config)
if err != nil {
    switch {
    case errors.Is(err, context.DeadlineExceeded):
        log.Println("Request timed out")
    case strings.Contains(err.Error(), "invalid_api_key"):
        log.Println("Invalid API key")
    case strings.Contains(err.Error(), "quota_exceeded"):
        log.Println("Rate limit or quota exceeded")
    default:
        log.Printf("TTS error: %v", err)
    }
    return
}
```
## Best Practices

- **Cache audio** - store generated audio to avoid repeated API calls for the same text
- **Use streaming** - for long texts or real-time applications
- **Choose appropriate quality** - lower quality means faster synthesis and smaller files
- **Handle errors gracefully** - implement retries for transient failures
- **Consider latency** - Deepgram for real-time, ElevenLabs for quality
## Next Steps

- **Streaming Guide** - real-time TTS streaming
- **Voice Agents** - combine TTS with STT and LLMs
- **Provider Details** - provider-specific features