Speech-to-Text (STT)¶
Convert audio to text using multiple providers with support for batch and streaming transcription.
Quick Start¶
provider, err := omnivoice.GetSTTProvider("deepgram",
	omnivoice.WithAPIKey(apiKey))
if err != nil {
	log.Fatal(err)
}
result, err := provider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
	Language: "en",
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.Text)
Available Providers¶
| Provider | Registry Name | Accuracy | Latency | Best For |
|---|---|---|---|---|
| Deepgram | "deepgram" | Excellent | Very Low | Real-time, high volume |
| OpenAI | "openai" | Excellent | Medium | General purpose, multilingual |
| ElevenLabs | "elevenlabs" | Very Good | Medium | Integration with TTS |
| Twilio | "twilio" | Good | Low | Phone calls |
Transcription Methods¶
TranscribeFile¶
Transcribe a local audio file:
result, err := provider.TranscribeFile(ctx, "recording.mp3", omnivoice.TranscriptionConfig{
	Language:             "en",
	EnableWordTimestamps: true,
})
TranscribeURL¶
Transcribe audio from a URL:
result, err := provider.TranscribeURL(ctx, "https://example.com/audio.mp3", omnivoice.TranscriptionConfig{
	Language: "en",
})
Transcribe¶
Transcribe from an io.Reader:
file, err := os.Open("audio.mp3")
if err != nil {
	log.Fatal(err)
}
defer file.Close()
result, err := provider.Transcribe(ctx, file, omnivoice.TranscriptionConfig{
	Language: "en",
})
TranscribeStream¶
Real-time streaming transcription:
stream, err := provider.TranscribeStream(ctx, omnivoice.TranscriptionConfig{
	Language: "en",
})
if err != nil {
	panic(err)
}
// Send audio chunks
go func() {
	for chunk := range audioSource {
		stream.Write(chunk)
	}
	stream.Close()
}()
// Receive transcription results
for result := range stream.Results() {
	if result.IsFinal {
		fmt.Printf("Final: %s\n", result.Text)
	} else {
		fmt.Printf("Interim: %s\n", result.Text)
	}
}
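The `audioSource` channel in the example above is left undefined. A minimal sketch of one way to produce it, assuming the audio is already in memory as a byte slice and that the stream accepts arbitrary chunk sizes (both assumptions; `audioChunks` is a hypothetical helper, not part of OmniVoice):

```go
package main

import "fmt"

// audioChunks splits data into slices of at most size bytes and sends them
// on the returned channel, closing the channel once all data has been sent.
func audioChunks(data []byte, size int) <-chan []byte {
	if size < 1 {
		size = 1 // guard against a zero or negative chunk size
	}
	out := make(chan []byte)
	go func() {
		defer close(out)
		for start := 0; start < len(data); start += size {
			end := start + size
			if end > len(data) {
				end = len(data)
			}
			out <- data[start:end]
		}
	}()
	return out
}

func main() {
	audio := make([]byte, 10) // placeholder for real audio bytes
	for chunk := range audioChunks(audio, 4) {
		// A real caller would pass chunk to stream.Write here.
		fmt.Println(len(chunk))
	}
}
```

Closing the channel when the data runs out is what lets the sending goroutine's `for chunk := range audioSource` loop terminate and close the stream.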
Configuration Options¶
config := omnivoice.TranscriptionConfig{
	// Language
	Language: "en-US", // BCP-47 language code
	// Features
	EnableWordTimestamps:     true, // Word-level timing
	EnableSpeakerDiarization: true, // Speaker identification
	// Model selection (provider-specific)
	Model: "nova-2",
	// Provider-specific extensions
	Extensions: map[string]any{
		"smart_format": true,
		"punctuate":    true,
	},
}
Provider-Specific Examples¶
Deepgram¶
provider, _ := omnivoice.GetSTTProvider("deepgram",
	omnivoice.WithAPIKey(apiKey))
result, _ := provider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
	Language:             "en",
	Model:                "nova-2",
	EnableWordTimestamps: true,
	Extensions: map[string]any{
		"smart_format": true,
		"punctuate":    true,
		"diarize":      true,
		"utterances":   true,
	},
})
OpenAI Whisper¶
provider, _ := omnivoice.GetSTTProvider("openai",
	omnivoice.WithAPIKey(apiKey))
result, _ := provider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
	Language: "en",
	Extensions: map[string]any{
		"model":           "whisper-1",
		"response_format": "verbose_json",
		"temperature":     0,
	},
})
ElevenLabs Scribe¶
provider, _ := omnivoice.GetSTTProvider("elevenlabs",
	omnivoice.WithAPIKey(apiKey))
result, _ := provider.TranscribeFile(ctx, "audio.mp3", omnivoice.TranscriptionConfig{
	Language:             "en",
	EnableWordTimestamps: true,
})
Working with Results¶
Basic Text¶
fmt.Println(result.Text)
Word Timestamps¶
for _, word := range result.Words {
	fmt.Printf("[%.2fs - %.2fs] %s\n",
		word.Start.Seconds(),
		word.End.Seconds(),
		word.Text)
}
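Word-level timing like this is the raw material for subtitles. The following is a minimal sketch of grouping words into SubRip (SRT) cues, assuming a local `Word` type that mirrors the `Text`, `Start`, and `End` fields used in the loop above; the real OmniVoice type may differ, and `srtTimestamp` and `buildSRT` are hypothetical helpers:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Word is a hypothetical stand-in for the library's word type.
type Word struct {
	Text  string
	Start time.Duration
	End   time.Duration
}

// srtTimestamp formats d as HH:MM:SS,mmm, the timestamp layout used by SRT.
func srtTimestamp(d time.Duration) string {
	ms := d.Milliseconds()
	h := ms / 3600000
	m := (ms % 3600000) / 60000
	s := (ms % 60000) / 1000
	return fmt.Sprintf("%02d:%02d:%02d,%03d", h, m, s, ms%1000)
}

// buildSRT groups words into numbered cues of at most perCue words each.
// Each cue spans from its first word's start to its last word's end.
func buildSRT(words []Word, perCue int) string {
	if perCue < 1 {
		perCue = 1
	}
	var b strings.Builder
	cue := 1
	for i := 0; i < len(words); i += perCue {
		end := i + perCue
		if end > len(words) {
			end = len(words)
		}
		group := words[i:end]
		texts := make([]string, len(group))
		for j, w := range group {
			texts[j] = w.Text
		}
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n", cue,
			srtTimestamp(group[0].Start),
			srtTimestamp(group[len(group)-1].End),
			strings.Join(texts, " "))
		cue++
	}
	return b.String()
}

func main() {
	words := []Word{
		{Text: "Hello", Start: 0, End: 400 * time.Millisecond},
		{Text: "world", Start: 450 * time.Millisecond, End: 900 * time.Millisecond},
		{Text: "again", Start: 1200 * time.Millisecond, End: 1700 * time.Millisecond},
	}
	fmt.Print(buildSRT(words, 2))
}
```

Fixed-size grouping keeps the sketch short; production subtitles usually break cues on pauses or punctuation instead.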
Speaker Diarization¶
for _, segment := range result.Segments {
	fmt.Printf("Speaker %d: %s\n", segment.Speaker, segment.Text)
}
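Diarized output can contain several consecutive segments from the same speaker. A minimal sketch of merging them into speaker turns, assuming a local `Segment` type with the `Speaker` and `Text` fields used above (a stand-in for the library's type; `mergeBySpeaker` is a hypothetical helper):

```go
package main

import "fmt"

// Segment is a hypothetical stand-in for the library's segment type.
type Segment struct {
	Speaker int
	Text    string
}

// mergeBySpeaker joins adjacent segments that share a speaker into one turn.
// Segments from the same speaker that are not adjacent stay separate.
func mergeBySpeaker(segments []Segment) []Segment {
	var merged []Segment
	for _, seg := range segments {
		if n := len(merged); n > 0 && merged[n-1].Speaker == seg.Speaker {
			merged[n-1].Text += " " + seg.Text
			continue
		}
		merged = append(merged, seg)
	}
	return merged
}

func main() {
	turns := mergeBySpeaker([]Segment{
		{Speaker: 0, Text: "Hi"},
		{Speaker: 0, Text: "there."},
		{Speaker: 1, Text: "Hello!"},
		{Speaker: 0, Text: "How are you?"},
	})
	for _, t := range turns {
		fmt.Printf("Speaker %d: %s\n", t.Speaker, t.Text)
	}
}
```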
Confidence Scores¶
fmt.Printf("Confidence: %.2f%%\n", result.Confidence*100)
for _, word := range result.Words {
	if word.Confidence < 0.8 {
		fmt.Printf("Low confidence word: %s (%.2f)\n", word.Text, word.Confidence)
	}
}
Language Codes¶
OmniVoice accepts BCP-47 language codes:
| Code | Language |
|---|---|
| en | English |
| en-US | English (US) |
| en-GB | English (UK) |
| es | Spanish |
| fr | French |
| de | German |
| ja | Japanese |
| zh | Chinese |
Most providers support automatic language detection when no code is specified.
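A BCP-47 tag such as `en-US` starts with a primary language subtag (`en`). If a provider turns out to accept only that primary subtag (an assumption; check the provider's documentation), it can be extracted before building the config. `primarySubtag` below is a hypothetical helper and does no validation of the tag:

```go
package main

import (
	"fmt"
	"strings"
)

// primarySubtag returns the lowercased part of tag before the first "-" or
// "_" separator. An empty tag is returned unchanged, which leaves the
// language unspecified.
func primarySubtag(tag string) string {
	tag = strings.TrimSpace(tag)
	if i := strings.IndexAny(tag, "-_"); i >= 0 {
		tag = tag[:i]
	}
	return strings.ToLower(tag)
}

func main() {
	for _, tag := range []string{"en-US", "en-GB", "ja", ""} {
		fmt.Printf("%q -> %q\n", tag, primarySubtag(tag))
	}
}
```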
Error Handling¶
result, err := provider.TranscribeFile(ctx, path, config)
if err != nil {
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		log.Println("Request timed out")
	case errors.Is(err, os.ErrNotExist): // also matches wrapped errors, unlike os.IsNotExist
		log.Println("Audio file not found")
	case strings.Contains(err.Error(), "unsupported_format"):
		log.Println("Audio format not supported")
	default:
		log.Printf("STT error: %v", err)
	}
	return
}
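Whether a sentinel check matches depends on how the error was wrapped on its way up. The sketch below uses only the standard library to show that `errors.Is` follows `%w` chains; `classify`, `simulateMissingFile`, and `simulateTimeout` are hypothetical helpers, and the wrapping they perform is an assumption about how a provider might return errors:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
)

// classify maps an error to a coarse category. errors.Is follows %w chains,
// so it still matches when a caller has wrapped the underlying error.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	case errors.Is(err, os.ErrNotExist):
		return "not_found"
	default:
		return "other"
	}
}

// simulateMissingFile wraps os.ErrNotExist the way a library might.
func simulateMissingFile() error {
	return fmt.Errorf("transcribe file: %w", os.ErrNotExist)
}

// simulateTimeout wraps context.DeadlineExceeded the way a library might.
func simulateTimeout() error {
	return fmt.Errorf("transcribe file: %w", context.DeadlineExceeded)
}

func main() {
	err := simulateMissingFile()
	fmt.Println(classify(err))
	// os.IsNotExist predates errors.Is and does not follow %w chains.
	fmt.Println(os.IsNotExist(err))
	fmt.Println(classify(simulateTimeout()))
}
```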
Audio Format Support¶
| Format | Extension | Providers |
|---|---|---|
| MP3 | .mp3 | All |
| WAV | .wav | All |
| FLAC | .flac | Deepgram, OpenAI |
| OGG | .ogg | Deepgram, OpenAI |
| WebM | .webm | Deepgram, OpenAI |
| M4A | .m4a | Deepgram, OpenAI |
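The table can be encoded as a lookup to reject unsupported files before uploading. This is a minimal sketch: it reads "All" as the four providers listed under Available Providers (an interpretation), keys on file extension only, and `providersFor` is a hypothetical helper rather than an OmniVoice function:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

var (
	allProviders   = []string{"deepgram", "openai", "elevenlabs", "twilio"}
	deepgramOpenAI = []string{"deepgram", "openai"}
)

// formatSupport mirrors the format table, keyed by lowercase extension.
var formatSupport = map[string][]string{
	".mp3":  allProviders,
	".wav":  allProviders,
	".flac": deepgramOpenAI,
	".ogg":  deepgramOpenAI,
	".webm": deepgramOpenAI,
	".m4a":  deepgramOpenAI,
}

// providersFor returns the registry names that list the file's extension as
// supported, or nil if the extension is not in the table.
func providersFor(path string) []string {
	return formatSupport[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"call.mp3", "interview.FLAC", "notes.txt"} {
		fmt.Printf("%s: %v\n", p, providersFor(p))
	}
}
```

An extension check does not inspect the file's contents, so a mislabeled file can still be rejected by the provider.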
Best Practices¶
- Use appropriate models: with Deepgram, Nova-2 for accuracy, Base for speed
- Enable word timestamps: essential for subtitles and alignment
- Handle streaming errors: reconnect on connection drops
- Choose the right provider: Deepgram for real-time, OpenAI for general-purpose and multilingual audio
- Preprocess audio: normalize volume, remove silence
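For the reconnect advice, one common pattern is to retry with exponential backoff. The following is a minimal sketch using only the standard library; `reconnect` and `errDropped` are hypothetical names, and the `connect` callback stands in for whatever re-establishes the stream (for example, a new `TranscribeStream` call):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDropped is a hypothetical error standing in for a dropped connection.
var errDropped = errors.New("connection dropped")

// reconnect calls connect until it succeeds or maxAttempts is reached.
// The wait between attempts starts at base and doubles after each failure,
// capped at maxDelay.
func reconnect(maxAttempts int, base, maxDelay time.Duration, connect func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break // no point sleeping after the final failure
		}
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := reconnect(5, 10*time.Millisecond, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errDropped // simulate two failed attempts
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

A production version would typically also take a `context.Context` so that a shutdown can interrupt the wait, and add jitter so many clients do not reconnect in lockstep.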
Next Steps¶
- Streaming Guide - Real-time transcription
- Subtitles Guide - Generate SRT/VTT captions
- Voice Agents - Build conversational agents