Voice Integration
OmniAgent supports voice notes via OmniVoice, providing speech-to-text (STT) and text-to-speech (TTS) capabilities.
Overview
When voice processing is enabled:
- Incoming voice messages are transcribed to text
- The text is processed by the AI agent
- Responses can be synthesized back to speech
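The three steps above can be sketched as a small pipeline. This is only an illustration: the `Transcriber`, `Agent`, and `Synthesizer` interfaces, the `HandleVoiceNote` function, and the fake implementations are hypothetical names invented for this example, not OmniAgent or OmniVoice types.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Transcriber, Agent, and Synthesizer are hypothetical interfaces used only
// for this sketch; they are not OmniVoice or OmniAgent types.
type Transcriber interface {
	Transcribe(ctx context.Context, audio []byte) (string, error)
}

type Agent interface {
	Reply(ctx context.Context, prompt string) (string, error)
}

type Synthesizer interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// HandleVoiceNote runs the steps in order: speech-to-text, agent processing,
// then text-to-speech. If synthesis fails, the text reply is still returned
// so the caller can fall back to a text response.
func HandleVoiceNote(ctx context.Context, stt Transcriber, agent Agent, tts Synthesizer, audio []byte) (string, []byte, error) {
	text, err := stt.Transcribe(ctx, audio)
	if err != nil {
		return "", nil, fmt.Errorf("transcribe: %w", err)
	}
	reply, err := agent.Reply(ctx, text)
	if err != nil {
		return "", nil, fmt.Errorf("agent: %w", err)
	}
	speech, err := tts.Synthesize(ctx, reply)
	if err != nil {
		return reply, nil, fmt.Errorf("synthesize: %w", err)
	}
	return reply, speech, nil
}

// The fakes below stand in for real providers so the sketch runs offline.
type fakeSTT struct{}

func (fakeSTT) Transcribe(_ context.Context, audio []byte) (string, error) {
	return string(audio), nil // pretend the audio bytes are the spoken words
}

type echoAgent struct{}

func (echoAgent) Reply(_ context.Context, prompt string) (string, error) {
	return "You said: " + prompt, nil
}

type fakeTTS struct{}

func (fakeTTS) Synthesize(_ context.Context, text string) ([]byte, error) {
	return []byte(strings.ToUpper(text)), nil // placeholder for audio bytes
}

// demo wires the fakes together for a single voice note.
func demo() (string, []byte, error) {
	return HandleVoiceNote(context.Background(), fakeSTT{}, echoAgent{}, fakeTTS{}, []byte("hello"))
}

func main() {
	reply, speech, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(reply, len(speech))
}
```

Keeping each stage behind its own interface means the STT and TTS providers can differ, which matches the separate `stt` and `tts` settings in the configuration.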
Supported Providers
| Provider | STT | TTS | Notes |
|---|---|---|---|
| Deepgram | ✅ | ✅ | Nova-2 for STT, Aura voices for TTS |
| OpenAI | ✅ | ✅ | Whisper for STT, TTS-1 for TTS |
| ElevenLabs | ❌ | ✅ | High-quality voice synthesis |
Configuration
Environment Variables
| Variable | Description |
|---|---|
| `DEEPGRAM_API_KEY` | Deepgram API key |
| `OPENAI_API_KEY` | OpenAI API key (for Whisper/TTS) |
| `ELEVENLABS_API_KEY` | ElevenLabs API key |
| `OMNIAGENT_VOICE_ENABLED` | Enable voice processing |
| `OMNIAGENT_VOICE_RESPONSE_MODE` | Response mode: `auto`, `always`, or `never` |
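As a rough illustration of how an application might read these variables, the sketch below collects them into a struct. The `VoiceEnv` type and `LoadVoiceEnv` function are hypothetical. How OmniAgent itself parses the values is not documented here, so the boolean parsing via `strconv.ParseBool` and the fallback to `auto` are assumptions of this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// VoiceEnv is a hypothetical holder for the variables in the table above.
type VoiceEnv struct {
	Enabled      bool
	ResponseMode string
	APIKeys      map[string]string // provider name -> key, only for keys that are set
}

// LoadVoiceEnv reads the voice settings through getenv (normally os.Getenv).
// Assumptions: the enabled flag is parsed with strconv.ParseBool, and an
// unset response mode falls back to "auto".
func LoadVoiceEnv(getenv func(string) string) (VoiceEnv, error) {
	env := VoiceEnv{ResponseMode: "auto", APIKeys: map[string]string{}}

	if raw := getenv("OMNIAGENT_VOICE_ENABLED"); raw != "" {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return env, fmt.Errorf("OMNIAGENT_VOICE_ENABLED: %w", err)
		}
		env.Enabled = enabled
	}

	if mode := getenv("OMNIAGENT_VOICE_RESPONSE_MODE"); mode != "" {
		switch mode {
		case "auto", "always", "never":
			env.ResponseMode = mode
		default:
			return env, fmt.Errorf("OMNIAGENT_VOICE_RESPONSE_MODE: unknown mode %q", mode)
		}
	}

	// Record only the provider keys that are actually present.
	for provider, name := range map[string]string{
		"deepgram":   "DEEPGRAM_API_KEY",
		"openai":     "OPENAI_API_KEY",
		"elevenlabs": "ELEVENLABS_API_KEY",
	} {
		if key := getenv(name); key != "" {
			env.APIKeys[provider] = key
		}
	}
	return env, nil
}

func main() {
	env, err := LoadVoiceEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("enabled=%v mode=%s providers=%d\n", env.Enabled, env.ResponseMode, len(env.APIKeys))
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the loader easy to test without touching the real environment.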
Config File
voice:
  enabled: true
  response_mode: auto
  stt:
    provider: deepgram
    model: nova-2
  tts:
    provider: deepgram
    model: aura-asteria-en
    voice_id: aura-asteria-en
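A configuration like this can be checked against the Supported Providers table before any API call is made. The following is a minimal sketch, not OmniAgent's own validation: the `VoiceConfig` struct and `Validate` method are hypothetical, and the lowercase provider name `openai` is an assumption (only `deepgram` and `elevenlabs` appear as identifiers in this document).

```go
package main

import "fmt"

// capability records which features a provider offers, following the
// Supported Providers table.
type capability struct{ STT, TTS bool }

// The "openai" key is an assumed identifier for this sketch.
var providers = map[string]capability{
	"deepgram":   {STT: true, TTS: true},
	"openai":     {STT: true, TTS: true},
	"elevenlabs": {STT: false, TTS: true},
}

// VoiceConfig mirrors a subset of the YAML keys; the struct is hypothetical.
type VoiceConfig struct {
	Enabled      bool
	ResponseMode string
	STTProvider  string
	TTSProvider  string
}

// Validate reports the first problem found in cfg, or nil if it is consistent.
func (cfg VoiceConfig) Validate() error {
	if !cfg.Enabled {
		return nil // nothing to check when voice is off
	}
	switch cfg.ResponseMode {
	case "auto", "always", "never":
		// valid mode
	default:
		return fmt.Errorf("unknown response_mode %q", cfg.ResponseMode)
	}
	if p, ok := providers[cfg.STTProvider]; !ok || !p.STT {
		return fmt.Errorf("provider %q cannot be used for stt", cfg.STTProvider)
	}
	if p, ok := providers[cfg.TTSProvider]; !ok || !p.TTS {
		return fmt.Errorf("provider %q cannot be used for tts", cfg.TTSProvider)
	}
	return nil
}

func main() {
	// ElevenLabs is TTS-only, so selecting it for STT should be rejected.
	cfg := VoiceConfig{Enabled: true, ResponseMode: "auto", STTProvider: "elevenlabs", TTSProvider: "deepgram"}
	fmt.Println(cfg.Validate())
}
```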
Response Modes
| Mode | Behavior |
|---|---|
| `auto` | Reply with voice only to voice messages |
| `always` | Always reply with voice |
| `never` | Never reply with voice (text only) |
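The table reduces to a single decision per message. Below is a small sketch of that rule; the function name `ShouldReplyWithVoice` is hypothetical, and treating an unrecognized mode as text-only is an assumption rather than documented behavior.

```go
package main

import "fmt"

// ShouldReplyWithVoice applies the response-mode table: "always" and "never"
// ignore the incoming message type, while "auto" mirrors it.
// Treating an unknown mode as text-only is an assumption of this sketch.
func ShouldReplyWithVoice(mode string, incomingIsVoice bool) bool {
	switch mode {
	case "always":
		return true
	case "auto":
		return incomingIsVoice
	default: // "never" and anything unrecognized
		return false
	}
}

func main() {
	for _, mode := range []string{"auto", "always", "never"} {
		fmt.Printf("%-6s voice-in=%v text-in=%v\n",
			mode, ShouldReplyWithVoice(mode, true), ShouldReplyWithVoice(mode, false))
	}
}
```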
Provider Setup
Deepgram
- Sign up at deepgram.com
- Create an API key
- Set the `DEEPGRAM_API_KEY` environment variable
OpenAI
Uses your existing OpenAI API key, read from the `OPENAI_API_KEY` environment variable.
ElevenLabs (TTS only)
- Sign up at elevenlabs.io
- Create an API key
- Set the `ELEVENLABS_API_KEY` environment variable
Architecture
OmniVoice uses a provider registry pattern:
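import (
	"github.com/plexusone/omnivoice"
	_ "github.com/plexusone/omnivoice/providers/all" // Register all providers
)

// Get providers by name (inside a function that returns an error); each provider takes its own API key
stt, err := omnivoice.GetSTTProvider("deepgram", omnivoice.WithAPIKey(deepgramKey))
if err != nil {
	return err
}
tts, err := omnivoice.GetTTSProvider("elevenlabs", omnivoice.WithAPIKey(elevenLabsKey))
if err != nil {
	return err
}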
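To show why a blank import is enough to make providers available, here is a generic, standard-library-only sketch of a registry pattern. It is not copied from the OmniVoice source: `TTSProvider`, `Factory`, `Register`, `Get`, `Registered`, and the `demo` provider are hypothetical names, and the real library's internals may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// TTSProvider and the registry below are a generic illustration of the
// pattern; they are not copied from the OmniVoice source.
type TTSProvider interface {
	Name() string
}

// Factory builds a provider from an API key.
type Factory func(apiKey string) (TTSProvider, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register makes a factory available under name. Provider packages would
// typically call this from init(), which is why a blank import is enough.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// Registered returns the registered names in sorted order.
func Registered() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for name := range factories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Get looks up a factory by name and builds the provider.
func Get(name, apiKey string) (TTSProvider, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q (registered: %v)", name, Registered())
	}
	return f(apiKey)
}

type demoTTS struct{ key string }

func (demoTTS) Name() string { return "demo" }

func init() {
	// In a real provider package this would live next to the implementation.
	Register("demo", func(apiKey string) (TTSProvider, error) {
		if apiKey == "" {
			return nil, fmt.Errorf("demo: API key is required")
		}
		return demoTTS{key: apiKey}, nil
	})
}

func main() {
	p, err := Get("demo", "secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name())

	_, err = Get("missing", "secret")
	fmt.Println(err)
}
```

The benefit of this design is that callers depend only on the registry and a provider name, so adding a provider means importing one more package rather than changing call sites.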
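Troubleshooting
Voice Not Working
- Verify that the API key for each configured provider is set correctly (for example, `DEEPGRAM_API_KEY` for Deepgram)
- Check that voice is enabled, via `voice.enabled: true` in the config file or the `OMNIAGENT_VOICE_ENABLED` environment variable
- Ensure the provider supports your chosen feature and model (ElevenLabs, for example, is TTS only)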
Poor Transcription Quality
- Use Deepgram Nova-2 or OpenAI Whisper for best results
- Ensure the recording is clear, with little background noise
- Check that the language settings match the spoken language
TTS Sounds Robotic
- Try different voice IDs
- ElevenLabs offers the most natural-sounding voices
- Adjust model settings if available