Voice Integration

OmniAgent supports voice notes via OmniVoice, providing speech-to-text (STT) and text-to-speech (TTS) capabilities.

Overview

When voice processing is enabled:

1. Incoming voice messages are transcribed to text.
2. The text is processed by the AI agent.
3. The response is optionally synthesized back to speech, depending on the response mode.
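
The three steps above can be sketched in Go as follows. This is a minimal illustration, not OmniAgent's actual code: the `Transcriber`, `Agent`, and `Synthesizer` interfaces and the `HandleVoiceNote` function are hypothetical names invented for this example, and the stub types stand in for real providers so the sketch runs offline.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical interfaces for illustration; not OmniVoice's real API.
type Transcriber interface {
	Transcribe(audio []byte) (string, error)
}

type Agent interface {
	Reply(text string) (string, error)
}

type Synthesizer interface {
	Synthesize(text string) ([]byte, error)
}

// HandleVoiceNote runs the three stages: STT, agent, then optional TTS.
// When speak is false, the audio result is nil and only text is returned.
func HandleVoiceNote(stt Transcriber, agent Agent, tts Synthesizer, audio []byte, speak bool) (string, []byte, error) {
	text, err := stt.Transcribe(audio)
	if err != nil {
		return "", nil, fmt.Errorf("transcribe: %w", err)
	}
	reply, err := agent.Reply(text)
	if err != nil {
		return "", nil, fmt.Errorf("agent: %w", err)
	}
	if !speak {
		return reply, nil, nil
	}
	speech, err := tts.Synthesize(reply)
	if err != nil {
		return reply, nil, fmt.Errorf("synthesize: %w", err)
	}
	return reply, speech, nil
}

// Stubs that stand in for real providers.
type stubSTT struct{}

func (stubSTT) Transcribe(audio []byte) (string, error) { return string(audio), nil }

type stubAgent struct{}

func (stubAgent) Reply(text string) (string, error) { return strings.ToUpper(text), nil }

type stubTTS struct{}

func (stubTTS) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func main() {
	reply, speech, err := HandleVoiceNote(stubSTT{}, stubAgent{}, stubTTS{}, []byte("hello"), true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply, len(speech))
}
```

Keeping each stage behind its own interface means the STT and TTS providers can differ, which matches the separate `stt` and `tts` blocks in the configuration.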

Supported Providers

Provider	STT	TTS	Notes
Deepgram	✅	✅	Nova-2 for STT, Aura voices for TTS
OpenAI	✅	✅	Whisper for STT, TTS-1 for TTS
ElevenLabs	❌	✅	High-quality voice synthesis
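
The support matrix above can be encoded as a lookup so that an unsupported combination (for example, ElevenLabs for STT) is caught before any API call is made. This is only a sketch: the `Capability` type, the `capabilities` map, and `Supports` are hypothetical and not part of OmniVoice.

```go
package main

import "fmt"

// Capability is either speech-to-text or text-to-speech.
type Capability string

const (
	STT Capability = "stt"
	TTS Capability = "tts"
)

// capabilities mirrors the provider table in this section.
var capabilities = map[string]map[Capability]bool{
	"deepgram":   {STT: true, TTS: true},
	"openai":     {STT: true, TTS: true},
	"elevenlabs": {TTS: true},
}

// Supports reports whether the named provider offers the capability.
// Unknown providers are reported as unsupported.
func Supports(provider string, c Capability) bool {
	return capabilities[provider][c]
}

func main() {
	fmt.Println("elevenlabs stt:", Supports("elevenlabs", STT))
	fmt.Println("elevenlabs tts:", Supports("elevenlabs", TTS))
}
```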

Configuration

Environment Variables

Variable	Description
`DEEPGRAM_API_KEY`	Deepgram API key
`OPENAI_API_KEY`	OpenAI API key (for Whisper/TTS)
`ELEVENLABS_API_KEY`	ElevenLabs API key
`OMNIAGENT_VOICE_ENABLED`	Enable voice processing
`OMNIAGENT_VOICE_RESPONSE_MODE`	Response mode: `auto`, `always`, `never`
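
A minimal Go sketch of reading the two `OMNIAGENT_VOICE_*` variables follows. The parsing rules are assumptions: this page does not state which values `OMNIAGENT_VOICE_ENABLED` accepts or what the defaults are, so the sketch uses `strconv.ParseBool` and defaults to disabled and `auto`. `VoiceEnv` and `LoadVoiceEnv` are hypothetical names, not OmniAgent functions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// VoiceEnv holds the voice settings taken from the environment.
type VoiceEnv struct {
	Enabled      bool
	ResponseMode string
}

// LoadVoiceEnv reads settings through getenv so it can be tested
// without touching the real process environment.
func LoadVoiceEnv(getenv func(string) string) (VoiceEnv, error) {
	env := VoiceEnv{ResponseMode: "auto"} // assumed default
	if v := getenv("OMNIAGENT_VOICE_ENABLED"); v != "" {
		enabled, err := strconv.ParseBool(v) // assumed value format
		if err != nil {
			return env, fmt.Errorf("OMNIAGENT_VOICE_ENABLED: %w", err)
		}
		env.Enabled = enabled
	}
	if v := getenv("OMNIAGENT_VOICE_RESPONSE_MODE"); v != "" {
		switch v {
		case "auto", "always", "never":
			env.ResponseMode = v
		default:
			return env, fmt.Errorf("OMNIAGENT_VOICE_RESPONSE_MODE: unknown mode %q", v)
		}
	}
	return env, nil
}

func main() {
	env, err := LoadVoiceEnv(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("enabled=%v mode=%s\n", env.Enabled, env.ResponseMode)
}
```

Rejecting an unknown mode up front surfaces typos such as `allways` at startup instead of silently falling back to text replies.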

Config File

voice:
  enabled: true
  response_mode: auto

  stt:
    provider: deepgram
    model: nova-2

  tts:
    provider: deepgram
    model: aura-asteria-en
    voice_id: aura-asteria-en

Response Modes

Mode	Behavior
`auto`	Reply with voice only to voice messages
`always`	Always reply with voice
`never`	Never reply with voice (text only)
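
The table above reduces to a small decision function. The sketch below is illustrative; `ShouldReplyWithVoice` is a hypothetical name, and treating an unrecognized mode as text-only is an assumption of this example rather than documented behavior.

```go
package main

import "fmt"

// ShouldReplyWithVoice applies the response-mode table.
// Unrecognized modes fall back to text (an assumption of this sketch).
func ShouldReplyWithVoice(mode string, incomingIsVoice bool) bool {
	switch mode {
	case "always":
		return true
	case "auto":
		return incomingIsVoice
	default: // "never" and anything unrecognized
		return false
	}
}

func main() {
	for _, mode := range []string{"auto", "always", "never"} {
		fmt.Printf("%-6s voice-in=%v text-in=%v\n", mode,
			ShouldReplyWithVoice(mode, true), ShouldReplyWithVoice(mode, false))
	}
}
```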

Provider Setup

Deepgram

1. Sign up at deepgram.com.
2. Create an API key.
3. Set the `DEEPGRAM_API_KEY` environment variable.

voice:
  stt:
    provider: deepgram
    model: nova-2
  tts:
    provider: deepgram
    model: aura-asteria-en

OpenAI

Uses your existing OpenAI API key (`OPENAI_API_KEY`):

voice:
  stt:
    provider: openai
    model: whisper-1
  tts:
    provider: openai
    model: tts-1
    voice_id: alloy

ElevenLabs (TTS only)

1. Sign up at elevenlabs.io.
2. Create an API key.
3. Set the `ELEVENLABS_API_KEY` environment variable.

voice:
  tts:
    provider: elevenlabs
    voice_id: your-voice-id

Architecture

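OmniVoice uses a provider registry pattern: importing the providers package registers the available providers, which are then looked up by name: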

import (
    "log"

    "github.com/plexusone/omnivoice"
    _ "github.com/plexusone/omnivoice/providers/all" // Register all providers
)

// Get providers by name, passing each provider its own API key.
stt, err := omnivoice.GetSTTProvider("deepgram", omnivoice.WithAPIKey(deepgramKey))
if err != nil {
    log.Fatal(err)
}
tts, err := omnivoice.GetTTSProvider("elevenlabs", omnivoice.WithAPIKey(elevenLabsKey))
if err != nil {
    log.Fatal(err)
}
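
For readers unfamiliar with the pattern, here is a generic, self-contained sketch of a provider registry with functional options. It is an illustration only and not OmniVoice's source: `Register`, `GetTTS`, `Option`, this local `WithAPIKey`, and the `echo` provider are all hypothetical stand-ins defined in the example itself.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// TTS is a simplified text-to-speech provider interface.
type TTS interface {
	Synthesize(text string) ([]byte, error)
}

// options holds settings collected from functional options.
type options struct{ apiKey string }

// Option mutates options; WithAPIKey is the only one in this sketch.
type Option func(*options)

func WithAPIKey(key string) Option {
	return func(o *options) { o.apiKey = key }
}

// factory builds a provider from resolved options.
type factory func(options) (TTS, error)

var registry = map[string]factory{}

// Register adds a provider factory under a name. Provider packages
// would typically call this from init(), so a blank import is enough.
func Register(name string, f factory) { registry[name] = f }

// GetTTS looks up a provider by name and constructs it.
func GetTTS(name string, opts ...Option) (TTS, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown TTS provider %q (registered: %v)", name, names())
	}
	var o options
	for _, opt := range opts {
		opt(&o)
	}
	return f(o)
}

// names returns the registered provider names in sorted order.
func names() []string {
	out := make([]string, 0, len(registry))
	for n := range registry {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

// echo is a stand-in provider that returns the text as bytes.
type echo struct{ key string }

func (e echo) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func init() {
	Register("echo", func(o options) (TTS, error) {
		if o.apiKey == "" {
			return nil, errors.New("echo: API key required")
		}
		return echo{key: o.apiKey}, nil
	})
}

func main() {
	tts, err := GetTTS("echo", WithAPIKey("demo-key"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	audio, _ := tts.Synthesize("hi")
	fmt.Println(len(audio))
}
```

The appeal of this design is that calling code depends only on a provider name from configuration, so switching providers is a config change rather than a code change.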

Troubleshooting

Voice Not Working

- Verify API keys are set correctly
- Check that voice is enabled in config
- Ensure the provider supports your chosen model
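
The first two checks, plus the ElevenLabs STT restriction from the provider table, can be automated with a small preflight function. This is a sketch under stated assumptions: `Diagnose` and `keyVar` are hypothetical names, the provider-to-variable mapping is taken from the Environment Variables table, and model validation is left out because this page does not list every model each provider accepts.

```go
package main

import (
	"fmt"
	"os"
)

// keyVar maps each provider to the environment variable holding its key,
// as listed in the Environment Variables table.
var keyVar = map[string]string{
	"deepgram":   "DEEPGRAM_API_KEY",
	"openai":     "OPENAI_API_KEY",
	"elevenlabs": "ELEVENLABS_API_KEY",
}

// Diagnose returns human-readable problems for the given settings.
// An empty result means none of the checks here found an issue.
func Diagnose(enabled bool, sttProvider, ttsProvider string, getenv func(string) string) []string {
	var problems []string
	if !enabled {
		problems = append(problems, "voice is disabled in config")
	}
	if sttProvider == "elevenlabs" {
		problems = append(problems, "elevenlabs is TTS only and cannot be used for STT")
	}
	checked := map[string]bool{} // avoid reporting the same provider twice
	for _, p := range []string{sttProvider, ttsProvider} {
		if checked[p] {
			continue
		}
		checked[p] = true
		envName, ok := keyVar[p]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("unknown provider %q", p))
		case getenv(envName) == "":
			problems = append(problems, envName+" is not set")
		}
	}
	return problems
}

func main() {
	for _, p := range Diagnose(true, "deepgram", "elevenlabs", os.Getenv) {
		fmt.Println("problem:", p)
	}
}
```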

Poor Transcription Quality

- Use Deepgram Nova-2 or OpenAI Whisper for best results
- Ensure audio quality is reasonable
- Check that the language settings match the spoken language

TTS Sounds Robotic

- Try different voice IDs
- Try ElevenLabs, which tends to produce the most natural-sounding voices
- Adjust model settings if the provider exposes them