Observability¶

The observability package provides instrumentation interfaces for monitoring and debugging voice operations. It enables tracking of call lifecycle events, TTS synthesis metrics, and STT transcription performance.

Overview¶

OmniVoice observability consists of two main components:

Voice Events - Call lifecycle events (initiated, answered, ended, etc.)
Operation Hooks - TTS and STT instrumentation for latency, throughput, and error tracking

┌─────────────────────────────────────────────────────────────────┐
│                        Voice Application                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐   │
│  │     TTS      │    │     STT      │    │   CallSystem     │   │
│  │   + Hook     │    │   + Hook     │    │   + Observer     │   │
│  └──────┬───────┘    └──────┬───────┘    └────────┬─────────┘   │
│         │                   │                     │              │
│         ▼                   ▼                     ▼              │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │                    Observability Layer                      │ │
│  ├──────────────────┬──────────────────┬───────────────────────┤ │
│  │    TTSHook       │    STTHook       │    VoiceObserver      │ │
│  │  - Latency       │  - Latency       │  - Call Events        │ │
│  │  - Audio Size    │  - Confidence    │  - Media Events       │ │
│  │  - Errors        │  - Errors        │  - DTMF Events        │ │
│  └──────────────────┴──────────────────┴───────────────────────┘ │
│                              │                                   │
│                              ▼                                   │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │                      Backends                               │ │
│  │   Prometheus  │  OpenTelemetry  │  Logging  │  Custom       │ │
│  └─────────────────────────────────────────────────────────────┘ │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Voice Events¶

Event Types¶

The package defines event types for the call lifecycle:

Event	Description
`call.initiated`	Call started (outbound) or received (inbound)
`call.ringing`	Outbound call is ringing
`call.answered`	Call was answered
`call.ended`	Call ended normally
`call.failed`	Call failed
`call.busy`	Line was busy
`call.no_answer`	No answer
`media.connected`	Media streaming connected
`media.disconnected`	Media streaming disconnected
`media.error`	Media streaming error
`dtmf.received`	DTMF tones received

VoiceEvent Structure¶

type VoiceEvent struct {
    Type      EventType         // Event type (e.g., "call.answered")
    Timestamp time.Time         // When the event occurred
    CallID    string            // Unique call identifier
    Provider  string            // Provider name (e.g., "twilio")
    Direction string            // "inbound" or "outbound"
    From      string            // Caller ID
    To        string            // Called number
    Duration  time.Duration     // Call duration (for ended events)
    Error     error             // Error details (for failed events)
    Metadata  map[string]any    // Provider-specific data
}

VoiceObserver Interface¶

Implement VoiceObserver to receive voice events:

type VoiceObserver interface {
    OnEvent(ctx context.Context, event VoiceEvent)
}

Basic Usage¶

import (
    "context"
    "log"

    "github.com/plexusone/omnivoice-core/observability"
)

// Create an observer using the function adapter
observer := observability.VoiceObserverFunc(func(ctx context.Context, event observability.VoiceEvent) {
    log.Printf("[%s] %s: %s -> %s",
        event.Type, event.CallID, event.From, event.To)
})

// Use with CallSystem
call, err := provider.MakeCall(ctx, "+15559876543",
    callsystem.WithObserver(observer),
)
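
The function adapter above presumably follows the same idea as `http.HandlerFunc` in the standard library: a named function type that carries the interface method. The sketch below illustrates that pattern with local stand-in types; `ObserverFunc` and `collect` are hypothetical names, and this is not the package's actual source.

```go
package main

import (
	"context"
	"fmt"
)

// VoiceEvent is a trimmed local stand-in for observability.VoiceEvent.
type VoiceEvent struct {
	Type   string
	CallID string
}

// VoiceObserver mirrors the interface documented above.
type VoiceObserver interface {
	OnEvent(ctx context.Context, event VoiceEvent)
}

// ObserverFunc is a hypothetical adapter type: any function with the right
// signature becomes a VoiceObserver.
type ObserverFunc func(ctx context.Context, event VoiceEvent)

// OnEvent satisfies VoiceObserver by calling the function itself.
func (f ObserverFunc) OnEvent(ctx context.Context, event VoiceEvent) { f(ctx, event) }

// collect feeds events through an adapter-built observer and returns what it saw.
func collect(events ...VoiceEvent) []string {
	var seen []string
	var obs VoiceObserver = ObserverFunc(func(_ context.Context, e VoiceEvent) {
		seen = append(seen, e.Type+":"+e.CallID)
	})
	for _, e := range events {
		obs.OnEvent(context.Background(), e)
	}
	return seen
}

func main() {
	fmt.Println(collect(VoiceEvent{Type: "call.answered", CallID: "CA123"}))
}
```

The adapter keeps one-off observers short: a closure can capture whatever state it needs without a dedicated struct type.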

Emitting Events¶

Providers use EmitEvent to send events with functional options:

observability.EmitEvent(ctx, observer, observability.EventCallAnswered, callID, "twilio",
    observability.WithDirection("outbound"),
    observability.WithFrom("+15551234567"),
    observability.WithTo("+15559876543"),
)
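
To make the functional-options style concrete, here is a minimal sketch of how an emit helper of this shape could assemble an event. It uses local stand-in types; `EventOption`, `buildEvent`, `emit`, and `lastEvent` are hypothetical names, and the real `EmitEvent` may behave differently (for example in how it treats a nil observer).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// VoiceEvent is a trimmed local stand-in for observability.VoiceEvent.
type VoiceEvent struct {
	Type      string
	Timestamp time.Time
	CallID    string
	Provider  string
	Direction string
	From      string
	To        string
}

// VoiceObserver mirrors the interface documented above.
type VoiceObserver interface {
	OnEvent(ctx context.Context, event VoiceEvent)
}

// EventOption (hypothetical name) sets one optional field on an event.
type EventOption func(*VoiceEvent)

func WithDirection(d string) EventOption { return func(e *VoiceEvent) { e.Direction = d } }
func WithFrom(from string) EventOption   { return func(e *VoiceEvent) { e.From = from } }
func WithTo(to string) EventOption       { return func(e *VoiceEvent) { e.To = to } }

// buildEvent fills the required fields, then applies the options in order.
func buildEvent(eventType, callID, provider string, opts ...EventOption) VoiceEvent {
	ev := VoiceEvent{Type: eventType, Timestamp: time.Now(), CallID: callID, Provider: provider}
	for _, opt := range opts {
		opt(&ev)
	}
	return ev
}

// emit delivers the event; in this sketch a nil observer is silently ignored.
func emit(ctx context.Context, obs VoiceObserver, eventType, callID, provider string, opts ...EventOption) {
	if obs == nil {
		return
	}
	obs.OnEvent(ctx, buildEvent(eventType, callID, provider, opts...))
}

// lastEvent remembers the most recent event it received.
type lastEvent struct{ ev VoiceEvent }

func (l *lastEvent) OnEvent(_ context.Context, e VoiceEvent) { l.ev = e }

func main() {
	obs := &lastEvent{}
	emit(context.Background(), obs, "call.answered", "CA123", "twilio",
		WithDirection("outbound"), WithFrom("+15551234567"), WithTo("+15559876543"))
	fmt.Printf("%s %s %s -> %s\n", obs.ev.Type, obs.ev.Direction, obs.ev.From, obs.ev.To)
}
```

Options keep the required arguments (type, call ID, provider) positional while letting providers attach only the fields they actually know.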

Multi-Observer¶

Fan out events to multiple observers:

multi := observability.NewMultiObserver(
    metricsObserver,
    loggingObserver,
    analyticsObserver,
)

call, err := provider.MakeCall(ctx, to, callsystem.WithObserver(multi))

TTS Hooks¶

The TTSHook interface instruments text-to-speech operations:

type TTSHook interface {
    // Called before synthesis
    BeforeSynthesize(ctx context.Context, info TTSCallInfo, req TTSRequest) context.Context

    // Called after synthesis completes
    AfterSynthesize(ctx context.Context, info TTSCallInfo, req TTSRequest, resp *TTSResponse, err error)

    // Wraps streaming audio for byte counting
    WrapStream(ctx context.Context, info TTSCallInfo, req TTSRequest, stream <-chan []byte) <-chan []byte
}

TTSCallInfo¶

type TTSCallInfo struct {
    CallID    string    // Unique identifier for correlation
    Provider  string    // Provider name (e.g., "elevenlabs")
    StartTime time.Time // Operation start time
    VoiceID   string    // Voice being used
    Model     string    // TTS model
}

TTSRequest / TTSResponse¶

type TTSRequest struct {
    Text         string // Text to synthesize
    TextLength   int    // Character count
    OutputFormat string // Audio format (e.g., "mp3")
    SampleRate   int    // Audio sample rate
}

type TTSResponse struct {
    AudioSize int64         // Generated audio size in bytes
    Duration  time.Duration // Audio duration
    Latency   time.Duration // Time to first byte (streaming)
}

Using TTS Hooks¶

// Set hook on client (applies to all operations)
ttsClient.SetHook(myTTSHook)

// Or per-request via config
result, err := provider.Synthesize(ctx, text, tts.SynthesisConfig{
    VoiceID: "voice-id",
    Hook:    myTTSHook,
})

Example: Metrics Hook¶

type MetricsTTSHook struct {
    synthesizeLatency prometheus.Histogram
    audioBytes        prometheus.Counter
    errors            prometheus.Counter
}

func (h *MetricsTTSHook) BeforeSynthesize(ctx context.Context, info observability.TTSCallInfo, req observability.TTSRequest) context.Context {
    return ctx // Could add trace span to context
}

func (h *MetricsTTSHook) AfterSynthesize(ctx context.Context, info observability.TTSCallInfo, req observability.TTSRequest, resp *observability.TTSResponse, err error) {
    if err != nil {
        h.errors.Inc()
        return
    }
    if resp == nil {
        return // Nothing to record; also avoids a nil dereference
    }
    h.synthesizeLatency.Observe(resp.Latency.Seconds())
    h.audioBytes.Add(float64(resp.AudioSize))
}

func (h *MetricsTTSHook) WrapStream(ctx context.Context, info observability.TTSCallInfo, req observability.TTSRequest, stream <-chan []byte) <-chan []byte {
    return stream // Could wrap to count bytes
}
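
The pass-through `WrapStream` above hints that the stream could be wrapped to count bytes. One way to do that is sketched below, with a plain `atomic.Int64` standing in for the Prometheus counter; `countStream` and `countAll` are hypothetical helper names, and `atomic.Int64` requires Go 1.19 or later.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// countStream forwards every chunk from in to the returned channel and adds
// each chunk's length to total. The returned channel is closed when in is
// closed or when ctx is cancelled.
func countStream(ctx context.Context, in <-chan []byte, total *atomic.Int64) <-chan []byte {
	out := make(chan []byte)
	go func() {
		defer close(out)
		for chunk := range in {
			total.Add(int64(len(chunk)))
			select {
			case out <- chunk:
			case <-ctx.Done():
				return // consumer gave up; stop forwarding
			}
		}
	}()
	return out
}

// countAll pushes chunks through countStream and returns the byte total.
func countAll(chunks ...[]byte) int64 {
	src := make(chan []byte, len(chunks))
	for _, c := range chunks {
		src <- c
	}
	close(src)

	var total atomic.Int64
	for range countStream(context.Background(), src, &total) {
		// A real consumer would play or forward the audio here.
	}
	return total.Load()
}

func main() {
	fmt.Println(countAll([]byte("abc"), []byte("defgh")))
}
```

The `select` on `ctx.Done()` keeps the forwarding goroutine from leaking if the consumer stops reading. Note that in this sketch bytes are counted when they are received from the source, not when the consumer accepts them.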

STT Hooks¶

The STTHook interface instruments speech-to-text operations:

type STTHook interface {
    // Called before transcription
    BeforeTranscribe(ctx context.Context, info STTCallInfo, req STTRequest) context.Context

    // Called after transcription completes
    AfterTranscribe(ctx context.Context, info STTCallInfo, req STTRequest, resp *STTResponse, err error)

    // Wraps audio writer for byte tracking
    WrapStreamWriter(ctx context.Context, info STTCallInfo, req STTRequest, writer io.WriteCloser) io.WriteCloser

    // Called for each streaming result
    OnStreamResult(ctx context.Context, info STTCallInfo, resp STTResponse)
}

STTCallInfo¶

type STTCallInfo struct {
    CallID    string    // Unique identifier for correlation
    Provider  string    // Provider name (e.g., "deepgram")
    StartTime time.Time // Operation start time
    Model     string    // STT model
    Language  string    // Expected language
}

STTRequest / STTResponse¶

type STTRequest struct {
    AudioSize   int64  // Audio size in bytes
    Encoding    string // Audio encoding (e.g., "pcm")
    SampleRate  int    // Sample rate
    Channels    int    // Number of channels
    IsStreaming bool   // Streaming transcription
}

type STTResponse struct {
    Transcript       string        // Transcribed text
    TranscriptLength int           // Character count
    Confidence       float64       // Confidence score (0-1)
    AudioDuration    time.Duration // Audio processed
    Latency          time.Duration // Processing latency
    IsFinal          bool          // Final result (streaming)
}

Using STT Hooks¶

// Set hook on client
sttClient.SetHook(mySTTHook)

// Or per-request
result, err := provider.Transcribe(ctx, audio, stt.TranscriptionConfig{
    Model: "nova-2",
    Hook:  mySTTHook,
})
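
As an illustration of what a `WrapStreamWriter` implementation might return, the sketch below wraps an `io.WriteCloser` so that every accepted byte is tallied. `countingWriter`, `nopCloser`, and `writeAll` are hypothetical names, and the in-memory buffer stands in for a provider's audio stream.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync/atomic"
)

// countingWriter wraps an io.WriteCloser and tallies the bytes it accepts.
// Close is promoted from the embedded writer unchanged.
type countingWriter struct {
	io.WriteCloser
	n *atomic.Int64
}

// Write forwards p and counts only the bytes the underlying writer took.
func (w countingWriter) Write(p []byte) (int, error) {
	written, err := w.WriteCloser.Write(p)
	w.n.Add(int64(written))
	return written, err
}

// nopCloser turns a bytes.Buffer into an io.WriteCloser for the demo.
type nopCloser struct{ *bytes.Buffer }

func (nopCloser) Close() error { return nil }

// writeAll writes one zero-filled frame per size and returns the byte total.
func writeAll(sizes ...int) (int64, error) {
	var n atomic.Int64
	var w io.WriteCloser = countingWriter{WriteCloser: nopCloser{&bytes.Buffer{}}, n: &n}
	for _, size := range sizes {
		if _, err := w.Write(make([]byte, size)); err != nil {
			return n.Load(), err
		}
	}
	return n.Load(), w.Close()
}

func main() {
	total, err := writeAll(320, 160)
	fmt.Println(total, err)
}
```

Counting the value returned by the inner `Write`, rather than `len(p)`, keeps the total accurate when a write is cut short by an error.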

NoOp Implementations¶

For testing or optional observability, use the provided no-op implementations:

// These do nothing but satisfy the interfaces
var _ observability.TTSHook = observability.NoOpTTSHook{}
var _ observability.STTHook = observability.NoOpSTTHook{}
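
A common way to make observability optional is to substitute a no-op whenever the caller supplies no hook, so call sites never need a nil check. The sketch below shows that pattern with a one-method stand-in interface; `Hook`, `noOpHook`, `countingHook`, and `client` are hypothetical and much smaller than the real `TTSHook`.

```go
package main

import "fmt"

// Hook is a one-method local stand-in for the TTSHook interface.
type Hook interface {
	AfterSynthesize(err error)
}

// noOpHook satisfies Hook and does nothing.
type noOpHook struct{}

func (noOpHook) AfterSynthesize(error) {}

// countingHook counts how often it is called.
type countingHook struct{ calls int }

func (h *countingHook) AfterSynthesize(error) { h.calls++ }

// client is a hypothetical TTS client that always holds a usable hook.
type client struct{ hook Hook }

// newClient substitutes the no-op when hook is nil.
func newClient(hook Hook) *client {
	if hook == nil {
		hook = noOpHook{}
	}
	return &client{hook: hook}
}

// synthesize pretends to do work, then notifies the hook unconditionally.
func (c *client) synthesize() {
	c.hook.AfterSynthesize(nil)
}

func main() {
	newClient(nil).synthesize() // safe: falls back to the no-op

	h := &countingHook{}
	newClient(h).synthesize()
	fmt.Println(h.calls)
}
```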

Integration Patterns¶

OpenTelemetry¶

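import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"

    "github.com/plexusone/omnivoice-core/observability"
)

// OTelTTSHook records each synthesis call as an OpenTelemetry span.
type OTelTTSHook struct {
    tracer trace.Tracer
}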

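func (h *OTelTTSHook) BeforeSynthesize(ctx context.Context, info observability.TTSCallInfo, req observability.TTSRequest) context.Context {
    // The span travels inside the returned context; AfterSynthesize retrieves
    // it with trace.SpanFromContext. Discarding the second return value avoids
    // a "declared and not used" compile error.
    ctx, _ = h.tracer.Start(ctx, "tts.synthesize",
        trace.WithAttributes(
            attribute.String("tts.provider", info.Provider),
            attribute.String("tts.voice", info.VoiceID),
            attribute.Int("tts.text_length", req.TextLength),
        ),
    )
    return ctx
}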

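func (h *OTelTTSHook) AfterSynthesize(ctx context.Context, info observability.TTSCallInfo, req observability.TTSRequest, resp *observability.TTSResponse, err error) {
    span := trace.SpanFromContext(ctx)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
    } else if resp != nil {
        span.SetAttributes(
            attribute.Int64("tts.audio_bytes", resp.AudioSize),
            attribute.Float64("tts.latency_ms", float64(resp.Latency.Milliseconds())),
        )
    }
    span.End()
}

// WrapStream passes the stream through unchanged; it is required to satisfy TTSHook.
func (h *OTelTTSHook) WrapStream(ctx context.Context, info observability.TTSCallInfo, req observability.TTSRequest, stream <-chan []byte) <-chan []byte {
    return stream
}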

Logging¶

type LoggingObserver struct {
    logger *slog.Logger
}

func (o *LoggingObserver) OnEvent(ctx context.Context, event observability.VoiceEvent) {
    o.logger.Info("voice event",
        "type", event.Type,
        "call_id", event.CallID,
        "provider", event.Provider,
        "direction", event.Direction,
        "from", event.From,
        "to", event.To,
    )
}

Best Practices¶

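Keep hooks lightweight - Hooks and observers are called synchronously, so slow or blocking work inside them delays the voice operation; hand such work off to a goroutine or queue
Handle errors internally - Hook methods have no error return value, so handle failures inside the hook and never panic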
Use context for correlation - Pass trace IDs through context in BeforeSynthesize/BeforeTranscribe
Aggregate metrics - Use counters and histograms rather than logging every event
Filter events - Not all events need processing; filter by type as needed
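
Two of the practices above, filtering by type and keeping observers off the call path, can be combined by wrapping observers. The sketch below is one possible arrangement using local stand-in types; `prefixFilter`, `asyncObserver`, `recorder`, and `deliver` are hypothetical names, and dropping events when the queue is full is a design choice of this sketch, not documented package behavior.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// VoiceEvent is a trimmed local stand-in for observability.VoiceEvent.
type VoiceEvent struct{ Type, CallID string }

// VoiceObserver mirrors the interface documented above.
type VoiceObserver interface {
	OnEvent(ctx context.Context, event VoiceEvent)
}

// prefixFilter forwards only events whose type starts with prefix.
type prefixFilter struct {
	prefix string
	next   VoiceObserver
}

func (f prefixFilter) OnEvent(ctx context.Context, e VoiceEvent) {
	if strings.HasPrefix(e.Type, f.prefix) {
		f.next.OnEvent(ctx, e)
	}
}

// asyncObserver queues events for a worker goroutine so OnEvent never blocks.
type asyncObserver struct {
	queue chan VoiceEvent
	done  chan struct{}
}

func newAsyncObserver(next VoiceObserver, size int) *asyncObserver {
	a := &asyncObserver{queue: make(chan VoiceEvent, size), done: make(chan struct{})}
	go func() {
		defer close(a.done)
		for e := range a.queue {
			// The caller's context may already be cancelled, so use a fresh one.
			next.OnEvent(context.Background(), e)
		}
	}()
	return a
}

// OnEvent enqueues the event, dropping it if the queue is full.
func (a *asyncObserver) OnEvent(_ context.Context, e VoiceEvent) {
	select {
	case a.queue <- e:
	default: // queue full: drop rather than stall the call path
	}
}

// Close waits for queued events to be processed. OnEvent must not be called afterwards.
func (a *asyncObserver) Close() {
	close(a.queue)
	<-a.done
}

// recorder remembers the types of the events it receives.
type recorder struct{ types []string }

func (r *recorder) OnEvent(_ context.Context, e VoiceEvent) { r.types = append(r.types, e.Type) }

// deliver sends events through filter -> async -> recorder and returns what arrived.
func deliver(prefix string, types ...string) []string {
	rec := &recorder{}
	async := newAsyncObserver(rec, len(types)+1)
	obs := prefixFilter{prefix: prefix, next: async}
	for _, t := range types {
		obs.OnEvent(context.Background(), VoiceEvent{Type: t})
	}
	async.Close()
	return rec.types
}

func main() {
	fmt.Println(deliver("call.", "call.answered", "media.connected", "call.ended"))
}
```

Filtering before the queue means uninteresting events never take up buffer space, which makes drops less likely under load.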

API Reference¶

See the GoDoc for complete API documentation.