Structured Evaluation Integration
OmniObserve integrates with structured-evaluation (sevaluation) to connect evaluation workflows with observability traces.
Installation
Overview
The sevaluation integration bridges:
- Evaluation workflows - Running structured evaluation suites
- Trace recording - Capturing evaluation results in observability providers
- Feedback scores - Adding evaluation scores to traces and spans
Basic Usage
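import (
	"log"

	"github.com/plexusone/omniobserve/integrations/sevaluation"
	"github.com/plexusone/omniobserve/llmops"
)
// Initialize the provider and check the error instead of discarding it
provider, err := llmops.Open("opik",
	llmops.WithAPIKey("..."),
	llmops.WithProjectName("evaluations"),
)
if err != nil {
	log.Fatalf("open provider: %v", err)
}
// Create the evaluation integration
eval := sevaluation.New(provider)
// Run evaluations and record results (ctx and suite are defined by the caller)
results, err := eval.Run(ctx, suite)
if err != nil {
	log.Fatalf("run evaluations: %v", err)
}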
Use Cases
LLM Output Evaluation
Evaluate LLM outputs and record results to your observability provider:
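// Start a trace for this evaluation run (inside a function that returns error)
ctx, trace, err := provider.StartTrace(ctx, "evaluation-run")
if err != nil {
	return err
}
defer trace.End()
// Run the evaluation suite
results, err := eval.Evaluate(ctx, sevaluation.EvalConfig{
	Suite: mySuite,
	Input: llmOutput,
})
if err != nil {
	return err
}
// Scores are automatically added to the trace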
RAG Pipeline Evaluation
Evaluate retrieval and generation quality:
// Retrieval relevance
retrievalResults, err := eval.EvaluateRetrieval(ctx, sevaluation.RetrievalConfig{
	Query:     query,
	Documents: retrievedDocs,
})
if err != nil {
	return err
}
// Generation quality (a separate variable avoids redeclaring results with :=)
generationResults, err := eval.EvaluateGeneration(ctx, sevaluation.GenerationConfig{
	Input:    query,
	Context:  retrievedDocs,
	Output:   generatedResponse,
	Expected: expectedAnswer,
})
if err != nil {
	return err
}
Recording to Traces
Evaluation results are recorded on the trace as feedback scores. Each result carries a metric name, a score, and a reason, so it can also be attached to a specific span:
// Add each result to a span as a feedback score
for _, result := range results {
	span.AddFeedbackScore(ctx, result.MetricName, result.Score,
		llmops.WithFeedbackReason(result.Reason),
	)
}
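The recording loop can be tried without a live provider. The following is a minimal sketch under stated assumptions: `Result`, `ScoreRecorder`, `memoryRecorder`, and `recordAll` are hypothetical names written for this example and are not part of sevaluation or llmops. Only the field names `MetricName`, `Score`, and `Reason` come from the loop above.

```go
package main

import "fmt"

// Result is a hypothetical stand-in that mirrors the fields used in the
// loop above. It is not the sevaluation result type.
type Result struct {
	MetricName string
	Score      float64
	Reason     string
}

// ScoreRecorder is a hypothetical interface for anything that accepts
// feedback scores, such as a span wrapper.
type ScoreRecorder interface {
	AddScore(name string, score float64, reason string) error
}

// memoryRecorder keeps scores in memory so the sketch runs without a provider.
type memoryRecorder struct {
	scores map[string]float64
}

// AddScore stores the score under its metric name and rejects empty names.
func (m *memoryRecorder) AddScore(name string, score float64, reason string) error {
	if name == "" {
		return fmt.Errorf("empty metric name")
	}
	m.scores[name] = score
	return nil
}

// recordAll sends every result to the recorder. It stops at the first
// failure and returns the number of results recorded before it.
func recordAll(rec ScoreRecorder, results []Result) (int, error) {
	for i, r := range results {
		if err := rec.AddScore(r.MetricName, r.Score, r.Reason); err != nil {
			return i, fmt.Errorf("record %q: %w", r.MetricName, err)
		}
	}
	return len(results), nil
}

func main() {
	rec := &memoryRecorder{scores: map[string]float64{}}
	n, err := recordAll(rec, []Result{
		{MetricName: "relevance", Score: 0.9, Reason: "on topic"},
		{MetricName: "faithfulness", Score: 0.7, Reason: "one unsupported claim"},
	})
	fmt.Println(n, err)
}
```

Returning the count alongside the error lets a caller see how many scores were written before a failure, which may help when deciding whether to retry the remainder.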
Provider Support
| Provider | Evaluation Recording |
|---|---|
| Opik | :white_check_mark: |
| Langfuse | :white_check_mark: |
| Phoenix | :white_check_mark: |
| slog | :x: |
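Since slog does not record evaluations, an application that lets users choose a provider may want to check support before running a suite. The sketch below copies the table into a lookup. The map, the function name `supportsEvaluationRecording`, and the lowercase provider keys other than "opik" are assumptions made for illustration; OmniObserve may expose this information differently or not at all.

```go
package main

import (
	"fmt"
	"strings"
)

// evaluationRecording copies the support table above. The lowercase keys
// are assumed names, not identifiers confirmed by the library.
var evaluationRecording = map[string]bool{
	"opik":     true,
	"langfuse": true,
	"phoenix":  true,
	"slog":     false,
}

// supportsEvaluationRecording reports whether the named provider is listed
// as recording evaluations. Unknown providers are treated as unsupported.
func supportsEvaluationRecording(provider string) bool {
	return evaluationRecording[strings.ToLower(strings.TrimSpace(provider))]
}

func main() {
	for _, name := range []string{"opik", "slog"} {
		fmt.Printf("%s: %v\n", name, supportsEvaluationRecording(name))
	}
}
```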