SamplingSHAP¶
SamplingSHAP uses simple Monte Carlo sampling to approximate SHAP values. It's faster than PermutationSHAP but doesn't guarantee local accuracy (the values may not sum exactly to prediction - base value).
When to Use¶
- Prototyping: Quick exploration of feature importance
- Speed over accuracy: When approximate values are acceptable
- Large models: When PermutationSHAP is too slow
Basic Usage¶
package main

import (
	"context"
	"fmt"

	"github.com/plexusone/shap-go/explainer"
	"github.com/plexusone/shap-go/explainer/sampling"
	"github.com/plexusone/shap-go/model"
)

func main() {
	// Define your model
	predict := func(ctx context.Context, input []float64) (float64, error) {
		return input[0]*input[0] + 2*input[1] + input[0]*input[2], nil
	}
	m := model.NewFuncModel(predict, 3)

	// Background data
	background := [][]float64{
		{0.0, 0.0, 0.0},
		{1.0, 0.0, 0.0},
		{0.0, 1.0, 0.0},
		{0.5, 0.5, 0.5},
	}

	// Create sampling explainer
	exp, _ := sampling.New(m, background,
		explainer.WithNumSamples(100),
		explainer.WithSeed(42),
		explainer.WithFeatureNames([]string{"x0", "x1", "x2"}),
	)

	// Explain
	ctx := context.Background()
	explanation, _ := exp.Explain(ctx, []float64{1.0, 2.0, 0.5})

	fmt.Printf("Prediction: %.4f\n", explanation.Prediction)
	fmt.Printf("Base Value: %.4f\n", explanation.BaseValue)
	for _, name := range explanation.FeatureNames {
		fmt.Printf("SHAP[%s]: %.4f\n", name, explanation.Values[name])
	}

	// Check accuracy (may not be exact)
	result := explanation.Verify(0.5) // Use a larger tolerance than usual
	fmt.Printf("\nWithin tolerance: %v (diff=%.4f)\n", result.Valid, result.Difference)
}
How It Works¶
SamplingSHAP uses simple Monte Carlo estimation:
- Generate random permutations of features
- For each permutation, compute marginal contribution of each feature
- Average contributions across all samples
For each sample:

    permutation = random_shuffle([0, 1, 2, ..., n-1])
    for each feature i in permutation:
        contribution[i] = f(S ∪ {i}) - f(S)
        // where S = the features before i in the permutation
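The loop above can be sketched in plain Go against the toy model from Basic Usage. This is an illustrative standalone implementation, not the library's internals: it walks each random permutation starting from a background row and flips features to the instance's values one at a time. For clarity it averages over every background row; a real sampler typically draws random rows and subsets instead, which is where SamplingSHAP's approximation error comes from.

```go
package main

import (
	"fmt"
	"math/rand"
)

// f is the toy model from the Basic Usage example.
func f(x []float64) float64 {
	return x[0]*x[0] + 2*x[1] + x[0]*x[2]
}

// samplingSHAP averages per-feature marginal contributions over random
// permutations. Features not yet "added" keep their background values.
func samplingSHAP(x []float64, background [][]float64, nSamples int, seed int64) []float64 {
	n := len(x)
	rng := rand.New(rand.NewSource(seed))
	phi := make([]float64, n)
	perm := make([]int, n)
	for i := range perm {
		perm[i] = i
	}
	for s := 0; s < nSamples; s++ {
		rng.Shuffle(n, func(i, j int) { perm[i], perm[j] = perm[j], perm[i] })
		for _, b := range background {
			cur := append([]float64(nil), b...) // start from the background row
			prev := f(cur)
			for _, i := range perm {
				cur[i] = x[i] // "add" feature i
				next := f(cur)
				phi[i] += next - prev
				prev = next
			}
		}
	}
	for i := range phi {
		phi[i] /= float64(nSamples * len(background))
	}
	return phi
}

func main() {
	background := [][]float64{{0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {0.5, 0.5, 0.5}}
	phi := samplingSHAP([]float64{1.0, 2.0, 0.5}, background, 100, 42)
	fmt.Printf("phi = %.4f\n", phi)
}
```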
Sample Count Trade-offs¶
| Samples | Speed | Accuracy | Use Case |
|---|---|---|---|
| 10 | Very fast | Low | Quick check |
| 50 | Fast | Medium | Exploration |
| 100 | Medium | Good | General use |
| 500 | Slow | High | Important decisions |
Effect on Accuracy¶
ctx := context.Background()
instance := []float64{1.0, 2.0, 0.5}

for _, samples := range []int{10, 50, 100, 500} {
	exp, _ := sampling.New(m, background,
		explainer.WithNumSamples(samples),
		explainer.WithSeed(42),
	)
	explanation, _ := exp.Explain(ctx, instance)
	result := explanation.Verify(1.0)
	fmt.Printf("Samples=%3d: diff=%.4f\n", samples, result.Difference)
}
The reported difference generally shrinks as the sample count grows.
Configuration¶
exp, _ := sampling.New(m, background,
	// Number of Monte Carlo samples
	explainer.WithNumSamples(100),

	// Seed for reproducibility
	explainer.WithSeed(42),

	// Feature names
	explainer.WithFeatureNames([]string{"age", "income", "score"}),

	// Model identifier
	explainer.WithModelID("quick-model"),
)
Comparison with PermutationSHAP¶
| Property | SamplingSHAP | PermutationSHAP |
|---|---|---|
| Local accuracy | Not guaranteed | Guaranteed |
| Variance | Higher | Lower |
| Speed | Faster | Slower |
| Algorithm | Simple MC | Antithetic sampling |
When to Choose SamplingSHAP¶
- Exploratory analysis
- Model is very slow to evaluate
- Approximate values are acceptable
- You'll validate important predictions separately
When to Choose PermutationSHAP¶
- Production deployments
- Regulatory/compliance requirements
- Need guaranteed accuracy
- Model is fast enough
Improving Accuracy¶
More Samples¶
The most direct way to improve accuracy:
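Raising the sample count is a one-line change to the options shown earlier (500 here is just an example; pick a value from the trade-off table above):

```go
// Trade speed for accuracy by raising the Monte Carlo sample count
exp, _ := sampling.New(m, background,
	explainer.WithNumSamples(500), // e.g. 500 instead of 100
	explainer.WithSeed(42),
)
```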
Better Background Data¶
Representative background data improves both accuracy and reliability:
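One common way to get a representative background is to subsample the training data uniformly at random. A minimal sketch; `sampleBackground` is an illustrative helper, not a library function:

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleBackground draws k distinct rows from data uniformly at random,
// so the background reflects the training distribution.
func sampleBackground(data [][]float64, k int, seed int64) [][]float64 {
	rng := rand.New(rand.NewSource(seed))
	out := make([][]float64, k)
	for i, j := range rng.Perm(len(data))[:k] {
		out[i] = data[j]
	}
	return out
}

func main() {
	train := [][]float64{
		{0.1, 1.0, 3.2}, {0.4, 0.9, 2.8}, {0.2, 1.1, 3.0},
		{0.9, 0.2, 1.5}, {0.7, 0.3, 1.9}, {0.8, 0.1, 1.7},
	}
	background := sampleBackground(train, 3, 42)
	fmt.Println(background)
}
```

The drawn rows can then be passed as the background argument to `sampling.New`.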
Multiple Runs¶
For critical decisions, run multiple times and check consistency:
results := make([]float64, 5)
for i := 0; i < 5; i++ {
	exp, _ := sampling.New(m, background,
		explainer.WithNumSamples(100),
		explainer.WithSeed(int64(i)), // Different seed each time
	)
	explanation, _ := exp.Explain(ctx, instance)
	results[i] = explanation.Values["important_feature"]
}

// Check if the results are consistent: a wide spread means more samples are needed
lo, hi := results[0], results[0]
for _, v := range results[1:] {
	if v < lo {
		lo = v
	}
	if v > hi {
		hi = v
	}
}
fmt.Printf("Spread across runs: %.4f\n", hi-lo)
Use Cases¶
Quick Feature Importance Check¶
// Fast exploration - which features matter?
exp, _ := sampling.New(m, background,
	explainer.WithNumSamples(50),
)
explanation, _ := exp.Explain(ctx, instance)

fmt.Println("Top features (approximate):")
for _, f := range explanation.TopFeatures(5) {
	fmt.Printf("  %s: %.4f\n", f.Name, f.SHAPValue)
}
Batch Screening¶
// Quickly identify interesting instances for detailed analysis
for _, instance := range candidates {
	explanation, _ := exp.Explain(ctx, instance)

	// Flag instances where the key feature has a high impact
	if math.Abs(explanation.Values["risk_factor"]) > 0.5 {
		flagForReview(instance)
	}
}
Development/Testing¶
// Fast iteration during development
func TestModelExplanations(t *testing.T) {
	exp, _ := sampling.New(m, background,
		explainer.WithNumSamples(20), // Fast for tests
		explainer.WithSeed(42),       // Reproducible
	)
	explanation, _ := exp.Explain(ctx, testInstance)

	// Check approximate behavior
	if explanation.Values["x0"] < 0 {
		t.Error("Expected positive contribution from x0")
	}
}
Limitations¶
- No local accuracy guarantee: SHAP values may not sum exactly to prediction - base_value
- Higher variance: Results vary more between runs
- Less suitable for compliance: Use PermutationSHAP for auditable explanations
Next Steps¶
- PermutationSHAP - When you need guaranteed accuracy
- TreeSHAP - For tree models (exact and fast)
- Visualization - Create charts