Background Data Selection Guide¶
Background data (also called reference data or baseline data) is fundamental to SHAP explanations. This guide covers best practices for selecting background data that produces meaningful and stable explanations.
What is Background Data?¶
SHAP values explain a prediction by comparing it to a baseline expectation. The background dataset defines this baseline - it represents what the model would predict "on average" without knowing specific feature values.
// Background data passed to explainer constructors
background := [][]float64{
{0.5, 1.2, 3.4},
{0.8, 1.5, 2.9},
{0.3, 0.9, 3.1},
// ... more samples
}
exp, err := kernel.New(model, background, opts)
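To make the "on average" baseline concrete, the sketch below computes a base value as the mean model output over the background rows. It is an illustration only: `baseValue` and the toy `predict` function are hypothetical names, not part of the shap-go API, and a given explainer may compute its base value differently.

```go
package main

import "fmt"

// baseValue returns the mean model output over the background set.
// predict is a hypothetical stand-in for a model's prediction function.
func baseValue(predict func([]float64) float64, background [][]float64) float64 {
	if len(background) == 0 {
		return 0
	}
	sum := 0.0
	for _, row := range background {
		sum += predict(row)
	}
	return sum / float64(len(background))
}

func main() {
	// Toy linear model used only for illustration.
	predict := func(x []float64) float64 { return 2*x[0] + x[1] - 0.5*x[2] }
	background := [][]float64{
		{0.5, 1.2, 3.4},
		{0.8, 1.5, 2.9},
		{0.3, 0.9, 3.1},
	}
	fmt.Printf("base value: %.4f\n", baseValue(predict, background))
}
```

Changing the rows in `background` changes the number every SHAP value is measured against, which is why the selection matters.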
Key Principles¶
1. Representativeness¶
Background data should represent the typical distribution of your input data.
Good practice:
// Use a random sample from your training/validation data
background := selectRandomSample(trainingData, 100)
Avoid:
// Don't use only extreme values
background := [][]float64{
{0.0, 0.0, 0.0}, // All minimum values
{1.0, 1.0, 1.0}, // All maximum values
}
2. Sample Size¶
The optimal background size depends on your explainer and computational budget.
| Explainer | Recommended Size | Notes |
|---|---|---|
| TreeSHAP | 100-1000 | More samples improve accuracy |
| KernelSHAP | 50-200 | Model evaluations grow linearly with background size |
| DeepSHAP | 50-200 | Reference activations are averaged over the background |
| GradientSHAP | 50-500 | Interpolation performed with each sample |
| PermutationSHAP | 100-500 | Marginal expectation estimation |
| SamplingSHAP | 100-500 | Monte Carlo sampling |
| LinearSHAP | 10-100 | Only needs mean estimation |
| ExactSHAP | 10-50 | Cost is O(n*2^d) for n background samples and d features |
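A rough way to budget background size is to count model evaluations. The sketch below assumes that each coalition is evaluated once per background row, which is how KernelSHAP and exact enumeration are usually described; the library's actual implementation may batch or cache calls. The function names are hypothetical.

```go
package main

import "fmt"

// kernelEvals estimates model evaluations for a KernelSHAP-style explainer:
// every sampled coalition is evaluated once per background row.
func kernelEvals(numCoalitions, backgroundSize int) int {
	return numCoalitions * backgroundSize
}

// exactEvals estimates model evaluations for exact enumeration:
// all 2^d coalitions, each evaluated once per background row.
// The shift overflows for very large d; this is meant for small d only.
func exactEvals(numFeatures, backgroundSize int) int {
	return backgroundSize * (1 << uint(numFeatures))
}

func main() {
	fmt.Println("KernelSHAP, 2048 coalitions, 100 rows:", kernelEvals(2048, 100))
	fmt.Println("ExactSHAP, 10 features, 50 rows:", exactEvals(10, 50))
}
```

Multiplying the result by the time of one model call gives a first estimate of the cost of explaining a single instance.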
3. Diversity¶
Include diverse samples that cover the feature space.
// Stratified sampling for classification tasks
background := make([][]float64, 0)
for _, class := range classes {
samples := selectFromClass(data, class, samplesPerClass)
background = append(background, samples...)
}
4. Feature Independence Assumption¶
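Many SHAP methods fill in "missing" features with values taken from the background independently of the features that are present, which implicitly assumes feature independence. If features are correlated: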
- Consider larger background samples to capture correlations
- Use PartitionSHAP (future) for hierarchical feature grouping
- Be aware that explanations may misattribute between correlated features
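Before relying on the independence assumption, it can help to check how correlated the background features actually are. The following is a minimal sketch that flags feature pairs with a high absolute Pearson correlation; `pearson` and `correlatedPairs` are hypothetical helpers, and the 0.9 threshold is an arbitrary choice for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation between columns a and b of data.
// It returns 0 when data is empty or either column has zero variance.
func pearson(data [][]float64, a, b int) float64 {
	n := float64(len(data))
	if n == 0 {
		return 0
	}
	var meanA, meanB float64
	for _, row := range data {
		meanA += row[a]
		meanB += row[b]
	}
	meanA /= n
	meanB /= n
	var cov, varA, varB float64
	for _, row := range data {
		da, db := row[a]-meanA, row[b]-meanB
		cov += da * db
		varA += da * da
		varB += db * db
	}
	if varA == 0 || varB == 0 {
		return 0
	}
	return cov / math.Sqrt(varA*varB)
}

// correlatedPairs lists feature index pairs whose |r| exceeds threshold.
func correlatedPairs(data [][]float64, threshold float64) [][2]int {
	var pairs [][2]int
	if len(data) == 0 {
		return pairs
	}
	d := len(data[0])
	for i := 0; i < d; i++ {
		for j := i + 1; j < d; j++ {
			if math.Abs(pearson(data, i, j)) > threshold {
				pairs = append(pairs, [2]int{i, j})
			}
		}
	}
	return pairs
}

func main() {
	// Feature 1 is exactly twice feature 0; feature 2 is unrelated.
	data := [][]float64{
		{1, 2, 5},
		{2, 4, 1},
		{3, 6, 4},
		{4, 8, 2},
	}
	fmt.Println("highly correlated pairs:", correlatedPairs(data, 0.9))
}
```

Pairs reported here are the ones where attribution may be split between features in ways that are hard to interpret.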
Explainer-Specific Guidance¶
TreeSHAP¶
TreeSHAP uses the background data to estimate the expected model output when features along a tree path are treated as unknown.
import "github.com/plexusone/shap-go/explainer/tree"
// TreeSHAP benefits from more background samples
// 100-1000 samples recommended for stable estimates
background := selectRandomSample(trainingData, 500)
exp, err := tree.New(ensemble, background,
explainer.WithFeatureNames(featureNames),
)
Tips:
- Include edge cases if they're part of normal operation
- More samples improve accuracy but increase memory usage
- Background influences base value calculation
KernelSHAP¶
KernelSHAP's computation scales with background size. Choose carefully.
import "github.com/plexusone/shap-go/explainer/kernel"
// KernelSHAP: smaller background for speed
// Each coalition is evaluated against every background row
background := selectRandomSample(trainingData, 100)
exp, err := kernel.New(model, background,
explainer.WithNumSamples(2048), // Coalition samples
)
Tips:
- Start with ~100 samples
- Use k-means clustering to select diverse representatives
- The base value is the mean model output over the background, E[f(x)]
DeepSHAP¶
DeepSHAP computes attributions relative to averaged reference activations.
import "github.com/plexusone/shap-go/explainer/deepshap"
// DeepSHAP: moderate background size
// Activations are averaged over all background samples
background := selectRandomSample(trainingData, 100)
exp, err := deepshap.New(activationSession, background)
Tips:
- Include representative examples from each class
- Avoid using only zeros (can cause numerical issues)
- Background activations are cached - larger samples increase memory
GradientSHAP¶
GradientSHAP interpolates between instance and background samples.
import "github.com/plexusone/shap-go/explainer/gradient"
// GradientSHAP: uses background for Expected Gradients
background := selectRandomSample(trainingData, 200)
exp, err := gradient.New(model, background,
[]explainer.Option{explainer.WithNumSamples(500)},
gradient.WithEpsilon(1e-4), // For numerical gradients
)
Tips:
- Each sample generates interpolated points
- More background samples reduce variance
- Include both typical and boundary cases
LinearSHAP¶
LinearSHAP only needs the feature means of the background.
import "github.com/plexusone/shap-go/explainer/linear"
// LinearSHAP: minimal background needed
// Only uses mean of background features
background := selectRandomSample(trainingData, 50)
exp, err := linear.New(weights, bias, background)
Tips:
- Even small samples give stable means
- Exact closed-form: no sampling variance
- Larger samples only marginally improve base value accuracy
Selection Strategies¶
Random Sampling¶
Simplest approach - randomly sample from training data.
func selectRandomSample(data [][]float64, n int) [][]float64 {
if n >= len(data) {
return data
}
indices := make([]int, len(data))
for i := range indices {
indices[i] = i
}
rand.Shuffle(len(indices), func(i, j int) {
indices[i], indices[j] = indices[j], indices[i]
})
result := make([][]float64, n)
for i := 0; i < n; i++ {
result[i] = data[indices[i]]
}
return result
}
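When explanations need to be reproducible across runs, the random selection can be driven by an explicit seed. The sketch below is one way to do that with the standard library; `sampleWithSeed` is a hypothetical variant, not a shap-go function. It also returns a new outer slice instead of the input slice itself.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleWithSeed returns n rows chosen uniformly without replacement.
// A fixed seed makes the selection reproducible.
func sampleWithSeed(data [][]float64, n int, seed int64) [][]float64 {
	if n >= len(data) {
		// Copy the outer slice so callers cannot reorder the input.
		return append([][]float64(nil), data...)
	}
	if n <= 0 {
		return nil
	}
	rng := rand.New(rand.NewSource(seed))
	perm := rng.Perm(len(data))
	out := make([][]float64, n)
	for i := 0; i < n; i++ {
		out[i] = data[perm[i]]
	}
	return out
}

func main() {
	data := [][]float64{{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}}
	fmt.Println(sampleWithSeed(data, 4, 42))
	fmt.Println(sampleWithSeed(data, 4, 42)) // same rows as the line above
}
```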
Stratified Sampling¶
For classification, sample proportionally from each class.
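// Requires "math/rand" and "sort".
func selectStratifiedSample(data [][]float64, labels []int, n int) [][]float64 {
	// Labels must line up one-to-one with rows
	if len(data) == 0 || len(data) != len(labels) {
		return nil
	}
	// Group row indices by label
	byLabel := make(map[int][]int)
	for i, label := range labels {
		byLabel[label] = append(byLabel[label], i)
	}
	// Visit classes in sorted order; map iteration order is random
	classes := make([]int, 0, len(byLabel))
	for label := range byLabel {
		classes = append(classes, label)
	}
	sort.Ints(classes)
	// Sample proportionally, with at least one row per class.
	// Rounding down and the one-row minimum mean the result
	// can hold slightly fewer or more than n rows.
	var result [][]float64
	for _, label := range classes {
		indices := byLabel[label]
		count := int(float64(n) * float64(len(indices)) / float64(len(data)))
		if count < 1 {
			count = 1
		}
		rand.Shuffle(len(indices), func(i, j int) {
			indices[i], indices[j] = indices[j], indices[i]
		})
		for i := 0; i < count && i < len(indices); i++ {
			result = append(result, data[indices[i]])
		}
	}
	return result
}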
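Proportional allocation with rounding does not always return exactly the requested number of rows. If an exact size matters, one option is the largest-remainder method, sketched below under the assumption that class sizes are known up front. `allocate` is a hypothetical helper; unlike a one-row-per-class minimum, it may give very small classes zero slots.

```go
package main

import (
	"fmt"
	"sort"
)

// allocate splits n slots across classes in proportion to their sizes
// using the largest-remainder method. The counts sum to min(n, total)
// and no class receives more slots than it has rows.
func allocate(sizes map[int]int, n int) map[int]int {
	total := 0
	labels := make([]int, 0, len(sizes))
	for label, size := range sizes {
		total += size
		labels = append(labels, label)
	}
	counts := make(map[int]int, len(sizes))
	if total == 0 || n <= 0 {
		return counts
	}
	if n > total {
		n = total
	}
	sort.Ints(labels) // fixed order so ties break deterministically
	rem := make(map[int]int, len(sizes))
	assigned := 0
	for _, label := range labels {
		counts[label] = n * sizes[label] / total
		rem[label] = n * sizes[label] % total
		assigned += counts[label]
	}
	// Hand the leftover slots to the classes with the largest remainders.
	sort.SliceStable(labels, func(i, j int) bool {
		return rem[labels[i]] > rem[labels[j]]
	})
	for i := 0; i < n-assigned; i++ {
		counts[labels[i]]++
	}
	return counts
}

func main() {
	sizes := map[int]int{0: 90, 1: 9, 2: 1}
	fmt.Println(allocate(sizes, 10))
}
```

The returned counts can then be used as the per-class sample sizes in a stratified selection.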
K-Means Clustering¶
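Select cluster centroids for maximum diversity. Centroids are cluster means rather than actual rows, so they may contain values that never occur in the data (for example, fractional codes for integer-encoded categories).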
// Using a k-means library
func selectKMeansCentroids(data [][]float64, k int) [][]float64 {
// Run k-means clustering
clusters := kmeans.Cluster(data, k)
// Return centroids
return clusters.Centroids()
}
Prototype Selection¶
Select prototypical examples that represent data regions.
// Select samples closest to their cluster centers
func selectPrototypes(data [][]float64, k int) [][]float64 {
clusters := kmeans.Cluster(data, k)
prototypes := make([][]float64, k)
for i, centroid := range clusters.Centroids() {
prototypes[i] = findClosestPoint(data, centroid)
}
return prototypes
}
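The `findClosestPoint` helper used above is not defined in this guide. A minimal sketch, assuming squared Euclidean distance and rows at least as long as the target, could look like this:

```go
package main

import (
	"fmt"
	"math"
)

// findClosestPoint returns the row of data nearest to target under
// squared Euclidean distance, or nil when data is empty.
// Rows are assumed to have at least len(target) entries.
func findClosestPoint(data [][]float64, target []float64) []float64 {
	var best []float64
	bestDist := math.Inf(1)
	for _, row := range data {
		dist := 0.0
		for j := range target {
			diff := row[j] - target[j]
			dist += diff * diff
		}
		if dist < bestDist {
			bestDist, best = dist, row
		}
	}
	return best
}

func main() {
	data := [][]float64{{0, 0}, {1, 1}, {5, 5}}
	centroid := []float64{0.9, 1.2}
	fmt.Println(findClosestPoint(data, centroid))
}
```

Because distance is sensitive to scale, features with large ranges dominate unless the data is standardized first.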
Common Pitfalls¶
1. Using All Zeros¶
Zero backgrounds can cause:
- Division by zero in some attribution rules
- Undefined gradients at discontinuities
- Explanations that don't generalize
2. Too Few Samples¶
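With only a handful of rows, the base value and the attributions depend heavily on which rows were picked. Use at least 10-50 samples, and more for sampling-based explainers (see the Sample Size table).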
3. Outliers Only¶
// AVOID: Extreme values don't represent typical behavior
background := [][]float64{
extremeMin,
extremeMax,
}
Include typical values, not just boundary cases.
4. Ignoring Data Types¶
// For categorical features encoded as integers:
// Background should include all category values
background := [][]float64{
{0.0, 1.5, 2.3}, // category 0
{1.0, 1.8, 2.1}, // category 1
{2.0, 1.2, 2.5}, // category 2
}
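A quick coverage check can catch a background that is missing a category. The sketch below assumes the category codes are known in advance; `missingCategories` is a hypothetical helper name.

```go
package main

import "fmt"

// missingCategories reports which of the expected category codes never
// appear in the given column of the background.
func missingCategories(background [][]float64, col int, expected []float64) []float64 {
	seen := make(map[float64]bool)
	for _, row := range background {
		seen[row[col]] = true
	}
	var missing []float64
	for _, c := range expected {
		if !seen[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	background := [][]float64{
		{0.0, 1.5, 2.3}, // category 0
		{1.0, 1.8, 2.1}, // category 1
		{2.0, 1.2, 2.5}, // category 2
	}
	// Category 3 is expected but absent from column 0.
	fmt.Println(missingCategories(background, 0, []float64{0, 1, 2, 3}))
}
```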
5. Train vs. Test Distribution Shift¶
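If the data you are explaining (test or production) is distributed differently from the training data, draw the background from the distribution you are explaining: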
// If explaining production data, sample from production distribution
background := selectFromProductionData(n)
// Not just training data if distributions differ
// background := selectFromTrainingData(n) // May be inappropriate
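One simple way to notice a shift is to compare per-feature means between the two datasets, measured in training standard deviations. This is a coarse sketch, not a statistical test; `columnStats`, `shiftedFeatures`, and the threshold of 2 are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// columnStats returns the per-feature mean and population standard deviation.
func columnStats(data [][]float64) (mean, std []float64) {
	if len(data) == 0 {
		return nil, nil
	}
	d := len(data[0])
	n := float64(len(data))
	mean = make([]float64, d)
	std = make([]float64, d)
	for _, row := range data {
		for j := 0; j < d; j++ {
			mean[j] += row[j]
		}
	}
	for j := range mean {
		mean[j] /= n
	}
	for _, row := range data {
		for j := 0; j < d; j++ {
			diff := row[j] - mean[j]
			std[j] += diff * diff
		}
	}
	for j := range std {
		std[j] = math.Sqrt(std[j] / n)
	}
	return mean, std
}

// shiftedFeatures flags features whose production mean differs from the
// training mean by more than maxZ training standard deviations.
func shiftedFeatures(train, prod [][]float64, maxZ float64) []int {
	trainMean, trainStd := columnStats(train)
	prodMean, _ := columnStats(prod)
	var shifted []int
	for j := range trainMean {
		if j >= len(prodMean) {
			break
		}
		diff := math.Abs(prodMean[j] - trainMean[j])
		if trainStd[j] == 0 {
			// Constant in training: any change at all counts as a shift.
			if diff > 0 {
				shifted = append(shifted, j)
			}
			continue
		}
		if diff/trainStd[j] > maxZ {
			shifted = append(shifted, j)
		}
	}
	return shifted
}

func main() {
	train := [][]float64{{0, 10}, {2, 10}, {4, 10}}
	prod := [][]float64{{10, 10}, {12, 10}}
	fmt.Println("shifted feature indices:", shiftedFeatures(train, prod, 2))
}
```

Features flagged here are a hint that a training-only background may give a misleading baseline for production explanations.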
Validation¶
Check Base Value Stability¶
// Base value should be stable across similar backgrounds
bg1 := selectRandomSample(data, 100)
bg2 := selectRandomSample(data, 100)
exp1, _ := kernel.New(model, bg1, opts)
exp2, _ := kernel.New(model, bg2, opts)
// Base values should be similar
diff := math.Abs(exp1.BaseValue() - exp2.BaseValue())
if diff > tolerance {
// Background may be too small or non-representative
}
Verify Local Accuracy¶
// SHAP values should sum to prediction - base_value
result, _ := exp.Explain(ctx, instance)
verify := result.Verify(tolerance)
if !verify.Valid {
// May indicate background issues
}
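The local accuracy property can also be checked by hand from raw numbers. The sketch below is independent of the library's `Verify` method and only assumes you have the SHAP values, the base value, and the model's prediction as plain floats; `localAccuracyError` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// localAccuracyError returns |baseValue + sum(shapValues) - prediction|.
// A value near zero means the attributions add up to the prediction.
func localAccuracyError(shapValues []float64, baseValue, prediction float64) float64 {
	sum := baseValue
	for _, v := range shapValues {
		sum += v
	}
	return math.Abs(sum - prediction)
}

func main() {
	shapValues := []float64{0.5, -0.2, 0.3}
	errAbs := localAccuracyError(shapValues, 1.0, 1.6)
	fmt.Printf("local accuracy error: %.6f\n", errAbs)
}
```

For sampling-based explainers a small nonzero error is expected, so compare against a tolerance rather than zero.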
Check Explanation Stability¶
// Run multiple times with different seeds
explanations := make([]*explanation.Explanation, 10)
for i := 0; i < 10; i++ {
exp, _ := sampling.New(model, background,
explainer.WithSeed(int64(i)),
)
explanations[i], _ = exp.Explain(ctx, instance)
}
// Check variance in SHAP values
variance := computeVariance(explanations)
if variance > threshold {
// Consider larger background or more samples
}
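The `computeVariance` helper above is left undefined. As a rough sketch, the per-feature variance across runs can be computed from plain slices of SHAP values; `featureVariance` is a hypothetical function that takes `runs[r][j]` as the value of feature j in run r, rather than the library's explanation type.

```go
package main

import "fmt"

// featureVariance returns the per-feature population variance of SHAP
// values across repeated runs. runs[r][j] is feature j in run r.
func featureVariance(runs [][]float64) []float64 {
	if len(runs) == 0 {
		return nil
	}
	d := len(runs[0])
	n := float64(len(runs))
	mean := make([]float64, d)
	for _, run := range runs {
		for j := 0; j < d; j++ {
			mean[j] += run[j]
		}
	}
	variance := make([]float64, d)
	for j := range mean {
		mean[j] /= n
	}
	for _, run := range runs {
		for j := 0; j < d; j++ {
			diff := run[j] - mean[j]
			variance[j] += diff * diff
		}
	}
	for j := range variance {
		variance[j] /= n
	}
	return variance
}

func main() {
	runs := [][]float64{
		{0.50, -0.20, 0.31},
		{0.48, -0.21, 0.29},
		{0.52, -0.19, 0.30},
	}
	fmt.Println(featureVariance(runs))
}
```

Looking at the variance per feature, rather than a single number, shows which attributions are unstable.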
Summary¶
| Aspect | Recommendation |
|---|---|
| Size | 10-1000 samples depending on explainer (see the Sample Size table) |
| Selection | Random or stratified sampling from training data |
| Diversity | Cover feature space, include all categories |
| Validation | Check base value stability and local accuracy |
| Avoid | All zeros, single sample, outliers only |
Good background data selection is essential for meaningful SHAP explanations. When in doubt, draw a larger random sample from data that resembles the inputs you are explaining.