Background Data Selection Guide

Background data (also called reference data or baseline data) is fundamental to SHAP explanations. This guide covers best practices for selecting background data that produces meaningful and stable explanations.

What is Background Data?

SHAP values explain a prediction by comparing it to a baseline expectation. The background dataset defines this baseline: it represents what the model would predict "on average", before the specific feature values of the instance are known.

// Background data passed to explainer constructors
background := [][]float64{
    {0.5, 1.2, 3.4},
    {0.8, 1.5, 2.9},
    {0.3, 0.9, 3.1},
    // ... more samples
}

exp, err := kernel.New(model, background, opts)
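
The baseline itself is just the model's average output over the background. A minimal sketch of that computation, where the `predict` closure is a stand-in for a real model (not part of the library API):

```go
package main

import "fmt"

// baseValue estimates E[f(x)] by averaging model predictions
// over the background dataset.
func baseValue(predict func([]float64) float64, background [][]float64) float64 {
	sum := 0.0
	for _, row := range background {
		sum += predict(row)
	}
	return sum / float64(len(background))
}

func main() {
	// Stand-in model: a simple weighted sum.
	predict := func(x []float64) float64 { return 2*x[0] + x[1] }

	background := [][]float64{
		{0.5, 1.2},
		{0.8, 1.5},
		{0.3, 0.9},
	}

	fmt.Printf("base value: %.3f\n", baseValue(predict, background))
}
```

Every explainer reports a base value of this kind, so the choice of background directly shifts the reference point every SHAP value is measured against.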

Key Principles

1. Representativeness

Background data should represent the typical distribution of your input data.

Good practice:

// Use a random sample from your training/validation data
background := selectRandomSample(trainingData, 100)

Avoid:

// Don't use only extreme values
background := [][]float64{
    {0.0, 0.0, 0.0},  // All minimum values
    {1.0, 1.0, 1.0},  // All maximum values
}

2. Sample Size

The optimal background size depends on your explainer and computational budget.

Explainer         Recommended size   Notes
TreeSHAP          100-1000           More samples improve accuracy
KernelSHAP        50-200             Computation grows with background size
DeepSHAP          50-200             Activations are averaged over the background
GradientSHAP      50-500             Interpolation is performed against each sample
PermutationSHAP   100-500            Estimates marginal expectations
SamplingSHAP      100-500            Monte Carlo sampling
LinearSHAP        10-100             Only needs the feature means
ExactSHAP         10-50              Cost is O(n*2^d) for d features

3. Diversity

Include diverse samples that cover the feature space.

// Stratified sampling for classification tasks
background := make([][]float64, 0)
for _, class := range classes {
    samples := selectFromClass(data, class, samplesPerClass)
    background = append(background, samples...)
}

4. Feature Independence Assumption

Many SHAP methods assume features are independent when marginalizing over the background. If your features are correlated:

  • Consider larger background samples to capture correlations
  • Use PartitionSHAP (future) for hierarchical feature grouping
  • Be aware that explanations may misattribute between correlated features
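
One way to spot the problem is to measure pairwise feature correlation in the background itself. A sketch using plain Pearson correlation (the `pearson` helper is illustrative, not part of the library):

```go
package main

import (
	"fmt"
	"math"
)

// pearson computes the Pearson correlation between two equal-length columns.
func pearson(a, b []float64) float64 {
	n := float64(len(a))
	var sumA, sumB float64
	for i := range a {
		sumA += a[i]
		sumB += b[i]
	}
	meanA, meanB := sumA/n, sumB/n

	var cov, varA, varB float64
	for i := range a {
		da, db := a[i]-meanA, b[i]-meanB
		cov += da * db
		varA += da * da
		varB += db * db
	}
	return cov / math.Sqrt(varA*varB)
}

func main() {
	// Two strongly correlated feature columns.
	a := []float64{1, 2, 3, 4, 5}
	b := []float64{2.1, 3.9, 6.2, 8.0, 10.1}
	if r := pearson(a, b); math.Abs(r) > 0.9 {
		fmt.Printf("features highly correlated (r=%.3f): independence assumption is suspect\n", r)
	}
}
```

A high |r| between two columns is a signal to interpret their attributions jointly rather than individually.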

Explainer-Specific Guidance

TreeSHAP

TreeSHAP uses background data to estimate marginal expectations along tree paths.

import "github.com/plexusone/shap-go/explainer/tree"

// TreeSHAP benefits from more background samples
// 100-1000 samples recommended for stable estimates
background := selectRandomSample(trainingData, 500)

exp, err := tree.New(ensemble, background,
    explainer.WithFeatureNames(featureNames),
)

Tips:

  • Include edge cases if they're part of normal operation
  • More samples improve accuracy but increase memory usage
  • Background influences base value calculation

KernelSHAP

KernelSHAP's computation scales with background size. Choose carefully.

import "github.com/plexusone/shap-go/explainer/kernel"

// KernelSHAP: smaller background for speed
// Each sample adds a row to the weighted regression
background := selectRandomSample(trainingData, 100)

exp, err := kernel.New(model, background,
    explainer.WithNumSamples(2048),  // Coalition samples
)

Tips:

  • Start with ~100 samples
  • Use k-means clustering to select diverse representatives
  • Background affects base value: E[f(x)] over background

DeepSHAP

DeepSHAP computes attributions relative to averaged reference activations.

import "github.com/plexusone/shap-go/explainer/deepshap"

// DeepSHAP: moderate background size
// Activations are averaged over all background samples
background := selectRandomSample(trainingData, 100)

exp, err := deepshap.New(activationSession, background)

Tips:

  • Include representative examples from each class
  • Avoid using only zeros (can cause numerical issues)
  • Background activations are cached; larger backgrounds increase memory usage

GradientSHAP

GradientSHAP interpolates between instance and background samples.

import "github.com/plexusone/shap-go/explainer/gradient"

// GradientSHAP: uses background for Expected Gradients
background := selectRandomSample(trainingData, 200)

exp, err := gradient.New(model, background,
    []explainer.Option{explainer.WithNumSamples(500)},
    gradient.WithEpsilon(1e-4),  // For numerical gradients
)

Tips:

  • Each sample generates interpolated points
  • More background samples reduce variance
  • Include both typical and boundary cases
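
The interpolation step itself is simple: for each background sample, points are drawn on the line between that sample and the instance being explained. A sketch of that step (not the library's internal code):

```go
package main

import "fmt"

// interpolate returns `steps` points evenly spaced on the line from
// baseline to instance: baseline + t*(instance-baseline), t in (0, 1].
func interpolate(baseline, instance []float64, steps int) [][]float64 {
	points := make([][]float64, steps)
	for s := 1; s <= steps; s++ {
		t := float64(s) / float64(steps)
		p := make([]float64, len(instance))
		for i := range instance {
			p[i] = baseline[i] + t*(instance[i]-baseline[i])
		}
		points[s-1] = p
	}
	return points
}

func main() {
	baseline := []float64{0, 0}
	instance := []float64{1, 2}
	for _, p := range interpolate(baseline, instance, 4) {
		fmt.Println(p)
	}
}
```

Gradients evaluated at these points are averaged, which is why more background samples (and more interpolation steps) reduce the variance of the estimate.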

LinearSHAP

LinearSHAP only needs feature means from background.

import "github.com/plexusone/shap-go/explainer/linear"

// LinearSHAP: minimal background needed
// Only uses mean of background features
background := selectRandomSample(trainingData, 50)

exp, err := linear.New(weights, bias, background)

Tips:

  • Even small samples give stable means
  • Exact closed-form: no sampling variance
  • Larger samples only marginally improve the base value estimate
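
For a linear model f(x) = bias + Σ wᵢxᵢ, the SHAP value of feature i has the exact closed form wᵢ·(xᵢ − μᵢ), where μᵢ is the background mean of feature i. A standalone sketch of that formula (independent of the `linear.New` API):

```go
package main

import "fmt"

// featureMeans returns the per-feature mean of the background data.
func featureMeans(background [][]float64) []float64 {
	means := make([]float64, len(background[0]))
	for _, row := range background {
		for i, v := range row {
			means[i] += v
		}
	}
	for i := range means {
		means[i] /= float64(len(background))
	}
	return means
}

// linearSHAP computes exact SHAP values for a linear model:
// phi_i = w_i * (x_i - mean_i).
func linearSHAP(weights, instance, means []float64) []float64 {
	phi := make([]float64, len(weights))
	for i := range weights {
		phi[i] = weights[i] * (instance[i] - means[i])
	}
	return phi
}

func main() {
	weights := []float64{2, -1}
	bias := 0.5
	background := [][]float64{{0, 0}, {2, 2}}
	instance := []float64{1, 3}

	means := featureMeans(background)
	phi := linearSHAP(weights, instance, means)

	// Local accuracy: sum(phi) equals f(instance) - base value.
	fx := bias + weights[0]*instance[0] + weights[1]*instance[1]
	base := bias + weights[0]*means[0] + weights[1]*means[1]
	fmt.Println(phi, fx-base)
}
```

Because only the means μᵢ enter the formula, even a small background gives stable results.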

Selection Strategies

Random Sampling

The simplest approach: randomly sample from the training data.

func selectRandomSample(data [][]float64, n int) [][]float64 {
    if n >= len(data) {
        return data
    }

    indices := make([]int, len(data))
    for i := range indices {
        indices[i] = i
    }

    rand.Shuffle(len(indices), func(i, j int) {
        indices[i], indices[j] = indices[j], indices[i]
    })

    result := make([][]float64, n)
    for i := 0; i < n; i++ {
        result[i] = data[indices[i]]
    }
    return result
}

Stratified Sampling

For classification, sample proportionally from each class.

func selectStratifiedSample(data [][]float64, labels []int, n int) [][]float64 {
    // Group by label
    byLabel := make(map[int][]int)
    for i, label := range labels {
        byLabel[label] = append(byLabel[label], i)
    }

    // Sample proportionally
    result := make([][]float64, 0, n)
    for _, indices := range byLabel {
        count := int(float64(n) * float64(len(indices)) / float64(len(data)))
        if count < 1 {
            count = 1
        }
        rand.Shuffle(len(indices), func(i, j int) {
            indices[i], indices[j] = indices[j], indices[i]
        })
        for i := 0; i < count && i < len(indices); i++ {
            result = append(result, data[indices[i]])
        }
    }
    return result
}

K-Means Clustering

Select cluster centroids for maximum diversity.

// Using a k-means library
func selectKMeansCentroids(data [][]float64, k int) [][]float64 {
    // Run k-means clustering
    clusters := kmeans.Cluster(data, k)

    // Return centroids
    return clusters.Centroids()
}

Prototype Selection

Select prototypical examples that represent data regions.

// Select samples closest to their cluster centers
func selectPrototypes(data [][]float64, k int) [][]float64 {
    clusters := kmeans.Cluster(data, k)

    prototypes := make([][]float64, k)
    for i, centroid := range clusters.Centroids() {
        prototypes[i] = findClosestPoint(data, centroid)
    }
    return prototypes
}
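
The `findClosestPoint` helper above is not defined by the library; a minimal Euclidean-distance version could look like:

```go
package main

import (
	"fmt"
	"math"
)

// findClosestPoint returns the data row with the smallest
// Euclidean distance to target.
func findClosestPoint(data [][]float64, target []float64) []float64 {
	best, bestDist := data[0], math.Inf(1)
	for _, row := range data {
		var d float64
		for i := range row {
			diff := row[i] - target[i]
			d += diff * diff
		}
		if d < bestDist {
			best, bestDist = row, d
		}
	}
	return best
}

func main() {
	data := [][]float64{{0, 0}, {1, 1}, {2, 2}}
	fmt.Println(findClosestPoint(data, []float64{0.9, 1.2})) // nearest row
}
```

Prototypes have an advantage over raw centroids: they are real observed samples, so the background never contains feature combinations the model has not seen.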

Common Pitfalls

1. Using All Zeros

// AVOID: Zero background can cause numerical issues
background := [][]float64{{0, 0, 0, 0}}

Zero backgrounds can cause:

  • Division by zero in some attribution rules
  • Undefined gradients at discontinuities
  • Explanations that don't generalize

2. Too Few Samples

// AVOID: Single sample gives unstable estimates
background := [][]float64{{0.5, 0.5, 0.5}}

Use at least 10, and preferably 50 or more, samples for stable explanations.

3. Outliers Only

// AVOID: Extreme values don't represent typical behavior
background := [][]float64{
    extremeMin,
    extremeMax,
}

Include typical values, not just boundary cases.

4. Ignoring Data Types

// For categorical features encoded as integers:
// Background should include all category values
background := [][]float64{
    {0.0, 1.5, 2.3},  // category 0
    {1.0, 1.8, 2.1},  // category 1
    {2.0, 1.2, 2.5},  // category 2
}

5. Train vs. Test Distribution Shift

If your test data differs from training:

// If explaining production data, sample from production distribution
background := selectFromProductionData(n)

// Not just training data if distributions differ
// background := selectFromTrainingData(n)  // May be inappropriate

Validation

Check Base Value Stability

// Base value should be stable across similar backgrounds
bg1 := selectRandomSample(data, 100)
bg2 := selectRandomSample(data, 100)

exp1, _ := kernel.New(model, bg1, opts)
exp2, _ := kernel.New(model, bg2, opts)

// Base values should be similar
diff := math.Abs(exp1.BaseValue() - exp2.BaseValue())
if diff > tolerance {
    // Background may be too small or non-representative
}

Verify Local Accuracy

// SHAP values should sum to prediction - base_value
result, _ := exp.Explain(ctx, instance)
verify := result.Verify(tolerance)
if !verify.Valid {
    // May indicate background issues
}

Check Explanation Stability

// Run multiple times with different seeds
explanations := make([]*explanation.Explanation, 10)
for i := 0; i < 10; i++ {
    exp, _ := sampling.New(model, background,
        explainer.WithSeed(int64(i)),
    )
    explanations[i], _ = exp.Explain(ctx, instance)
}

// Check variance in SHAP values
variance := computeVariance(explanations)
if variance > threshold {
    // Consider larger background or more samples
}
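
The `computeVariance` call above is a placeholder; one concrete form is the per-feature variance of the SHAP vectors across runs, sketched here over plain float slices rather than the library's explanation type:

```go
package main

import "fmt"

// shapVariance computes the per-feature variance of SHAP values
// across multiple explanation runs (rows = runs, columns = features).
func shapVariance(runs [][]float64) []float64 {
	n := float64(len(runs))
	means := make([]float64, len(runs[0]))
	for _, run := range runs {
		for i, v := range run {
			means[i] += v
		}
	}
	for i := range means {
		means[i] /= n
	}
	vars := make([]float64, len(means))
	for _, run := range runs {
		for i, v := range run {
			d := v - means[i]
			vars[i] += d * d
		}
	}
	for i := range vars {
		vars[i] /= n
	}
	return vars
}

func main() {
	runs := [][]float64{
		{0.10, -0.30},
		{0.12, -0.28},
		{0.08, -0.32},
	}
	fmt.Println(shapVariance(runs))
}
```

Low per-feature variance across seeds indicates the background and sample counts are large enough for the attributions to be trusted.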

Summary

Aspect       Recommendation
Size         50-500 samples, depending on the explainer
Selection    Random or stratified sampling from training data
Diversity    Cover the feature space; include all categories
Validation   Check base value stability and local accuracy
Avoid        All zeros, a single sample, outliers only

Good background data selection is essential for meaningful SHAP explanations. When in doubt, use more samples from a representative subset of your data.