Black-Box Is Not an Option: SHAP and LIME Tutorial in Python (2026)
Your XGBoost model claims 94% accuracy on tests. The AUC is 0.97. You even feel a bit proud. Then a client shows up — card denied, low score — and asks: "Why was I rejected?" And you can't answer.
This scenario is no longer just a trust issue with the client. It has become a legal problem.
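On August 2, 2026, 60 days from now, most of the EU AI Act's obligations start to apply (the regulation itself entered into force on August 1, 2024), bringing concrete transparency requirements. Article 86 gives people affected by a decision based on the output of a high-risk AI system the right to a clear and meaningful explanation of the system's role in that decision. Article 13 requires high-risk systems to be designed so that deployers can interpret their outputs. And the fines? Article 99 sets penalties of up to €35 million or 7% of global annual turnover for prohibited practices, and up to €15 million or 3% for breaches of most other obligations, such as provider and deployer duties for high-risk systems.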
"Model interpretability has shifted from 'nice to have' to mandatory. The EU AI Act's Article 13 requires high-risk AI systems to provide sufficient transparency for deployers to interpret outputs." — LDS Team, March 2026
This is where SHAP and LIME come in — two libraries that transform your model's black box into explanations that a human (and an auditor) can understand.
In this tutorial, you will learn, with functional Python code, how to apply SHAP and LIME to a credit scoring model using XGBoost — and how to prepare for the new regulatory landscape without redoing your entire ML pipeline.
What are SHAP and LIME? (And why you need both)
Before diving into the code, it's worth understanding what each one does — and, most importantly, when to use which.
SHAP (SHapley Additive exPlanations) comes from cooperative game theory: each feature is a "player", and its Shapley value is its average marginal contribution to the prediction over all possible feature coalitions. It is mathematically consistent: it respects axioms such as additivity (the contributions sum to the difference between the prediction and the baseline) and consistency, which LIME does not guarantee.
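To make the definition concrete, here is a minimal brute-force sketch of exact Shapley values for a toy three-feature scoring function. The `model`, the instance `x`, and the all-zeros `baseline` are hypothetical, and replacing "absent" features with a fixed baseline is a simplifying assumption (the SHAP library averages over background data instead).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when it joins coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: two linear terms plus one interaction
x = [2.0, 1.0, 3.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # value a feature takes when it is "absent"


def model(v):
    return 3.0 * v[0] + 2.0 * v[1] + v[0] * v[2]


def value_fn(coalition):
    # Features in the coalition keep the instance value, the rest use the baseline
    v = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(v)


phi = shapley_values(value_fn, n_features=3)
base_value = value_fn(frozenset())
print("Shapley values:", phi)
print("base + sum(phi):", base_value + sum(phi), "| model(x):", model(x))
```

The interaction term `v[0] * v[2]` is split evenly between the two features involved, and the contributions add up to the gap between the prediction and the baseline, which is the additivity axiom in action.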
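LIME (Local Interpretable Model-agnostic Explanations) fits a simple surrogate model (e.g., linear) on perturbed samples around a specific prediction. It works with any model and is usually cheaper than model-agnostic Shapley estimation such as KernelSHAP, but its explanations depend on random sampling and lack the same mathematical guarantees.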
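The local-surrogate idea can be sketched in plain Python. This is not the `lime` library's implementation (which also discretizes features and samples from the training distribution); it is a minimal illustration under assumed settings: a hypothetical `black_box` function, Gaussian perturbations, and an exponential proximity kernel.

```python
import math
import random


def black_box(x1, x2):
    """Hypothetical nonlinear model standing in for any classifier's score."""
    return 1.0 / (1.0 + math.exp(-(x1 * x1 - 2.0 * x2)))


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def local_surrogate(x0, n_samples=2000, scale=0.3, kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear model to the black box around x0."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, scale) for _ in x0]      # random perturbation
        z = [xi + di for xi, di in zip(x0, offsets)]
        dist2 = sum(di * di for di in offsets)
        rows.append([1.0] + offsets)                       # intercept + offsets from x0
        targets.append(black_box(*z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # nearer samples count more
    k = len(rows[0])
    # Normal equations of weighted least squares: (X^T W X) w = X^T W y
    xtwx = [
        [sum(wt * r[i] * r[j] for r, wt in zip(rows, weights)) for j in range(k)]
        for i in range(k)
    ]
    xtwy = [
        sum(wt * r[i] * y for r, y, wt in zip(rows, targets, weights)) for i in range(k)
    ]
    return solve(xtwx, xtwy)


intercept, w1, w2 = local_surrogate([1.0, 0.5])
print(f"local weights: x1={w1:+.3f}, x2={w2:+.3f}")
```

Around this point the black box increases with `x1` and decreases with `x2`, and the surrogate's weights recover those directions. Changing `seed`, `scale`, or `kernel_width` shifts the weights slightly, which is the sampling sensitivity mentioned above.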
| Feature | SHAP (TreeSHAP) | LIME |
|---|---|---|
| Theoretical foundation | Shapley values (cooperative game theory) | Local surrogate model |
| Speed | Slow for KernelSHAP (O(2^F)), fast with TreeSHAP (O(TLD²)) | Fast, scales well |
| Consistency | ✅ Guaranteed by axioms | ❌ Can be inconsistent |
| Global interpretation | ✅ Yes (beeswarm, summary, bar) | ❌ Local only |
| Ideal for | Tree-based models in production | Quick exploration, non-tree models |
| GitHub Stars | 25,000+ | 12,000+ |
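The two complexity figures in the table can be put side by side with some quick arithmetic. This is only an order-of-magnitude sketch: it counts coalitions for exact enumeration (KernelSHAP approximates by sampling rather than enumerating), and it plugs the XGBoost settings used later in this tutorial (200 trees, depth 6) into the T·L·D² bound.

```python
def exact_shapley_coalitions(n_features: int) -> int:
    """Number of feature coalitions an exact, model-agnostic computation must cover."""
    return 2 ** n_features


def treeshap_operations(n_trees: int, max_leaves: int, max_depth: int) -> int:
    """Order-of-magnitude operation count for TreeSHAP: T * L * D^2."""
    return n_trees * max_leaves * max_depth ** 2


# 200 trees of depth 6, so at most 2**6 = 64 leaves per tree
tree_cost = treeshap_operations(n_trees=200, max_leaves=2 ** 6, max_depth=6)

for n_features in (10, 20, 40):
    print(
        f"F={n_features:>2}: coalitions={exact_shapley_coalitions(n_features):>16,}"
        f" | TreeSHAP bound={tree_cost:,}"
    )
```

The TreeSHAP bound does not depend on the number of features at all, while the coalition count doubles with every feature added.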
The ML community recommendation is straightforward:
TreeSHAP is the gold standard for tree-based models in production. It is exact, fast, and the only method offering consistent local and global explanations within the same framework. — Adapted from Python Data Bench, Feb. 2026
That said, LIME is still useful. You'll see both in the tutorial and understand the differences in practice.
Setup: Installing the libraries
Create a virtual environment and install the dependencies:
pip install shap lime xgboost pandas numpy matplotlib scikit-learn
The versions used here: SHAP v0.51.0 (released March 4, 2026, with exact TreeSHAP for XGBoost, LightGBM, and CatBoost), XGBoost 3.2, pandas, and scikit-learn 1.6+.
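Since results can differ between library versions, it may help to record what is actually installed before running the rest of the tutorial. A small sketch using only the standard library (the package list simply mirrors the `pip install` line above):

```python
from importlib import metadata

PACKAGES = ["shap", "lime", "xgboost", "pandas", "numpy", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name:<14}{version or 'NOT INSTALLED'}")
```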
Dataset: German Credit — the classic credit scoring dataset
We'll use the German Credit Dataset from UCI, which has 1000 customer instances with 20 features (7 numerical, 13 categorical) and a binary target. We encode the target as bad payer (1) or good payer (0), so that the predicted probability and positive SHAP values both read as default risk.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# German Credit dataset URL (UCI)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
columns = [
    "status", "duration", "credit_history", "purpose", "credit_amount",
    "savings", "employment", "installment_rate", "personal_status",
    "other_debtors", "residence_since", "property", "age",
    "other_installment", "housing", "existing_credits", "job",
    "dependents", "telephone", "foreign_worker", "risk",
]
df = pd.read_csv(url, sep=" ", header=None, names=columns)
# Target in the raw file: 1 = good, 2 = bad (we map good to 0 and bad to 1)
df["risk"] = df["risk"].map({1: 0, 2: 1})
# Encode categorical variables
# (selecting by "not numeric" works whether pandas reads the codes as object or str dtype)
categorical_cols = df.select_dtypes(exclude=["number"]).columns.tolist()
le_dict = {}
for col in categorical_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    le_dict[col] = le
# Separate features and target
X = df.drop("risk", axis=1)
y = df["risk"]
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"Target distribution: {y.value_counts().to_dict()}")
The dataset has 700 good payers and 300 bad payers — imbalanced, just like real life. Our model will learn to separate the two groups.
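With a 700/300 split, raw accuracy needs a reference point: a model that always answers "good" is already right 70% of the time. A quick sketch of that baseline, using only the class counts stated above:

```python
def majority_baseline(class_counts):
    """Accuracy of a model that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


counts = {"good": 700, "bad": 300}   # class counts of the German Credit dataset
baseline_accuracy = majority_baseline(counts)
imbalance_ratio = counts["good"] / counts["bad"]

print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
print(f"good:bad ratio: {imbalance_ratio:.2f}")
```

Any accuracy reported for the credit model should be read against this 0.70 floor, which is one reason to look at AUC as well.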
Training the model: XGBoost for credit
import xgboost as xgb
model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    eval_metric="logloss",
)
model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    verbose=False,
)
# Basic accuracy
from sklearn.metrics import accuracy_score, roc_auc_score
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"AUC-ROC: {roc_auc_score(y_test, y_proba):.3f}")
In our run, the model delivered accuracy of 0.82 and AUC of 0.89 — not bad for a relatively small dataset. But what matters now is: can we explain why client John had his credit denied?
Explaining with SHAP (TreeSHAP)
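Here comes the magic. With SHAP v0.51.0, `shap.TreeExplainer` applies TreeSHAP to the XGBoost model and computes exact Shapley values in polynomial time, unlike exact model-agnostic computation, which scales exponentially with the number of features. (The generic `shap.Explainer` would select the tree algorithm automatically.) For an XGBoost classifier, the values are expressed in log-odds by default, not in probability.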
import shap
# Create the explainer (automatic TreeSHAP for XGBoost)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Check shape
print(f"SHAP values shape: {shap_values.shape}")
print(f"Expected shape: {X_test.shape}")
Waterfall plot: explaining a specific prediction
Let's take one client from the test set, the one to whom the model assigns the highest probability of being a bad payer, and see how each feature pushes the score up or down.
# Index of the client with the highest probability of being a bad payer
idx_mau = y_proba.argmax()
shap.initjs()
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[idx_mau],
        base_values=explainer.expected_value,
        data=X_test.iloc[idx_mau].values,
        feature_names=X_test.columns.tolist(),
    )
)
The chart shows a baseline (the average expected value) and each feature pushing the prediction to the right (increases risk) or left (decreases risk). In our case, the client was rejected mainly because:
- Checking account status (`status`) very negative (most impactful feature)
- High credit amount (`credit_amount`); note that the dataset has no income feature to compare it against
- Credit history (`credit_history`) with previous delays
This is the answer you need to give the client — and what Article 86 of the EU AI Act requires.
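One practical detail when turning a waterfall into an answer for a client: `TreeExplainer` reports XGBoost classifier contributions in log-odds by default, so the bars add up to the model's raw margin, not to a probability. The sketch below shows the conversion with hypothetical numbers (they are not taken from the tutorial's run).

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for one client, for illustration only
base_value = -0.85   # stands in for explainer.expected_value (log-odds)
contributions = {
    "status": 0.90,
    "credit_amount": 0.40,
    "credit_history": 0.25,
    "age": -0.10,
}

# The waterfall starts at the base value and ends at the model's margin
margin = base_value + sum(contributions.values())
probability = sigmoid(margin)

print(f"baseline probability: {sigmoid(base_value):.3f}")
print(f"final margin (log-odds): {margin:.2f}")
print(f"class-1 probability: {probability:.3f}")
```

Because the sigmoid is nonlinear, individual SHAP values cannot be read as percentage points; only their sum maps cleanly onto the predicted probability.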
Beeswarm plot: global view of the model
If the waterfall is for one client, the beeswarm is for all clients at once. It shows the distribution of SHAP values per feature.
shap.plots.beeswarm(
    shap.Explanation(
        values=shap_values,
        base_values=explainer.expected_value,
        data=X_test.values,
        feature_names=X_test.columns.tolist(),
    )
)
In the beeswarm, you can immediately see which features impact the model globally:
- status is by far the most important
- credit_amount and duration come next
- dependents and telephone have almost no influence
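This is gold for model auditing (Article 13 of the EU AI Act): you can show which features actually drive the model and check whether protected attributes, or proxies for them, carry weight. This dataset includes `age`, `personal_status` (which encodes sex and marital status), and `foreign_worker`, so those deserve a close look.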
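The global ranking behind a beeswarm or bar plot is typically the mean absolute SHAP value per feature. Here is a minimal sketch of that aggregation on a tiny, hypothetical SHAP matrix (four clients, three features; the numbers are illustrative only).

```python
def mean_abs_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value across all rows."""
    n_rows = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per client, one column per feature
features = ["status", "credit_amount", "telephone"]
shap_matrix = [
    [0.60, -0.20, 0.01],
    [-0.50, 0.30, -0.02],
    [0.40, 0.10, 0.00],
    [-0.70, -0.20, 0.01],
]

ranking = mean_abs_importance(shap_matrix, features)
for name, score in ranking:
    print(f"{name:<15}{score:.3f}")
```

With the real array from this tutorial, the same quantity is `np.abs(shap_values).mean(axis=0)`. Taking absolute values matters: a feature that pushes some clients up and others down would otherwise average out to zero and look unimportant.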
Dependence plot: how a specific feature impacts
Want to see how the client's age affects the prediction? The dependence plot shows the relationship between the feature value and the corresponding SHAP value.
shap.dependence_plot("age", shap_values, X_test, interaction_index="credit_amount")
This graph reveals something interesting: in our run, age contributes negative SHAP values for younger clients (20-30 years old), pulling their model output down, and the size of the effect varies with the credit amount. Whether that pattern is justified depends on context, which is exactly why this plot is an essential tool for detecting age bias in the model.
Explaining with LIME
Now let's see the explanation for the same client using LIME: model-agnostic and quick to set up, with less mathematical rigor, but still useful.
import lime
import lime.lime_tabular
# Create the LIME explainer
explainer_lime = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["good", "bad"],
    mode="classification",
    random_state=42,
)
# Explain the same rejected client prediction
exp = explainer_lime.explain_instance(
    X_test.iloc[idx_mau].values,
    model.predict_proba,
    num_features=8,
)
# Show the explanation
exp.show_in_notebook(show_table=True)
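LIME returns a list of features with positive weights (pushing toward the explained class, which is label 1 by default) and negative weights (pushing away from it). The most important features are consistent with SHAP, but the numerical values are different: LIME fits a local linear surrogate on `predict_proba` outputs, so its weights are in probability units, while TreeSHAP values for an XGBoost classifier are in log-odds by default.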
| Feature | LIME Weight | SHAP value |
|---|---|---|
| status | +0.31 | +0.45 |
| credit_amount | +0.12 | +0.18 |
| credit_history | +0.08 | +0.11 |
The directions are the same, but the magnitudes vary. In production, trust SHAP. Use LIME for quick exploration or when dealing with non-tree-based models.
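"Same direction, different magnitude" can be checked numerically instead of by eye. The sketch below computes two simple agreement measures, sign agreement and Spearman rank correlation, on the three rows of the comparison table above. The choice of these two metrics is an assumption made for illustration, not a standard prescribed by either library.

```python
def ranks(values):
    """Rank positions by absolute value (1 = largest); ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(a, b):
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Values copied from the comparison table above
lime_weights = {"status": 0.31, "credit_amount": 0.12, "credit_history": 0.08}
shap_contribs = {"status": 0.45, "credit_amount": 0.18, "credit_history": 0.11}

names = list(lime_weights)
lime_vals = [lime_weights[name] for name in names]
shap_vals = [shap_contribs[name] for name in names]

# Share of features on which both methods agree about the direction
sign_agreement = sum(
    (lw > 0) == (sv > 0) for lw, sv in zip(lime_vals, shap_vals)
) / len(names)
rank_correlation = spearman(lime_vals, shap_vals)

print(f"sign agreement: {sign_agreement:.2f}, rank correlation: {rank_correlation:.2f}")
```

For these three features both measures come out at 1.0. On a full explanation with more features, a low rank correlation is a signal to investigate before presenting either explanation to a client.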
SHAP in production: mini explanation API with FastAPI
Having explanations in your notebook is useless if your production system can't deliver them. The ideal is to save SHAP values alongside each prediction and expose them via an API.
Here's a sketch of how to do it:
# fastapi_explainer.py
import joblib
import shap
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel
app = FastAPI()
model = joblib.load("xgboost_credit.pkl")
explainer = shap.TreeExplainer(model)
class ClientData(BaseModel):
    features: dict
@app.post("/predict")
def predict(client: ClientData):
    # Features must arrive already encoded, exactly as at training time
    df = pd.DataFrame([client.features])
    # Restore the training column order (assumes the model was fit on a DataFrame)
    df = df[model.get_booster().feature_names]
    pred = model.predict(df)[0]
    proba = model.predict_proba(df)[0][1]
    shap_values = explainer.shap_values(df)
    features_impact = [
        {"feature": col, "value": float(client.features[col]), "impact": float(shap_values[0][i])}
        for i, col in enumerate(df.columns)
    ]
    features_impact.sort(key=lambda x: abs(x["impact"]), reverse=True)
    return {
        "prediction": int(pred),
        "probability": float(proba),
        "top_features": features_impact[:5],
        "base_value": float(explainer.expected_value),
    }
With this endpoint, any system can:
- Query the prediction
- Receive the top 5 features that most impacted it
- Return the explanation in JSON format for the frontend, report, or audit
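For the audit use case, the response has to be stored, not just returned. A minimal sketch of one way to do that, appending each explanation to a JSON Lines file: the `request_id` and `model_version` fields, the file name, and the sample response are all hypothetical, and the log is written to the temp directory only so the sketch runs anywhere.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def build_audit_record(response, model_version, request_id):
    """Bundle an API response with the metadata needed to trace it later."""
    return {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        **response,
    }


def append_jsonl(path, record):
    """Append one JSON record per line so the log can be streamed and replayed."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


# Hypothetical response in the shape returned by the /predict endpoint above
response = {
    "prediction": 1,
    "probability": 0.81,
    "top_features": [{"feature": "status", "value": 0.0, "impact": 0.45}],
    "base_value": -0.85,
}

log_path = os.path.join(tempfile.gettempdir(), "explanations.jsonl")
record = build_audit_record(response, model_version="xgboost_credit-v1", request_id="req-0001")
append_jsonl(log_path, record)
print(f"logged {record['request_id']} to {log_path}")
```

Storing the model version next to the SHAP values matters because an explanation is only valid for the exact model that produced it.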
"Organizations that bolt explainability on after model training rather than building it into the AI lifecycle produce shallow explanations that regulators can easily challenge." — Seekr, 2026 Report
In other words: put SHAP in the pipeline from the start — not after the auditor knocks on the door.
Interpretability in the age of regulation
Most of the EU AI Act's obligations start to apply in 60 days (August 2, 2026). The AI Omnibus of May 7, 2026 postponed some obligations to December 2027, but the transparency rules of Article 50 apply from August. Those operating high-risk AI systems in Europe need to comply.
A study published in January 2026 in the journal Risks already proposes an interpretability framework based on the consistency between SHAP and LIME for bond default prediction. The direction is clear: explainability is no longer a competitive differentiator — it is an operational requirement.
"The evidence is clear: regulators are going to demand interpretability as a standard part of AI governance, and tools like SHAP that provide mathematical guarantees will become table stakes." — Let's Data Science, March 2026
If you want to delve deeper into ML fundamentals before applying interpretability, check out our complete guide: Machine Learning Explained: Complete Guide for Beginners.
Conclusion — and your next steps
Let's recap what you learned:
- SHAP (TreeSHAP) explains individual and global predictions with mathematical guarantees — use it in production for tree-based models.
- LIME is fast and useful for exploration, especially for non-tree-based models.
- The EU AI Act (most obligations apply from August 2, 2026, 60 days from now) has turned explainability from "nice to have" into a legal obligation.
- You can integrate SHAP into your production pipeline with FastAPI in just a few lines.
Your next steps:
- Run the code from this tutorial in your environment. The German Credit dataset is public and free.
- Apply it to your model — whether it's XGBoost, LightGBM, or CatBoost, TreeSHAP works.
- Save the SHAP values alongside each prediction in production.
- Document the explanations to be ready when the auditor (or the client) asks.
Because in 2026, "the model said so" is not an answer — for anyone.
References and further reading
- Lundberg et al., Nature Machine Intelligence