Black-Box Is Not an Option: SHAP and LIME Tutorial in Python (2026)

NeuralPulse | May 29, 2026 | 12 min read

Your XGBoost model claims 94% accuracy on tests. The AUC is 0.97. You even feel a bit proud. Then a client shows up — card denied, low score — and asks: "Why was I rejected?" And you can't answer.

This scenario is no longer just a trust issue with the client. It has become a legal problem.

On August 2, 2026, about two months from now, the EU AI Act's concrete transparency requirements begin to apply. Article 86 gives people affected by decisions based on high-risk AI systems the right to an individual explanation. Article 13 requires that high-risk systems be designed so that deployers can interpret their outputs. And the fines? At the top tier, up to €35 million or 7% of global annual turnover (Article 99).

"Model interpretability has shifted from 'nice to have' to mandatory. The EU AI Act's Article 13 requires high-risk AI systems to provide sufficient transparency for deployers to interpret outputs." — LDS Team, March 2026

This is where SHAP and LIME come in — two libraries that transform your model's black box into explanations that a human (and an auditor) can understand.

In this tutorial, you will learn, with working Python code, how to apply SHAP and LIME to a credit scoring model using XGBoost, and how to prepare for the new regulatory landscape without redoing your entire ML pipeline.

What are SHAP and LIME? (And why you need both)

Before diving into the code, it's worth understanding what each one does — and, most importantly, when to use which.

SHAP (SHapley Additive exPlanations) comes from game theory: each feature is a "player" and its Shapley value is the average contribution to the prediction, considering all possible feature combinations. It is mathematically consistent — it respects axioms like additivity and consistency that LIME does not guarantee.
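To make the definition concrete, here is a minimal brute-force sketch of exact Shapley values for a toy scoring function. The function toy_score, the feature values, and the all-zero baseline are hypothetical illustrations, not part of the SHAP library, and "missing" features are simulated by substituting baseline values, which is one common convention. The loop enumerates every coalition, which is exactly the exponential cost that TreeSHAP avoids.

```python
from itertools import combinations
from math import factorial


def toy_score(x):
    """Hypothetical risk score with an interaction term (not a real model)."""
    return 0.5 * x["status"] + 0.2 * x["duration"] + 0.1 * x["status"] * x["duration"]


def shapley_values(f, instance, baseline):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance value, the rest the baseline.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return f(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


instance = {"status": 2.0, "duration": 3.0}
baseline = {"status": 0.0, "duration": 0.0}
phi = shapley_values(toy_score, instance, baseline)
print(phi)

# Additivity: the contributions sum to f(instance) - f(baseline)
print(sum(phi.values()), toy_score(instance) - toy_score(baseline))
```

The interaction term is split evenly between the two features, and the two printed totals match. That additivity property is what lets a waterfall plot add up to the model output.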

LIME (Local Interpretable Model-agnostic Explanations) creates a simple surrogate model (e.g., linear) locally around a specific prediction. It is faster to compute, but lacks the same mathematical guarantees.
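The local-surrogate idea can be sketched in a few lines of NumPy. This is an illustration of the principle only, not the lime library's implementation (which, among other things, samples from training-data statistics and can discretize features). The black_box function, the noise scale, and the kernel width are all assumptions chosen for the example.

```python
import numpy as np


def black_box(X):
    """Hypothetical black-box probability model: sigmoid(2*x0 - x1)."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))


def local_surrogate(predict_fn, x, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x (the core idea behind LIME)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict_fn(Z)
    # 2. Weight each sample by its closeness to x (exponential kernel).
    dist2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-dist2 / kernel_width**2)
    # 3. Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[0], coef[1:]  # local intercept, local feature weights


intercept, weights = local_surrogate(black_box, np.array([0.0, 0.0]))
print(intercept, weights)
```

Around the origin the surrogate's weights land close to the gradient of the sigmoid (roughly 0.5 and -0.25). That is all a LIME-style explanation claims: a linear approximation that holds near one instance.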

Criterion	SHAP (TreeSHAP)	LIME
Theoretical foundation	Shapley values (cooperative game theory)	Local surrogate model
Speed	Slow for KernelSHAP (O(2^F)), fast with TreeSHAP (O(TLD²))	Fast, scales well
Consistency	✅ Guaranteed by axioms	❌ Can be inconsistent
Global interpretation	✅ Yes (beeswarm, summary, bar)	❌ Local only
Ideal for	Tree-based models in production	Quick exploration, non-tree models
GitHub Stars	25,000+	12,000+
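A back-of-the-envelope sketch of the Speed row, assuming the 200-tree, depth-6 XGBoost model trained later in this tutorial. The numbers are order-of-magnitude illustrations of the big-O terms (F features, T trees, at most L leaves per tree, depth D), not measured runtimes.

```python
T, D = 200, 6   # trees and maximum depth of the tutorial's model
L = 2 ** D      # upper bound on leaves in a depth-6 binary tree

# The O(TLD^2) term for one explained prediction; it does not depend on F.
treeshap_term = T * L * D ** 2
print(f"TreeSHAP term: {treeshap_term:,}")

# Number of feature coalitions an exact, model-agnostic enumeration would visit.
coalitions = {F: 2 ** F for F in (10, 20, 40)}
for F, count in coalitions.items():
    print(f"F = {F:>2}: {count:,} coalitions")
```

With the 20 features of the German Credit dataset, exact enumeration already means about a million coalitions per prediction. Each additional feature doubles that count, while the TreeSHAP term stays fixed.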

The ML community recommendation is straightforward:

TreeSHAP is the gold standard for tree-based models in production. It is exact, fast, and the only method offering consistent local and global explanations within the same framework. — Adapted from Python Data Bench, Feb. 2026

That said, LIME is still useful. You'll see both in the tutorial and understand the differences in practice.

Setup: Installing the libraries

Create a virtual environment and install the dependencies:

pip install shap lime xgboost pandas numpy matplotlib scikit-learn

The versions used here: SHAP v0.51.0 (released March 4, 2026, with exact TreeSHAP for XGBoost, LightGBM, and CatBoost), XGBoost 3.2, pandas, and scikit-learn 1.6+.
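To compare your environment against those versions, a small standard-library sketch like the following can help. The package list simply mirrors the pip command above, and the helper name installed_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["shap", "lime", "xgboost", "pandas", "numpy", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name:<14}{ver}")
```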

Dataset: German Credit — the classic credit scoring dataset

We'll use the German Credit Dataset from UCI, which has 1000 customer instances with 20 features (7 numerical, 13 categorical) and the target variable: bad payer (1) or good payer (0). Treating "bad" as the positive class means every probability and SHAP value in the rest of the tutorial reads as risk of default.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# URL of the German Credit dataset (UCI)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
columns = [
    "status", "duration", "credit_history", "purpose", "credit_amount",
    "savings", "employment", "installment_rate", "personal_status",
    "other_debtors", "residence_since", "property", "age",
    "other_installment", "housing", "existing_credits", "job",
    "dependents", "telephone", "foreign_worker", "risk",
]

df = pd.read_csv(url, sep=" ", header=None, names=columns)

# Target in the raw file: 1 = good, 2 = bad.
# Map bad to 1 (the positive class) and good to 0.
df["risk"] = df["risk"].map({1: 0, 2: 1})

# Encode categorical variables (every non-numeric column)
categorical_cols = df.select_dtypes(exclude=["number"]).columns.tolist()
le_dict = {}
for col in categorical_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    le_dict[col] = le  # keep the encoder so values can be decoded later

# Separate features and target
X = df.drop("risk", axis=1)
y = df["risk"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
print(f"Target distribution: {y.value_counts().to_dict()}")

The dataset has 700 good payers and 300 bad payers — imbalanced, just like real life. Our model will learn to separate the two groups.
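The 700/300 split sets the bar any model has to clear. The sketch below works out the majority-class baseline and the ratio commonly used as a starting point for XGBoost's scale_pos_weight parameter when the minority class is the positive one. The tutorial's model does not set that parameter, so treat it as an optional knob rather than part of the pipeline.

```python
counts = {"good": 700, "bad": 300}  # class counts reported above
total = sum(counts.values())

# Accuracy of a "model" that always predicts the majority class
majority_baseline = max(counts.values()) / total

# Common heuristic: negatives / positives, with the minority class as positive
minority_positive_weight = counts["good"] / counts["bad"]

print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
print(f"Candidate scale_pos_weight:       {minority_positive_weight:.2f}")
```

An accuracy of 0.70 is therefore free. That is one reason to report AUC alongside accuracy on this dataset.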

Training the model: XGBoost for credit

import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    eval_metric="logloss",
)

model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

# Basic accuracy
from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"AUC-ROC: {roc_auc_score(y_test, y_proba):.3f}")

In our run, the model delivered accuracy of 0.82 and AUC of 0.89 — not bad for a relatively small dataset. But what matters now is: can we explain why client John had his credit denied?
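If the AUC number feels abstract, it has a simple reading: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The pure-Python sketch below computes it that way on four made-up scores. It is meant to show what roc_auc_score measures, not to replace it.

```python
def pairwise_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
auc_toy = pairwise_auc(y_toy, s_toy)
print(auc_toy)  # 0.75: three of the four positive/negative pairs are ordered correctly
```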

Explaining with SHAP (TreeSHAP)

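Here comes the magic. For tree ensembles such as XGBoost, SHAP applies TreeSHAP, which computes exact Shapley values in polynomial time, unlike KernelSHAP, whose exact computation scales exponentially with the number of features. The generic shap.Explainer selects TreeSHAP automatically for XGBoost models; here we request it explicitly with shap.TreeExplainer.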

import shap

# Create the explainer (TreeSHAP for XGBoost)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Check shape
print(f"SHAP values shape: {shap_values.shape}")
print(f"Expected shape: {X_test.shape}")

Waterfall plot: explaining a specific prediction

Let's take the client in the test set to whom the model assigns the highest probability of being a bad payer and see each feature pushing the score up or down.

# Index of the client with the highest probability of being a bad payer
idx_mau = y_proba.argmax()

shap.initjs()
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[idx_mau],
        base_values=explainer.expected_value,
        data=X_test.iloc[idx_mau].values,
        feature_names=X_test.columns.tolist(),
    )
)

The chart shows a baseline (the average expected value) and each feature pushing the prediction to the right (increases risk) or left (decreases risk). In our case, the client was rejected mainly because:

Current account status very negative (most impactful feature)
High credit amount
Credit history with previous delays

This is the answer you need to give the client — and what Article 86 of the EU AI Act requires.
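The waterfall is additive: the baseline plus every feature contribution equals the model's raw output, which for an XGBoost classifier explained with TreeExplainer's defaults is expressed in log-odds rather than probability. Here is a minimal sketch of that arithmetic. The three named contributions reuse the SHAP values from this tutorial's LIME comparison table, while the base value and the "all other features" total are hypothetical numbers chosen for illustration.

```python
from math import exp


def sigmoid(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + exp(-z))


base_value = -0.85  # hypothetical expected log-odds over the data
contributions = {
    "status": 0.45,
    "credit_amount": 0.18,
    "credit_history": 0.11,
    "all other features": 0.20,  # hypothetical remainder
}

running = base_value
print(f"{'baseline':<20}{running:+.2f}")
for name, value in contributions.items():
    running += value
    print(f"{name:<20}{value:+.2f} -> {running:+.2f}")

final_log_odds = running
final_probability = sigmoid(final_log_odds)
print(f"Predicted probability: {final_probability:.3f}")
```

When you quote an explanation to a client or an auditor, say which scale the numbers are on. A contribution of +0.45 in log-odds is not a 45-point change in probability.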

Beeswarm plot: global view of the model

If the waterfall is for one client, the beeswarm is for all clients at once. It shows the distribution of SHAP values per feature.

# shap.plots.beeswarm is the plotting function that accepts an Explanation object
shap.plots.beeswarm(
    shap.Explanation(
        values=shap_values,
        base_values=explainer.expected_value,
        data=X_test.values,
        feature_names=X_test.columns.tolist(),
    )
)

In the beeswarm, you can immediately see which features impact the model globally:

status is by far the most important
credit_amount and duration come next
dependents and telephone have almost no influence

This is gold for model auditing (Article 13 of the EU AI Act) — you can demonstrate that the model is paying attention to the right features and not to discriminatory proxies.
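For an audit report you usually want that ranking as numbers, not just as a picture. The usual global summary is the mean absolute SHAP value per feature, which is also what SHAP's bar plot displays. The sketch below computes it on a tiny hypothetical matrix; in the tutorial you would pass shap_values and X_test.columns instead.

```python
def mean_abs_importance(shap_matrix, feature_names):
    """Rank features by mean |SHAP| across all explained rows."""
    n_rows = len(shap_matrix)
    totals = [0.0] * len(feature_names)
    for row in shap_matrix:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [t / n_rows for t in totals]
    return sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP values: 3 clients x 3 features
toy_features = ["status", "credit_amount", "telephone"]
toy_shap = [
    [0.45, 0.18, 0.01],
    [-0.60, 0.05, -0.02],
    [0.30, -0.22, 0.00],
]

ranking = mean_abs_importance(toy_shap, toy_features)
for name, score in ranking:
    print(f"{name:<15}{score:.3f}")
```

Taking the absolute value matters: a feature that pushes some clients strongly up and others strongly down would average out to roughly zero otherwise.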

Dependence plot: how a specific feature impacts

Want to see how the client's age affects the prediction? The dependence plot shows the relationship between the feature value and the corresponding SHAP value.

shap.dependence_plot("age", shap_values, X_test, interaction_index="credit_amount")

This graph reveals something interesting: for younger clients (20-30 years old) the SHAP values for age are negative, meaning age pushes their prediction toward class 0, and the size of the effect varies with the credit amount. Whether that pattern is legitimate depends on context, which makes this plot an essential tool for detecting age bias in the model.
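To turn the visual impression into something you can put in a bias report, one option is to average the age SHAP values per age band. The sketch below does that with the standard library; the ages and SHAP values are hypothetical, and the 10-year band width is an arbitrary choice.

```python
from collections import defaultdict
from statistics import mean


def mean_shap_by_band(ages, shap_for_age, band_width=10):
    """Average the SHAP values of the age feature within fixed-width age bands."""
    bands = defaultdict(list)
    for age, value in zip(ages, shap_for_age):
        lower = (age // band_width) * band_width
        bands[lower].append(value)
    return {
        f"{lower}-{lower + band_width - 1}": mean(values)
        for lower, values in sorted(bands.items())
    }


# Hypothetical data: client ages and the SHAP value of "age" for each client
toy_ages = [22, 27, 34, 38, 45, 61]
toy_shap_age = [-0.10, -0.06, 0.02, 0.04, 0.01, 0.03]

by_band = mean_shap_by_band(toy_ages, toy_shap_age)
for band, value in by_band.items():
    print(f"{band}: {value:+.3f}")
```

With real data, the equivalent inputs would be X_test["age"] and the age column of shap_values.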

Explaining with LIME

Now let's see the explanation for the same client using LIME — faster, less mathematical rigor, but still useful.

import lime
import lime.lime_tabular

# Create the LIME explainer
# class_names follow label order: index 0 first, then index 1
explainer_lime = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["good", "bad"],
    mode="classification",
    random_state=42,
)

# Explain the same rejected client prediction
exp = explainer_lime.explain_instance(
    X_test.iloc[idx_mau].values,
    model.predict_proba,
    num_features=8,
)

# Show the explanation
exp.show_in_notebook(show_table=True)

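LIME returns a list of features with positive weights (contributing to "bad") and negative weights (contributing to "good"). The most important features are consistent with SHAP, but the numerical values differ: LIME fits a local linear model to predicted probabilities, while TreeSHAP by default attributes the model's log-odds output, so the two sets of numbers are not on the same scale, and LIME lacks SHAP's axiomatic guarantees.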

Feature	LIME Weight	SHAP value
status	+0.31	+0.45
credit_amount	+0.12	+0.18
credit_history	+0.08	+0.11

The directions are the same, but the magnitudes vary. In production, trust SHAP. Use LIME for quick exploration or when dealing with non-tree-based models.
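Since the magnitudes are not comparable, the meaningful checks are whether the two methods agree on direction and on ranking. A minimal sketch using the three rows of the table above (the helper names are made up for this example):

```python
lime_weights = {"status": 0.31, "credit_amount": 0.12, "credit_history": 0.08}
shap_vals = {"status": 0.45, "credit_amount": 0.18, "credit_history": 0.11}


def same_direction(a, b):
    """True when both attributions push the prediction the same way."""
    return (a > 0) == (b > 0)


def ranked(attributions):
    """Feature names ordered by absolute attribution, largest first."""
    return sorted(attributions, key=lambda k: abs(attributions[k]), reverse=True)


sign_agreement = all(same_direction(lime_weights[f], shap_vals[f]) for f in shap_vals)
rank_agreement = ranked(lime_weights) == ranked(shap_vals)

print(f"Same direction for every feature: {sign_agreement}")
print(f"Same importance ranking:          {rank_agreement}")
```

When the two methods disagree on sign or order for a given client, that prediction deserves a closer look before you hand the explanation to anyone.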

SHAP in production: mini explanation API with FastAPI

Having explanations in your notebook is useless if your production system can't deliver them. The ideal is to save SHAP values alongside each prediction and expose them via an API.

Here's a sketch of how to do it:

# fastapi_explainer.py
import joblib
import shap
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("xgboost_credit.pkl")
explainer = shap.TreeExplainer(model)


class ClientData(BaseModel):
    features: dict


@app.post("/predict")
def predict(client: ClientData):
    # One-row frame; keys must match the training column names and order
    df = pd.DataFrame([client.features])
    pred = model.predict(df)[0]
    proba = model.predict_proba(df)[0][1]
    shap_values = explainer.shap_values(df)

    # Pair every feature with its SHAP contribution for this client
    features_impact = [
        {"feature": col, "value": float(client.features[col]), "impact": float(shap_values[0][i])}
        for i, col in enumerate(client.features.keys())
    ]
    # Largest absolute impact first
    features_impact.sort(key=lambda x: abs(x["impact"]), reverse=True)

    return {
        "prediction": int(pred),
        "probability": float(proba),
        "top_features": features_impact[:5],
        "base_value": float(explainer.expected_value),
    }

With this endpoint, any system can:

Query the prediction
Receive the top 5 features that most impacted it
Return the explanation in JSON format for the frontend, report, or audit
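On the consuming side, the JSON still has to become something a person can read. The sketch below turns a response shaped like the endpoint's return value into a short text explanation. The sample payload is hypothetical, and the labels assume that class 1 means the application is rejected; swap them if your encoding differs.

```python
def format_explanation(response, positive_label="rejected", negative_label="approved"):
    """Render the /predict JSON payload as a short, human-readable explanation."""
    decision = positive_label if response["prediction"] == 1 else negative_label
    lines = [
        f"Decision: {decision} (score {response['probability']:.0%})",
        "Main factors:",
    ]
    for item in response["top_features"]:
        direction = "raised" if item["impact"] > 0 else "lowered"
        lines.append(
            f"- {item['feature']} = {item['value']:g} {direction} the score ({item['impact']:+.2f})"
        )
    return "\n".join(lines)


# Hypothetical response with the same keys the endpoint returns
sample_response = {
    "prediction": 1,
    "probability": 0.91,
    "top_features": [
        {"feature": "status", "value": 0.0, "impact": 0.45},
        {"feature": "credit_amount", "value": 7882.0, "impact": 0.18},
        {"feature": "savings", "value": 4.0, "impact": -0.07},
    ],
    "base_value": -0.85,
}

text = format_explanation(sample_response)
print(text)
```

In a real product you would also map encoded values such as status = 0 back to their original categories before showing them to a client.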

"Organizations that bolt explainability on after model training rather than building it into the AI lifecycle produce shallow explanations that regulators can easily challenge." — Seekr, 2026 Report

In other words: put SHAP in the pipeline from the start — not after the auditor knocks on the door.

Interpretability in the age of regulation

The EU AI Act's obligations begin to apply in about two months (August 2, 2026). The AI Omnibus of May 7, 2026 postponed some obligations to December 2027, but the transparency rules of Article 50 apply from August. Those operating high-risk AI systems in Europe need to comply.

A study published in January 2026 in the journal Risks already proposes an interpretability framework based on the consistency between SHAP and LIME for bond default prediction. The direction is clear: explainability is no longer a competitive differentiator — it is an operational requirement.

"The evidence is clear: regulators are going to demand interpretability as a standard part of AI governance, and tools like SHAP that provide mathematical guarantees will become table stakes." — Let's Data Science, March 2026

If you want to delve deeper into ML fundamentals before applying interpretability, check out our complete guide: Machine Learning Explained: Complete Guide for Beginners.

Conclusion — and your next steps

Let's recap what you learned:

SHAP (TreeSHAP) explains individual and global predictions with mathematical guarantees — use it in production for tree-based models.
LIME is fast and useful for exploration, especially for non-tree-based models.
The EU AI Act (applicable from August 2, 2026) has turned explainability from "nice to have" into a legal obligation.
You can integrate SHAP into your production pipeline with FastAPI in just a few lines.

Your next steps:

Run the code from this tutorial in your environment. The German Credit dataset is public and free.
Apply it to your model — whether it's XGBoost, LightGBM, or CatBoost, TreeSHAP works.
Save the SHAP values alongside each prediction in production.
Document the explanations to be ready when the auditor (or the client) asks.
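For the "save" and "document" steps, an append-only log with one JSON record per prediction is often enough to start with. The sketch below uses only the standard library. The field names, the client ID, and the model version string are hypothetical, and the in-memory buffer stands in for a file opened in append mode.

```python
import io
import json
from datetime import datetime, timezone


def audit_record(client_id, prediction, probability, base_value, shap_by_feature, model_version):
    """Bundle one prediction and its explanation into a JSON-serializable dict."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "client_id": client_id,
        "model_version": model_version,
        "prediction": int(prediction),
        "probability": float(probability),
        "base_value": float(base_value),
        "shap_values": {name: float(v) for name, v in shap_by_feature.items()},
    }


def append_record(stream, record):
    """Write one record per line (JSON Lines) so the log can be appended to and replayed."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")


log = io.StringIO()  # stand-in for open("explanations.jsonl", "a")
append_record(
    log,
    audit_record(
        client_id="c-001",
        prediction=1,
        probability=0.91,
        base_value=-0.85,
        shap_by_feature={"status": 0.45, "credit_amount": 0.18},
        model_version="xgb-credit-0.1",
    ),
)

restored = json.loads(log.getvalue().splitlines()[0])
print(restored["client_id"], restored["shap_values"])
```

Storing the model version next to each explanation matters, because SHAP values are only meaningful for the exact model that produced them.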

Because in 2026, "the model said so" is not an answer — for anyone.

