MLOps in Practice: ML Pipeline with CI/CD, Drift and Versioning in 2026
Have you ever put a model into production, only to watch its accuracy plummet a week later without warning? That's the nightmare for 62% of companies scaling machine learning that haven't yet automated retraining (MLOps Survey 2026). Meanwhile, 78% of organizations using MLOps report more predictable deliveries and fewer incidents.
The problem isn't a lack of tools — it's the absence of an integrated pipeline. In this tutorial, you'll build a complete MLOps flow from scratch using only open-source tools: MLflow for experiment tracking, DVC for dataset versioning, GitHub Actions for CI/CD, and Evidently AI for drift monitoring. The focus is practical: every step includes executable code and YAML configuration.
The real bottleneck in ML isn't building models; it's keeping them alive in production. A well-designed MLOps pipeline transforms deployment from a stressful event into a predictable and reversible process.
1. Data Versioning with DVC: The Pillar Everyone Ignores
Without data versioning, your ML pipeline is a black box. You train the model today and it works. Three months later, someone updates the dataset without telling you, and the model breaks. DVC (Data Version Control) solves this by treating datasets as versioned objects linked to Git commits.
Step 1: Initialize DVC in your repository
pip install dvc
git init
dvc init
This creates a .dvc/ directory containing DVC's configuration file and stages the new files in Git. Nothing is committed until you run git commit.
Step 2: Add a dataset to version control
dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore
git commit -m "Add training dataset v1.0"
DVC doesn't store the large file in Git. Instead, it writes a lightweight .dvc file that records the file's hash and keeps the actual data in a local cache (.dvc/cache). dvc push then copies that cache to remote storage (S3, GCS, an SFTP server).
Step 3: Configure a remote (example: S3 bucket, which requires the S3 extra: pip install "dvc[s3]")
dvc remote add -d myremote s3://my-ml-bucket/dvc-cache
dvc push
Now anyone on the team can download the exact same dataset version with dvc pull, which is essential for reproducibility. If the dataset changes, DVC detects the difference through its hash, and you can compare versions with dvc diff.
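To make the hash idea concrete, here is a minimal sketch of content-hash change detection in plain Python. It is not DVC's implementation: it assumes MD5 over the raw file bytes (DVC's exact hashing rules differ between releases), and the recorded value stands in for the md5 field stored in train.csv.dvc. The helper names file_md5 and has_changed are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical helpers illustrating hash-based change detection (not DVC's own code)


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks so large datasets fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: Path, recorded_md5: str) -> bool:
    """True when the file on disk no longer matches the hash recorded at commit time."""
    return file_md5(path) != recorded_md5


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_bytes(b"id,feature,target\n1,0.5,0\n")
    recorded = file_md5(data)  # plays the role of the md5 field in train.csv.dvc
    print(has_changed(data, recorded))  # False
    data.write_bytes(b"id,feature,target\n1,0.7,0\n")
    print(has_changed(data, recorded))  # True
```

Any single-byte edit changes the digest, which is why a hash comparison is enough to tell that "someone updated the dataset" without diffing gigabytes of data.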
Table 1: Quick comparison of versioning tools
| Tool | Main Focus | Storage | Git Integration |
|---|---|---|---|
| DVC | Data and models | Local + remote cache | Full (via .dvc) |
| Git LFS | Large files | LFS server | Partial (replaces files) |
| MLflow | Experiments and models | Tracking server | Partial (via URI) |
| Hugging Face Hub | Models and datasets | Remote Hub | Independent |
Practical tip: Always run dvc checkout after a git checkout so that local data matches the commit. Automate this with a Git hook: dvc install sets up hooks for you, including a post-checkout hook that runs dvc checkout.
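If you prefer a hand-written hook to the ones dvc install generates, a minimal sketch of .git/hooks/post-checkout in Python could look like this. Git calls post-checkout with three arguments (previous HEAD, new HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout), so the sketch only syncs data on branch checkouts. The helper names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/post-checkout hook that keeps DVC data in sync with the commit."""
import subprocess
import sys


def should_sync(args: list[str]) -> bool:
    """Git passes <prev-HEAD> <new-HEAD> <flag>; flag "1" marks a branch checkout."""
    return len(args) == 3 and args[2] == "1"


def main(args: list[str]) -> int:
    if not should_sync(args):
        return 0  # file checkouts (flag "0") leave the data alone
    # dvc checkout restores workspace data to match the .dvc files of the new commit
    return subprocess.run(["dvc", "checkout"], check=False).returncode


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status != 0:
        sys.exit(status)  # the hook's exit status becomes git checkout's exit status
```

Make the file executable with chmod +x .git/hooks/post-checkout. Files under .git/hooks are not versioned, so each clone needs the hook installed separately.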
2. CI/CD for ML: GitHub Actions + MLflow + Model Tests
CI/CD for ML isn't the same as for regular software. Besides testing code, you need to validate data, check model metrics, and detect data leakage. MLflow acts as the experiment tracker, storing the parameters, metrics, and artifacts of every run.
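One inexpensive leakage check is to confirm that no identical rows appear in both the training and test splits, because duplicated examples inflate test metrics. The sketch below covers only that single check, not a full leakage audit; the row format (tuples of values) and the function names are assumptions.

```python
from collections.abc import Iterable, Sequence

# Hypothetical helpers: exact-duplicate detection across train/test splits


def overlapping_rows(train: Iterable[Sequence], test: Iterable[Sequence]) -> set[tuple]:
    """Return rows that appear in both splits (exact duplicates)."""
    train_set = {tuple(row) for row in train}
    return {tuple(row) for row in test if tuple(row) in train_set}


def assert_no_leakage(train, test) -> None:
    """Fail loudly, as a CI step should, if any test row was also seen in training."""
    shared = overlapping_rows(train, test)
    if shared:
        raise ValueError(f"{len(shared)} test row(s) also appear in the training split")


train_rows = [(0.1, "a", 0), (0.2, "b", 1), (0.3, "c", 0)]
test_rows = [(0.4, "d", 1), (0.2, "b", 1)]
print(overlapping_rows(train_rows, test_rows))  # {(0.2, 'b', 1)}
```

Exact matching will not catch near-duplicates or features that encode the label (for example, a timestamp recorded after the outcome), so treat it as a first line of defense.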
Repository structure:
my-ml-project/
├── .github/workflows/ci-ml.yml
├── data/
├── notebooks/
├── src/
│   ├── train.py
│   ├── evaluate.py
│   ├── data_validation.py
│   └── monitor_drift.py
├── models/
├── .dvc/
├── requirements.txt
└── README.md
ci-ml.yml file (GitHub Actions):
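name: ML Pipeline CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        # requirements.txt is assumed to include dvc[s3], mlflow, scikit-learn and pandas
        run: pip install -r requirements.txt
      - name: Download versioned data
        # dvc pull also needs credentials for the S3 remote (e.g. via repository secrets)
        run: |
          dvc pull
          dvc checkout
      - name: Run data validation
        run: python src/data_validation.py
      - name: Train model
        run: |
          python src/train.py --experiment-name "ci-$(date +%Y%m%d-%H%M%S)"
      - name: Evaluate model
        run: python src/evaluate.py --threshold 0.75
      - name: Register model in MLflow
        # Register only on pushes to main, never from pull requests.
        # Registration goes through the Python API (mlflow.register_model);
        # replace <run-id> with the ID of the training run.
        if: github.event_name == 'push'
        run: |
          python -c "import mlflow; mlflow.register_model('runs:/<run-id>/model', 'my_production_model')"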
What data_validation.py should check: that critical columns contain no null values, that feature distributions stay within expected limits, and that the row count is plausible. Without this validation, a corrupted dataset could slip through and produce a useless model.
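A minimal, standard-library sketch of those three checks follows. The column names (age, income, target), the allowed ranges, and the row-count bounds are hypothetical placeholders, and a fixed range check is a simplification of a full distribution check.

```python
import csv
import sys

# Hypothetical expectations for illustration; adapt them to your dataset
CRITICAL_COLUMNS = ["age", "income", "target"]
FEATURE_RANGES = {"age": (18.0, 100.0), "income": (0.0, 1_000_000.0)}
ROW_BOUNDS = (1_000, 5_000_000)


def validate(rows: list[dict[str, str]], min_rows: int, max_rows: int) -> list[str]:
    """Return human-readable problems; an empty list means the dataset passed."""
    errors = []
    # Row count must be plausible
    if not min_rows <= len(rows) <= max_rows:
        errors.append(f"row count {len(rows)} outside [{min_rows}, {max_rows}]")
    for i, row in enumerate(rows):
        # No nulls (missing or empty cells) in critical columns
        for col in CRITICAL_COLUMNS:
            if row.get(col) in (None, ""):
                errors.append(f"row {i}: null value in critical column '{col}'")
        # Feature values inside expected limits (a simple proxy for distribution checks)
        for col, (low, high) in FEATURE_RANGES.items():
            value = row.get(col)
            if value in (None, ""):
                continue  # nulls are handled by the check above
            try:
                number = float(value)
            except ValueError:
                errors.append(f"row {i}: {col}={value!r} is not numeric")
                continue
            if not low <= number <= high:
                errors.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return errors


def main(path: str = "data/train.csv") -> int:
    """Validate a CSV file; a non-zero return value should fail the CI step."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    errors = validate(rows, *ROW_BOUNDS)
    for message in errors[:20]:  # cap output so CI logs stay readable
        print(message, file=sys.stderr)
    return 1 if errors else 0
```

To use it as the python src/data_validation.py step, end the file with if __name__ == "__main__": sys.exit(main()). For genuine distribution checks, compare statistics such as quantiles or category frequencies against a trusted reference dataset instead of fixed ranges.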
Training a model with dirty data is like cooking with spoiled ingredients: you only discover the mistake after the food is on the table — and the customer has already complained.
Example train.py with MLflow:
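import argparse

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The CI workflow passes a per-run experiment name
parser = argparse.ArgumentParser()
parser.add_argument("--experiment-name", default="local-dev")
args = parser.parse_args()

# Must be reachable from the runner; localhost only works for local runs
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment(args.experiment_name)

# Load the DVC-tracked dataset; assumes the label column is named "target"
df = pd.read_csv("data/train.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    # Parameters
    n_estimators = 100
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Training
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    model.fit(X_train, y_train)

    # Metrics
    acc = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", acc)

    # Log model
    mlflow.sklearn.log_model(model, "model")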
The CI/CD workflow runs this script on every push and pull request targeting main. If the evaluation step (python src/evaluate.py --threshold 0.75) measures accuracy below 0.75, it exits with an error, the pipeline fails, and the model doesn't go to production. This is what separates responsible deployment from a "cross your fingers and pray" approach.
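Here is a minimal sketch of the quality gate that evaluate.py could implement. It assumes, hypothetically, that training also writes its metrics to a metrics.json file (for example {"accuracy": 0.81}); reading the metric back from the MLflow tracking server would work just as well. The --metrics-file flag and the passes_gate helper are illustrative names, not part of the original project.

```python
import argparse
import json
import sys


def passes_gate(metrics: dict[str, float], threshold: float, metric: str = "accuracy") -> bool:
    """True when the chosen metric meets or exceeds the minimum threshold."""
    if metric not in metrics:
        raise KeyError(f"metric '{metric}' missing from training output")
    return metrics[metric] >= threshold


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Fail CI when the model underperforms")
    parser.add_argument("--threshold", type=float, required=True)
    parser.add_argument("--metrics-file", default="metrics.json")  # hypothetical path
    args = parser.parse_args(argv)

    with open(args.metrics_file) as f:
        metrics = json.load(f)
    if passes_gate(metrics, args.threshold):
        print(f"accuracy {metrics['accuracy']:.3f} >= {args.threshold}: OK")
        return 0
    print(f"accuracy {metrics['accuracy']:.3f} < {args.threshold}: blocking deployment",
          file=sys.stderr)
    return 1  # a non-zero exit status makes the GitHub Actions step fail
```

Ending the file with if __name__ == "__main__": sys.exit(main(sys.argv[1:])) turns the return value into the process exit status that GitHub Actions checks. The comparison uses >=, so a model scoring exactly 0.75 passes.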
3. Drift Monitoring with Evidently AI: When the Model Starts to Fail
Models in production are victims of time. Data changes, concepts evolve, and what worked in January can be disastrous in June. Evidently AI is a widely used open-source library for detecting data drift and, when ground-truth labels are available, the drops in model quality that signal concept drift.
Step 1: Install Evidently
pip install "evidently<0.7"
Step 2: Create a monitoring dashboard
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reference data (original training)
reference = pd.read_csv("data/train.csv")
# Current data (last week of production)
current = pd.read_csv("data/production_latest.csv")

# Legacy Evidently API (versions below 0.7).
# The model in this tutorial is a classifier, so RegressionPreset does not apply;
# add ClassificationPreset() only when both frames contain target and prediction columns.
report = Report(metrics=[
    DataDriftPreset(),
])
report.run(reference_data=reference, current_data=current)
report.save_html("monitoring_report.html")
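This report shows, for each feature, whether there was statistical drift and how large it is. Evidently picks the test by feature type and sample size; on smaller samples it defaults to Kolmogorov-Smirnov for numerical features and chi-squared for categorical ones. Its built-in dataset-drift flag fires when half or more of the columns drift, so the stricter rule used in this tutorial (trigger the alarm when more than 20% of features drift) is a threshold you set yourself.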
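To show what a per-feature drift test plus a share-of-drifted-features rule look like, here is a standard-library sketch. It computes the two-sample Kolmogorov-Smirnov statistic D (the largest gap between the two empirical CDFs) and flags drift when D > c·sqrt((n + m) / (n·m)), using the asymptotic critical value c ≈ 1.358 for alpha = 0.05. It then applies the 20% rule from the text. Evidently uses its own test selection and thresholds, so this is an illustration rather than a replacement; the function names are hypothetical.

```python
import math

# Hypothetical helpers: KS-based drift per feature plus a share-of-features alarm


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties)
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def feature_drifted(ref: list[float], cur: list[float], c_alpha: float = 1.358) -> bool:
    """Reject 'same distribution' when D exceeds the asymptotic critical value (alpha=0.05)."""
    n, m = len(ref), len(cur)
    return ks_statistic(ref, cur) > c_alpha * math.sqrt((n + m) / (n * m))


def drift_alarm(reference: dict[str, list[float]], current: dict[str, list[float]],
                max_share: float = 0.2) -> tuple[float, bool]:
    """Return the share of drifted features and whether it exceeds max_share."""
    features = list(reference)
    drifted = sum(feature_drifted(reference[f], current[f]) for f in features)
    share = drifted / len(features)
    return share, share > max_share


reference = {"age": list(range(100)), "income": list(range(100))}
current = {"age": list(range(50, 150)), "income": list(range(100))}
print(drift_alarm(reference, current))  # (0.5, True)
```

In the example, "age" shifts by 50 units (D = 0.5 against a critical value of about 0.19) while "income" is unchanged, so half the features drift and the 20% alarm fires.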
Step 3: Automate the alert with GitHub Actions (scheduled)
You can run monitoring as a weekly job in GitHub Actions:
name: Weekly Drift Monitoring

on:
  schedule:
    - cron: '0 6 * * 1'  # every Monday 6 AM UTC

jobs:
  check-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Download versioned data
        run: dvc pull
      - name: Run monitoring
        run: python src/monitor_drift.py
      - name: Send alert if high drift
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          if grep -q '"drift_detected": true' drift_report.json; then
            curl -X POST -H "Content-Type: application/json" \
              -d '{"text":"Drift detected! Initiate retraining."}' \
              "$SLACK_WEBHOOK_URL"
          fi
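The alert step assumes that src/monitor_drift.py leaves a drift_report.json file behind. Here is a minimal sketch of that output side. How the drifted-column count is obtained (from the Evidently report or another test) is left open, and the file layout and helper names are assumptions. json.dumps writes the flag as "drift_detected": true (quoted key, lowercase boolean), so any grep pattern in the workflow has to match that exact spelling.

```python
import json
from pathlib import Path

# Hypothetical helpers for the output of monitor_drift.py


def build_drift_report(drifted_columns: int, total_columns: int,
                       max_share: float = 0.2) -> dict:
    """Summarize drift; the flag follows this tutorial's 'more than 20% of features' rule."""
    share = drifted_columns / total_columns
    return {
        "drifted_columns": drifted_columns,
        "total_columns": total_columns,
        "drift_share": round(share, 4),
        "drift_detected": share > max_share,
    }


def write_drift_report(report: dict, path: Path = Path("drift_report.json")) -> None:
    """Write the summary where the workflow's alert step looks for it."""
    path.write_text(json.dumps(report, indent=2))


print(json.dumps(build_drift_report(3, 10)))
# {"drifted_columns": 3, "total_columns": 10, "drift_share": 0.3, "drift_detected": true}
```

Keeping the decision (drift_detected) inside the Python script and letting the workflow only read it means the threshold lives in one place.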
Integration with automatic retraining: When drift exceeds the threshold, the pipeline can trigger a new training run through CI/CD. MLflow versions the new model and DVC ensures the correct dataset version is used, so the whole loop is orchestrated end to end.
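One way to close the loop is GitHub's REST endpoint for dispatching a workflow, POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches. This only works if ci-ml.yml also lists workflow_dispatch under on:. The sketch below builds that request with the standard library. The owner, repository, and token values are placeholders, and the token needs permission to run Actions.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"

# Hypothetical helpers; owner/repo/token are placeholders supplied by the caller


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           ref: str, token: str) -> urllib.request.Request:
    """Build a POST that starts a workflow_dispatch run of the given workflow file."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


def trigger_retraining(owner: str, repo: str, token: str) -> int:
    """Send the dispatch; GitHub answers 204 No Content when the run is queued."""
    request = build_dispatch_request(owner, repo, "ci-ml.yml", "main", token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Inside a workflow, the built-in GITHUB_TOKEN should be able to send this dispatch if the job grants it actions: write permission. Alternatively, the gh CLI offers the same action as gh workflow run ci-ml.yml.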
Kubeflow for heavy orchestration: If your company operates multiple models at scale, Kubeflow offers pipelines that run natively on Kubernetes and manage dependencies, parallelism, and compute resources. The CNCF reports that tools like MLflow, DVC, and Kubeflow reduce model deployment time by 40% (CNCF 2026). This means a process that took two weeks (10 business days) now takes six.
Final checklist for your MLOps pipeline in 2026:
- Data versioning with DVC (atomic commits, remote cache)
- Experiment tracking with MLflow (parameters, metrics, artifacts)
- CI/CD with GitHub Actions (data validation + training + evaluation)
- Model tests (minimum performance threshold)
- Automatic registration of approved models (MLflow Model Registry)
- Drift monitoring with Evidently AI (weekly report + alert)
- Automatic retraining trigger (when more than 20% of features drift or performance drops)
Conclusion
MLOps isn't a luxury for tech companies; it's the baseline for any team that wants to keep models in production without surprises. This tutorial showed the skeleton of a functional pipeline with DVC, MLflow, GitHub Actions, and Evidently AI. The next step is to adapt it to your reality: choose your data remote, define the drift thresholds that make sense for your domain, and automate retraining. The cost of ignoring MLOps is predictable: models that die silently, wrong decisions, and rework. The cost of implementing it is a weekend of setup and a culture of technical discipline. The choice is yours.