MLOps in Practice: ML Pipeline with CI/CD, Drift and Versioning in 2026

NeuralPulse | June 13, 2026 | 7 min read

Have you ever put a model into production, only to watch its accuracy plummet a week later without warning? That's the nightmare for 62% of companies scaling machine learning that haven't yet automated retraining (MLOps Survey 2026). Meanwhile, 78% of organizations using MLOps report more predictable deliveries and fewer incidents.

The problem isn't a lack of tools — it's the absence of an integrated pipeline. In this tutorial, you'll build a complete MLOps flow from scratch using only open-source tools: MLflow for experiment tracking, DVC for dataset versioning, GitHub Actions for CI/CD, and Evidently AI for drift monitoring. The focus is practical: every step includes executable code and YAML configuration.

The real bottleneck in ML isn't building models, it's keeping them alive in production. A well-designed MLOps pipeline transforms deployment from a stressful event into a predictable and reversible process.

1. Data Versioning with DVC: The Pillar Everyone Ignores

Without data versioning, your ML pipeline is a black box. You train the model today, it works. Three months later, someone updates the dataset without telling you — and the model breaks. DVC (Data Version Control) solves this by treating datasets as versionable objects linked to Git.

Step 1: Initialize DVC in your repository

pip install "dvc[s3]"  # the s3 extra adds support for the S3 remote used in Step 3
git init
dvc init

This creates a .dvc/ directory with a configuration file, plus a .dvcignore file. DVC stages these files in Git for you, but nothing is committed until you run git commit.

Step 2: Add a dataset to version control

dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore
git commit -m "Add training dataset v1.0"

DVC doesn't store the large file in Git. It creates a lightweight .dvc file pointing to the file's hash. The actual data goes to a local or remote cache (S3, GCS, SFTP server).

Step 3: Configure a remote (example: S3 bucket)

dvc remote add -d myremote s3://my-ml-bucket/dvc-cache
dvc push

Now anyone on the team can download the exact same dataset version with dvc pull. This is essential for reproducibility. If the dataset changes, DVC detects the difference by the hash — and you can compare versions.
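
To make the hash-based change detection concrete, here is a minimal sketch in plain Python. It is not DVC's implementation: the helper names file_md5 and has_changed are hypothetical, and the sketch only illustrates the idea that a dataset version is identified by a hash of its content (DVC records an MD5 hash in the .dvc file).

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: Path, recorded_hash: str) -> bool:
    """True when the file content no longer matches the recorded hash."""
    return file_md5(path) != recorded_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp) / "train.csv"
        dataset.write_text("id,label\n1,0\n2,1\n")
        recorded = file_md5(dataset)  # what a tracking file would store

        print(has_changed(dataset, recorded))  # content untouched so far

        dataset.write_text("id,label\n1,0\n2,1\n3,1\n")  # someone appends a row
        print(has_changed(dataset, recorded))  # content now differs
```

Because the hash depends only on the bytes of the file, renaming or touching the dataset does not register as a change, while a single edited row does.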

Table 1: Quick comparison of versioning tools

Tool             | Main Focus             | Storage              | Git Integration
DVC              | Data and models        | Local + remote cache | Full (via .dvc)
Git LFS          | Large files            | LFS server           | Partial (replaces files)
MLflow           | Experiments and models | Tracking server      | Partial (via URI)
Hugging Face Hub | Models and datasets    | Remote Hub           | Independent

Practical tip: Always run dvc checkout after a git checkout to ensure local data matches the commit. You can automate this with a Git hook: dvc install sets up a post-checkout hook that runs dvc checkout for you.

2. CI/CD for ML: GitHub Actions + MLflow + Model Tests

CI/CD for ML isn't the same as for regular software. Besides testing code, you need to validate data, check model metrics, and detect data leakage. MLflow serves as the system of record for experiments, storing parameters, metrics, and artifacts.

Repository structure:

my-ml-project/
├── .github/workflows/ci-ml.yml
├── data/
├── notebooks/
├── src/
│   ├── train.py
│   ├── evaluate.py
│   └── data_validation.py
├── models/
├── requirements.txt
├── .dvc/
└── README.md

ci-ml.yml file (GitHub Actions):

name: ML Pipeline CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Download versioned data
        run: |
          dvc pull
          dvc checkout
      - name: Run data validation
        run: python src/data_validation.py
      - name: Train model
        run: |
          python src/train.py --experiment-name "ci-$(date +%Y%m%d-%H%M%S)"
      - name: Evaluate model
        run: python src/evaluate.py --threshold 0.75
      - name: Register model in MLflow
        run: |
          # Registers through the Python API (mlflow.register_model);
          # <run-id> is a placeholder for the training run's ID
          python -c "import mlflow; mlflow.register_model('runs:/<run-id>/model', 'my_production_model')"

What data_validation.py does: it checks that critical columns contain no null values, that feature values stay within expected limits, and that the row count is plausible. Without this validation, a corrupted dataset could slip through and produce a useless model.
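
As a rough illustration of those three checks, here is a minimal sketch that uses only the standard library. The column names, limits, and minimum row count are hypothetical placeholders, and the "expected limits" check is reduced to a simple min/max range per value; a real data_validation.py would likely load data/train.csv and use checks tuned to your domain.

```python
import csv
import io

# Hypothetical settings: adapt the column names and limits to your dataset.
CRITICAL_COLUMNS = ["age", "income", "target"]
FEATURE_LIMITS = {"age": (0.0, 120.0), "income": (0.0, 1_000_000.0)}
MIN_ROWS = 3


def validate_rows(rows):
    """Return a list of problems found; an empty list means the data passed."""
    problems = []
    if len(rows) < MIN_ROWS:
        problems.append(f"expected at least {MIN_ROWS} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        for column in CRITICAL_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append(f"row {index}: missing value in '{column}'")
        for column, (low, high) in FEATURE_LIMITS.items():
            raw = (row.get(column) or "").strip()
            if not raw:
                continue  # empty cells are the null check's concern
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"row {index}: '{column}' is not numeric: {raw!r}")
                continue
            if not low <= value <= high:
                problems.append(f"row {index}: '{column}'={value} outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    # In-memory sample with one missing age and one implausible age.
    sample = io.StringIO(
        "age,income,target\n"
        "34,52000,1\n"
        ",48000,0\n"
        "230,61000,1\n"
    )
    rows = list(csv.DictReader(sample))
    for problem in validate_rows(rows):
        print(problem)
```

In CI, the script would exit with a non-zero status whenever the returned list is not empty, which stops the workflow before the training step runs.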

Training a model with dirty data is like cooking with spoiled ingredients: you only discover the mistake after the food is on the table — and the customer has already complained.

Example train.py with MLflow:

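import argparse

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The CI workflow passes --experiment-name; the default value is an assumption
parser = argparse.ArgumentParser()
parser.add_argument("--experiment-name", default="local-dev")
args = parser.parse_args()

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment(args.experiment_name)

# Load the DVC-tracked dataset; the "target" column name is an assumption
df = pd.read_csv("data/train.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    # Parameters
    n_estimators = 100
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Training
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    model.fit(X_train, y_train)

    # Metrics
    acc = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", acc)

    # Log model
    mlflow.sklearn.log_model(model, "model")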

The CI/CD will run this script on every push to main. If accuracy drops below 0.75, the pipeline fails — and the model doesn't go to production. This is what separates responsible deployment from a "cross your fingers and pray" approach.
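
The threshold gate itself can be very small. The sketch below shows one way evaluate.py might turn an accuracy value into a pass/fail exit code. It assumes the training step wrote its metrics to a JSON file; the --metrics-file flag and the metrics.json name are hypothetical and not part of the pipeline above.

```python
import argparse
import json
import tempfile
from pathlib import Path


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Fail the pipeline when accuracy is too low.")
    parser.add_argument("--threshold", type=float, default=0.75)
    # Hypothetical flag: where the training step left its metrics.
    parser.add_argument("--metrics-file", type=Path, default=Path("metrics.json"))
    return parser.parse_args(argv)


def gate(accuracy: float, threshold: float) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 stops it."""
    return 0 if accuracy >= threshold else 1


def main(argv) -> int:
    args = parse_args(argv)
    accuracy = json.loads(args.metrics_file.read_text())["accuracy"]
    code = gate(accuracy, args.threshold)
    verdict = "passed" if code == 0 else "failed"
    print(f"accuracy={accuracy:.3f} threshold={args.threshold:.2f} -> {verdict}")
    return code


if __name__ == "__main__":
    # Demo with a throwaway metrics file. A real script would instead
    # finish with sys.exit(main(sys.argv[1:])) so CI sees the exit code.
    with tempfile.TemporaryDirectory() as tmp:
        metrics = Path(tmp) / "metrics.json"
        metrics.write_text(json.dumps({"accuracy": 0.71}))
        print("exit code:", main(["--threshold", "0.75", "--metrics-file", str(metrics)]))
```

GitHub Actions marks a step as failed when its command exits with a non-zero status, so returning 1 here is enough to keep the later steps from running.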

3. Drift Monitoring with Evidently AI: When the Model Starts to Fail

Models in production are victims of time. Data changes, concepts evolve, and what worked in January can be disastrous in June. Evidently AI is a widely used open-source tool for detecting data drift and concept drift.

Step 1: Install Evidently

pip install evidently

Step 2: Create a monitoring dashboard

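# Imports follow the Evidently 0.4.x layout; later releases reorganized these modules
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, ClassificationPreset

# Reference data (original training)
reference = pd.read_csv("data/train.csv")

# Current data (last week of production)
current = pd.read_csv("data/production_latest.csv")

# train.py trains a classifier, so the classification preset is used here.
# Performance presets expect "target" and "prediction" columns in both frames.
report = Report(metrics=[DataDriftPreset(), ClassificationPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("monitoring_report.html")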

This report shows, for each feature, whether there was statistical drift (Kolmogorov-Smirnov or chi-squared test) and its magnitude. If more than 20% of features show drift, the alarm should trigger.
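
To show what the 20% rule means in practice, here is a minimal pure-Python sketch. It is not Evidently's implementation: it handles numeric features only, computes the two-sample Kolmogorov-Smirnov statistic by hand, and compares it to the standard large-sample critical value at a 5% significance level. The function names and sample data are hypothetical.

```python
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    largest_gap = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Step past every value equal to x in both samples so ties move together.
        while i < n and ref[i] == x:
            i += 1
        while j < m and cur[j] == x:
            j += 1
        largest_gap = max(largest_gap, abs(i / n - j / m))
    return largest_gap


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS test at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def drifted_features(reference, current, alpha=0.05):
    """Names of features whose KS statistic exceeds the critical value."""
    drifted = []
    for name, ref_values in reference.items():
        cur_values = current[name]
        statistic = ks_statistic(ref_values, cur_values)
        if statistic > ks_critical_value(len(ref_values), len(cur_values), alpha):
            drifted.append(name)
    return drifted


def drift_share(reference, current, alpha=0.05):
    """Fraction of features flagged as drifted."""
    return len(drifted_features(reference, current, alpha)) / len(reference)


if __name__ == "__main__":
    # Hypothetical data: "age" is unchanged, "income" has shifted upward.
    reference = {"age": list(range(100)), "income": list(range(100))}
    current = {"age": list(range(100)), "income": [v + 60 for v in range(100)]}

    share = drift_share(reference, current)
    print("drifted:", drifted_features(reference, current))
    print("alarm:", share > 0.20)  # the 20% rule from the text
```

The 20% cut-off is the article's choice rather than a universal constant; a domain with many noisy features may need a higher share, and a domain with a few critical features a lower one.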

Step 3: Automate the alert with GitHub Actions (scheduled)

You can run monitoring as a weekly job in GitHub Actions:

name: Weekly Drift Monitoring

on:
  schedule:
    - cron: '0 6 * * 1'  # every Monday 6 AM UTC

jobs:
  check-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run monitoring
        run: python src/monitor_drift.py
      - name: Send alert if high drift
        env:
          # Assumes the webhook URL is stored as a repository secret with this name
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          # Assumes monitor_drift.py writes the flag as JSON: "drift_detected": true
          if grep -q '"drift_detected": true' drift_report.json; then
            curl -X POST -H "Content-Type: application/json" \
              -d '{"text":"Drift detected! Initiate retraining."}' \
              "$SLACK_WEBHOOK_URL"
          fi

Integration with automatic retraining: When drift exceeds the threshold, the pipeline can trigger a new training run via CI/CD. MLflow versions the new model, and DVC ensures the correct dataset is used, so the whole loop is orchestrated end to end.
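
The retraining decision can be sketched as a small function that a script like monitor_drift.py might call before writing drift_report.json. This is an illustration under assumptions: the report fields other than drift_detected, the function names, and the reuse of 0.75 as the minimum accuracy are hypothetical choices.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

DRIFT_SHARE_THRESHOLD = 0.20  # from the text: more than 20% of features drifting
MIN_ACCURACY = 0.75           # assumed to match the CI evaluation threshold


def build_report(drift_share: float, accuracy: Optional[float] = None) -> dict:
    """Decide whether retraining is needed and collect the reasons."""
    drift_detected = drift_share > DRIFT_SHARE_THRESHOLD
    performance_drop = accuracy is not None and accuracy < MIN_ACCURACY
    return {
        "drift_share": drift_share,
        "drift_detected": drift_detected,
        "performance_drop": performance_drop,
        "retrain": drift_detected or performance_drop,
    }


def write_report(report: dict, path: Path) -> None:
    """Write the report as JSON so a later workflow step can inspect it."""
    path.write_text(json.dumps(report, indent=2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        report_path = Path(tmp) / "drift_report.json"
        write_report(build_report(drift_share=0.35, accuracy=0.81), report_path)
        print(report_path.read_text())
```

Note that json.dumps writes the flag as "drift_detected": true, quotes included, so any shell check on the file has to match that exact form. Accuracy is optional because production labels often arrive late; when it is missing, the decision rests on drift alone.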

Kubeflow for heavy orchestration: If your company operates multiple models at scale, Kubeflow offers native Kubernetes pipelines. It manages dependencies, parallelism, and computational resources. The CNCF reports that tools like MLflow, DVC, and Kubeflow reduce model deployment time by 40% (CNCF 2026). At that rate, a process that took two weeks (ten business days) would take about six business days.

Final checklist for your MLOps pipeline in 2026:

  • Data versioning with DVC (atomic commits, remote cache)
  • Experiment tracking with MLflow (parameters, metrics, artifacts)
  • CI/CD with GitHub Actions (data validation + training + evaluation)
  • Model tests (minimum performance threshold)
  • Automatic approved model registration (MLflow Model Registry)
  • Drift monitoring with Evidently AI (weekly report + alert)
  • Automatic retraining trigger (when drift > 20% or performance drops)

Conclusion

MLOps isn't a luxury for tech companies — it's the minimum barrier for any team wanting to keep models in production without surprises. This tutorial showed the skeleton of a functional pipeline with DVC, MLflow, GitHub Actions, and Evidently AI. The next step is to adapt it to your reality: choose your data remote, define the drift thresholds that make sense for your domain, and automate retraining. The cost of ignoring MLOps is predictable: models that die silently, wrong decisions, and rework. The cost of implementing it is a weekend of setup and a culture of technical discipline. The choice is yours.
