Computer Vision with YOLOv8 on Edge: 2026 Tutorial
Training a neural network from scratch requires tens of thousands of labeled images and days of GPU time. For startups, researchers, or projects with tight budgets, this is a luxury.
Real-time object detection on edge devices such as the Raspberry Pi, Jetson Nano, or smart cameras is one of the biggest challenges in modern computer vision. Heavy models like YOLOv8x require powerful GPUs, but lightweight variants like YOLOv8n (nano) can reach around 30 FPS on modest hardware once optimized.
In this tutorial, you will learn how to implement object detection with YOLOv8 on edge devices. We will cover everything from environment setup to optimization for real-time inference, including techniques like quantization and network pruning.
Why YOLOv8 Dominates Edge Device Detection
YOLO (You Only Look Once) is a family of object detection models that processes the entire image in a single pass through the neural network. This makes it extremely fast compared to region proposal-based approaches like Faster R-CNN.
Version 8, released by Ultralytics in 2023, introduced significant improvements: a CSPDarknet-based backbone, a PAN-FPN neck, and an anchor-free detection head. The result is a model family that reaches 53.9% mAP (0.5:0.95) on COCO with the extra-large version (YOLOv8x) and 37.3% with the nano version (YOLOv8n), according to the benchmarks Ultralytics publishes in its documentation (2023).
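For edge devices, YOLOv8n is the natural choice. With only 3.2 million parameters and 8.7 GFLOPs at a 640-pixel input, it reportedly runs at around 30 FPS on a Jetson Nano and 15 FPS on a Raspberry Pi 4 with OpenCV acceleration, according to Ultralytics community benchmarks. Actual throughput depends heavily on export format, numeric precision, and input size.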
The key lies in the efficient architecture: the backbone extracts features at multiple scales, the neck combines these features to detect objects of different sizes, and the head produces bounding boxes and classes without the need for predefined anchors.
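As a rough back-of-the-envelope check on what the parameter count means for storage, the sketch below multiplies it by the bytes per weight at each precision. It assumes every parameter is stored at the same precision and ignores file metadata, so the results are estimates only.

```python
# Rough storage estimate: parameters x bytes per parameter.
# Assumes uniform precision and ignores checkpoint metadata.
PARAMS = 3_200_000  # YOLOv8n parameter count quoted above

BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}


def estimated_size_mb(params, precision):
    """Return the approximate weight storage in megabytes (10^6 bytes)."""
    return params * BYTES_PER_WEIGHT[precision] / 1_000_000


for precision in BYTES_PER_WEIGHT:
    print(f"{precision}: {estimated_size_mb(PARAMS, precision):.1f} MB")
```

The FP16 estimate (about 6.4 MB) is close to the size of the published yolov8n.pt checkpoint, which would be consistent with the weights being saved in half precision.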
Setting Up the Environment on the Edge Device
Before you begin, you need to prepare the device. We will use a Raspberry Pi 4 with 4GB of RAM and a USB camera. Install the dependencies:
sudo apt update
sudo apt install python3-pip libopencv-dev
pip3 install ultralytics opencv-python numpy
For Jetson Nano, use the NVIDIA JetPack SDK, which already includes CUDA and TensorRT. Install Ultralytics, and the TensorRT meta-package if your image does not already provide it:
pip3 install ultralytics
sudo apt install nvidia-tensorrt
Now, load the YOLOv8n model pre-trained on COCO. The constructor accepts either a model name or the path to a .pt file.
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
print(model)
The output shows the complete architecture. The model can already detect the 80 COCO classes (person, car, dog, etc.). For your specific task, you need to fine-tune it on your own dataset.
Fine-Tuning YOLOv8 for Your Dataset
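Suppose you want to detect traffic signs in real time. You have a dataset of 500 labeled images in YOLO format: one .txt file per image, with one line per object containing class, x_center, y_center, width, and height, where the four box values are normalized to the range 0 to 1 relative to the image size.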
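If your annotations are stored as pixel-space corner coordinates, a conversion along these lines produces the label lines described above. The function name and the example box are hypothetical, chosen only for illustration.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner format) to one YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a class-0 object in a 640x480 image
print(to_yolo_line(0, 160, 120, 480, 360, 640, 480))
```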
Create a dataset.yaml file:
train: /path/dataset/train
val: /path/dataset/val
nc: 5
names: ['stop', 'go', 'caution', 'speed_limit', 'pedestrian']
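Before training, it can help to sanity-check the label files against the nc value declared in dataset.yaml. The helper below is a minimal sketch for detection labels only (five fields per line); the function name is an assumption, not part of Ultralytics.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one YOLO label line (empty if OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


print(check_label_line("0 0.5 0.5 0.2 0.3", num_classes=5))
print(check_label_line("7 0.5 0.5 1.2 0.3", num_classes=5))
```

Running it over every line of every .txt file catches class ids that exceed nc and boxes that were never normalized, two mistakes that are easy to miss until training behaves strangely.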
Now, perform fine-tuning. Use the pre-trained model as a starting point:
model = YOLO('yolov8n.pt')
results = model.train(data='dataset.yaml', epochs=50, imgsz=640, batch=16, device='cpu')
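The device argument selects where training runs: device='cpu' for the CPU, or device=0 for the first CUDA GPU if one is available. Training on a Raspberry Pi may take many hours, so in practice you will usually train on a more powerful computer and then copy the resulting weights to the edge device.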
Optimization for Real-Time Inference
The trained model is about 6 MB. To run in real time on the edge, you need to optimize it. There are three main techniques:
Quantization to INT8
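Quantization reduces the precision of weights from FP32 to INT8, cutting the model size by up to 4x and typically accelerating inference by 2-3x on hardware with INT8 support. On Jetson Nano, export through TensorRT:
from ultralytics import YOLO

model = YOLO('trained_model.pt')
# Export to TensorRT. By default the engine keeps floating-point precision;
# recent Ultralytics releases accept half=True (FP16) or int8=True together
# with data='dataset.yaml' for calibration, if the GPU supports that precision.
model.export(format='engine', device=0)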
On Raspberry Pi, export to ONNX and load the model with OpenCV's DNN module (this export stays in FP32 unless you quantize the ONNX file separately):
model.export(format='onnx', imgsz=640)
# Then load with OpenCV
import cv2
net = cv2.dnn.readNet('trained_model.onnx')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
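To make the FP32-to-INT8 mapping concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration of the idea only: real toolchains such as TensorRT also calibrate activations and often use per-channel scales, and the weight values below are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.03, 0.00]  # hypothetical FP32 weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("INT8 codes:", codes)
print("scale:", scale, "max round-trip error:", max_error)
```

Each weight is stored in one byte instead of four, and the round-trip error is bounded by half the scale, which is why accuracy usually drops only slightly.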
Network Pruning
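Pruning removes weights whose magnitude is close to zero. The Ultralytics YOLO class does not expose a prune method, so this example applies PyTorch's torch.nn.utils.prune utilities to the convolution layers of the underlying network (model.model):
import torch
import torch.nn.utils.prune as prune
from ultralytics import YOLO

model = YOLO('trained_model.pt')
for m in model.model.modules():  # zero the 30% smallest-magnitude weights per conv layer
    if isinstance(m, torch.nn.Conv2d):
        prune.l1_unstructured(m, name='weight', amount=0.3)
        prune.remove(m, 'weight')  # make the zeros permanent
model.export(format='onnx')
Unstructured pruning of this kind only sets weights to zero: file size and inference time improve only when the export format or runtime exploits the sparsity, or when structured pruning removes whole channels. The commonly quoted figures (up to 50% smaller with less than 1% mAP loss) should therefore be verified on your own model, ideally after a few epochs of fine-tuning to recover accuracy.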
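The idea behind magnitude pruning can be shown without any framework. The sketch below zeroes the fraction of weights with the smallest absolute value; the function name and the weight values are hypothetical.

```python
def magnitude_prune(weights, amount):
    """Zero the given fraction of weights with the smallest absolute value."""
    n_prune = round(len(weights) * amount)
    # Indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.8, -0.02, 0.3, 0.01, -0.6, 0.05, 0.9, -0.4, 0.07, 0.2]
pruned = magnitude_prune(weights, amount=0.3)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned)
print(f"sparsity: {sparsity:.0%}")
```

The list keeps its length after pruning, which illustrates why zeroing weights alone does not shrink a dense model file.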
Reducing Input Resolution
Decrease imgsz from 640 to 320. This cuts the number of input pixels by 4x, which can reduce inference time by up to roughly the same factor, but it also reduces accuracy for small objects. Test with your dataset:
results = model.predict(frame, imgsz=320)
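The following sketch works through the arithmetic behind that trade-off for a hypothetical 1280x720 camera frame. It uses a simplified square letterbox (scale to fit, then pad), which is an assumption for illustration and not necessarily the exact preprocessing Ultralytics applies.

```python
def pixel_ratio(imgsz_a, imgsz_b):
    """How many times more pixels a square imgsz_a input has than imgsz_b."""
    return (imgsz_a / imgsz_b) ** 2


def letterbox_geometry(frame_w, frame_h, imgsz):
    """Scale factor, resized size, and padding to fit a frame into imgsz x imgsz."""
    scale = min(imgsz / frame_w, imgsz / frame_h)
    new_w, new_h = round(frame_w * scale), round(frame_h * scale)
    pad_x, pad_y = (imgsz - new_w) / 2, (imgsz - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


print("pixel ratio 640 vs 320:", pixel_ratio(640, 320))
for imgsz in (640, 320):
    scale, size, pad = letterbox_geometry(1280, 720, imgsz)
    print(f"imgsz={imgsz}: resized to {size}, padding {pad}, "
          f"a 24 px wide object becomes {24 * scale:.0f} px")
```

An object that spans 24 pixels in the original frame shrinks to about 6 pixels at imgsz=320, which is why small objects are the first to be missed.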
The table below compares the strategies on a Raspberry Pi 4:
| Strategy | Model Size | FPS | mAP (COCO) |
|---|---|---|---|
| Original (FP32) | 6.2 MB | 8 FPS | 37.3% |
| Quantized (INT8) | 1.6 MB | 22 FPS | 36.1% |
| Pruned (30%) + Quantized | 1.1 MB | 28 FPS | 35.8% |
| Pruned + Quantized + imgsz=320 | 1.1 MB | 35 FPS | 32.4% |
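To read the table as trade-offs rather than absolute numbers, the snippet below expresses each row relative to the FP32 baseline. The values are copied from the table; nothing here is a new measurement.

```python
# (model size in MB, FPS, mAP in %) copied from the table above
rows = {
    "Original (FP32)": (6.2, 8, 37.3),
    "Quantized (INT8)": (1.6, 22, 36.1),
    "Pruned (30%) + Quantized": (1.1, 28, 35.8),
    "Pruned + Quantized + imgsz=320": (1.1, 35, 32.4),
}
base_size, base_fps, base_map = rows["Original (FP32)"]


def relative_to_baseline(name):
    """Return (FPS speedup, size reduction factor, mAP points lost)."""
    size, fps, map_pct = rows[name]
    return fps / base_fps, base_size / size, round(base_map - map_pct, 1)


for name in rows:
    speedup, shrink, map_loss = relative_to_baseline(name)
    print(f"{name}: {speedup:.2f}x FPS, {shrink:.1f}x smaller, {map_loss} mAP points lost")
```

Seen this way, the last step (imgsz=320) buys the smallest relative speedup while costing the most accuracy, so it is worth applying only if the earlier steps do not reach your FPS target.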
Implementing Real-Time Inference
With the optimized model, implement the inference loop. Use OpenCV to capture frames from the camera and YOLO to detect objects.
import cv2
from ultralytics import YOLO

model = YOLO('optimized_model.engine')  # TensorRT
cap = cv2.VideoCapture(0)  # USB Camera

while True:
    ret, frame = cap.read()
    if not ret:
        break
    results = model(frame, imgsz=640, conf=0.5)
    # Draw bounding boxes
    for r in results:
        for box in r.boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            conf = box.conf[0].item()
            cls = int(box.cls[0].item())
            label = f'{model.names[cls]} {conf:.2f}'
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imshow('Real-Time Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
For devices with limited memory, process frames at reduced resolution and keep only the most recent camera frame in a small buffer, so that stale frames are dropped instead of queuing up and adding latency.
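One way to realize such a buffer is a single-slot holder that a background capture thread overwrites. The sketch below is an assumption about how this could be structured, not code from Ultralytics or OpenCV; LatestFrame and capture_loop are hypothetical names, and read_frame stands for any callable with the same return shape as cap.read.

```python
import threading


class LatestFrame:
    """Holds only the newest frame; older unread frames are overwritten."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._seq = 0  # number of frames written so far

    def put(self, frame):
        with self._lock:
            self._frame = frame
            self._seq += 1

    def get(self):
        with self._lock:
            return self._seq, self._frame


def capture_loop(read_frame, buffer, stop_event):
    """Keep grabbing frames until stopped or the source runs dry."""
    while not stop_event.is_set():
        ok, frame = read_frame()
        if not ok:
            break
        buffer.put(frame)


# With a camera, this would typically run in the background, for example:
#   buf, stop = LatestFrame(), threading.Event()
#   threading.Thread(target=capture_loop, args=(cap.read, buf, stop), daemon=True).start()
# and the inference loop would call buf.get() instead of cap.read().
```

Because the slot holds one frame, memory use stays constant and the detector always works on the freshest image, at the cost of silently skipping frames when inference is slower than capture.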
Performance Evaluation on the Edge Device
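After implementation, evaluate performance in terms of FPS, latency, and accuracy. For speed, time inference over 100 camera frames and divide the number of processed frames by the total inference time:
import time
import cv2
from ultralytics import YOLO

model = YOLO('optimized_model.engine')
cap = cv2.VideoCapture(0)

num_frames = 100
processed = 0
total_time = 0.0
for _ in range(num_frames):
    ret, frame = cap.read()
    if not ret:
        break
    start = time.perf_counter()
    results = model(frame)
    total_time += time.perf_counter() - start  # inference latency only
    processed += 1
cap.release()

if processed:
    print(f'Average latency: {1000 * total_time / processed:.1f} ms')
    print(f'Average FPS: {processed / total_time:.2f}')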
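Two common ways to summarize FPS are averaging the per-frame values (1 / latency) and dividing the frame count by the total time. The sketch below, using made-up latencies, shows how far they diverge when one frame is slow.

```python
def mean_of_instant_fps(latencies):
    """Arithmetic mean of per-frame FPS values (1 / latency)."""
    return sum(1 / t for t in latencies) / len(latencies)


def throughput_fps(latencies):
    """Frames actually processed per second of inference time."""
    return len(latencies) / sum(latencies)


latencies = [0.02, 0.02, 0.02, 0.10]  # hypothetical seconds per frame, one slow frame
print(f"mean of per-frame FPS: {mean_of_instant_fps(latencies):.1f}")
print(f"frames / total time:   {throughput_fps(latencies):.1f}")
```

The mean of per-frame values overstates what the device sustains, so frames divided by total time is usually the more honest number to report, alongside the worst-case latency.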
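For accuracy, compare detections with the ground truth using metrics like mAP and recall. Calling model.val(data='dataset.yaml') computes these on the validation split; the underlying helpers live in ultralytics.utils.metrics (ultralytics.yolo.utils.metrics in early 8.0 releases).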
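To see what such metrics are built on, here is a minimal sketch of IoU-based matching for a single class in a single image. It assumes detections are already sorted by descending confidence and uses greedy one-to-one matching; a full mAP computation additionally integrates precision over recall across confidence thresholds, which is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truths, iou_threshold=0.5):
    """Greedy matching: each ground-truth box can be claimed by one detection."""
    matched = set()
    true_positives = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            score = iou(det, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    return precision, recall


# Hypothetical boxes: one correct detection, one false positive, one missed object
ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(precision_recall(detections, ground_truths))
```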
Conclusion
In this tutorial, you learned how to implement object detection with YOLOv8 on edge devices, from environment setup to optimization for real-time inference. Techniques like quantization, pruning, and resolution reduction allow efficient models to run on modest hardware, such as Raspberry Pi and Jetson Nano.
As next steps, explore fine-tuning with custom datasets for specific applications, such as surveillance or industrial automation. Keep up with Ultralytics updates for new YOLO versions and TensorRT integrations for even greater performance.