Raspberry Pi with connected camera displaying real-time object detection

Computer Vision with YOLOv8 on Edge: 2026 Tutorial

NeuralPulse | June 16, 2026 | 6 min read

Training a neural network from scratch requires tens of thousands of labeled images and days of GPU time. For startups, researchers, or projects with tight budgets, this is a luxury.

Real-time object detection on edge devices, such as the Raspberry Pi, Jetson Nano, or smart cameras, is one of the biggest challenges in modern computer vision. Heavy models like YOLOv8x require powerful GPUs, but lightweight variants like YOLOv8n (nano) can run at up to 30 FPS on modest hardware.

In this tutorial, you will learn how to implement object detection with YOLOv8 on edge devices. We will cover everything from environment setup to optimization for real-time inference, including techniques like quantization and network pruning.

Why YOLOv8 Dominates Edge Device Detection

YOLO (You Only Look Once) is a family of object detection models that processes the entire image in a single pass through the neural network. This makes it extremely fast compared to region proposal-based approaches like Faster R-CNN.

Version 8, released by Ultralytics in 2023, introduced significant improvements: a CSPDarknet-based backbone, a PAN-FPN neck, and an anchor-free detection head. The result is a model that achieves 53.9% mAP (mAP50-95) on COCO with the largest variant (YOLOv8x) and 37.3% with the nano variant (YOLOv8n), as reported in the official Ultralytics documentation.

For edge devices, YOLOv8n is the gold standard. With only 3.2 million parameters and 8.7 GFLOPs, it runs at 30 FPS on a Jetson Nano and 15 FPS on a Raspberry Pi 4 with OpenCV acceleration, according to Ultralytics community benchmarks.

The key lies in the efficient architecture: the backbone extracts features at multiple scales, the neck combines these features to detect objects of different sizes, and the head produces bounding boxes and classes without the need for predefined anchors.
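To make "multiple scales" concrete, the sketch below counts how many candidate predictions the head produces for a given input size. It assumes the three detection strides of 8, 16, and 32 used by the standard YOLOv8 detection models; the function name is hypothetical.

```python
def prediction_grid(imgsz, strides=(8, 16, 32)):
    """Return the grid size per detection scale and the total number of predictions."""
    # Each stride divides the input into a coarser grid; every cell predicts one box.
    grids = [(imgsz // s, imgsz // s) for s in strides]
    total = sum(w * h for w, h in grids)
    return grids, total


grids, total = prediction_grid(640)
print(grids, total)  # [(80, 80), (40, 40), (20, 20)] 8400
```

The fine 80x80 grid is what lets the model localize small objects, while the coarse 20x20 grid covers large ones.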

Setting Up the Environment on the Edge Device

Before you begin, you need to prepare the device. We will use a Raspberry Pi 4 with 4GB of RAM and a USB camera. Install the dependencies:

sudo apt update
sudo apt install python3-pip libopencv-dev
pip3 install ultralytics opencv-python numpy

For Jetson Nano, use the NVIDIA JetPack SDK, which already includes CUDA and TensorRT. Then install Ultralytics and, if it is missing from your image, the TensorRT package:

pip3 install ultralytics
sudo apt install nvidia-tensorrt

Now load the YOLOv8n model pre-trained on COCO. The model argument accepts a model name or the path to a .pt file.

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
print(model)

The output shows the complete architecture. The model can already detect the 80 COCO classes (person, car, dog, etc.). For your specific task, you need to fine-tune it on your own dataset.

Fine-Tuning YOLOv8 for Your Dataset

Suppose you want to detect traffic signs in real time. You have a dataset of 500 labeled images in YOLO format: one .txt file per image, where each line holds class, x_center, y_center, width, and height, with coordinates normalized to the 0-1 range.
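As an illustration of the label format, the sketch below converts a pixel-space box into a normalized YOLO label line and back. The function names and the sample 640x480 image are hypothetical, chosen only for this example.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box in corner format to a normalized YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Parse a YOLO label line back into (class_id, x1, y1, x2, y2) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical example: a stop sign (class 0) in a 640x480 image
line = to_yolo_line(0, 160, 120, 480, 360, 640, 480)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```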

Create a dataset.yaml file:

train: /path/dataset/train
val: /path/dataset/val
nc: 5
names: ['stop', 'go', 'caution', 'speed_limit', 'pedestrian']

Now, perform fine-tuning. Use the pre-trained model as a starting point:

model = YOLO('yolov8n.pt')
results = model.train(data='dataset.yaml', epochs=50, imgsz=640, batch=16, device='cpu')

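Set device='cpu', or device='cuda' if you have an NVIDIA GPU. Training may take a few hours on a Raspberry Pi, so in practice it is better to train on a more powerful computer and then copy the resulting weights to the device (by default Ultralytics saves them under runs/detect/train/weights/best.pt; the examples below assume the file was renamed to trained_model.pt).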

Optimization for Real-Time Inference

The trained model is about 6 MB. To run it in real time on the edge, you need to optimize it. There are three main techniques:

Quantization to INT8

Quantization reduces the precision of weights from FP32 to INT8, cutting the model size by 4x and accelerating inference by 2-3x. Use TensorRT on Jetson Nano:

from ultralytics import YOLO

model = YOLO('trained_model.pt')
# Export an INT8 TensorRT engine; data supplies the calibration images
model.export(format='engine', device=0, int8=True, data='dataset.yaml')

On Raspberry Pi, export the model to ONNX and load it with OpenCV's DNN module:

model.export(format='onnx', imgsz=640)
# Then load with OpenCV
import cv2
net = cv2.dnn.readNet('trained_model.onnx')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
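To show what the FP32-to-INT8 mapping described in this section does to individual weights, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration only, not the calibration procedure TensorRT actually runs; the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


weights = [0.50, -1.27, 0.03, 0.89]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error is bounded by half a quantization step
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Each weight now fits in one byte instead of four, which is where the 4x size reduction comes from. The cost is the rounding error, bounded by half a quantization step.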

Network Pruning

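Pruning removes weights close to zero. This example uses PyTorch's torch.nn.utils.prune on the convolution layers of the underlying network (model.model):

from torch import nn
from torch.nn.utils import prune
from ultralytics import YOLO

model = YOLO('trained_model.pt')
for module in model.model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.3)  # Zero the 30% smallest weights
        prune.remove(module, 'weight')  # Make the pruning permanent
model.export(format='onnx')

Pruning can reduce the model size by up to 50% with minimal loss in accuracy (less than 1% mAP). Unstructured pruning like this only zeroes weights, so the size and speed gains appear only after compression or on a sparsity-aware runtime. Validate accuracy again after pruning.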
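The idea behind magnitude pruning can be sketched on a plain list of numbers. This is a simplified illustration with hypothetical weights and a hypothetical function name, not the code path a deep learning framework uses internally.

```python
def magnitude_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest magnitude."""
    n_prune = round(len(weights) * amount)
    # Indices sorted by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.02, 0.5, 0.01, -0.9, 0.03, 0.4, -0.6, 0.05, 0.7]  # hypothetical
pruned = magnitude_prune(weights, 0.3)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned, sparsity)
```

The three smallest weights (0.01, -0.02, and 0.03) become zero while the large ones, which carry most of the signal, are untouched.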

Reducing Input Resolution

Decrease imgsz from 640 to 320. This cuts the number of input pixels by 4x and can reduce inference time by up to a similar factor, but it also lowers accuracy on small objects. Test with your dataset:

results = model.predict(frame, imgsz=320)
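The 4x figure follows from simple arithmetic on the input size, as the sketch below shows. The helper names and the 20-pixel object are hypothetical. Measured speedups are usually smaller than the pixel ratio because per-frame overhead such as pre- and post-processing does not shrink with the input.

```python
def pixel_ratio(old_size, new_size):
    """How many times fewer pixels a square input has after changing imgsz."""
    return (old_size * old_size) / (new_size * new_size)


def scaled_object_size(object_px, old_size, new_size):
    """Apparent size of an object after the frame is resized to the new imgsz."""
    return object_px * new_size / old_size


print(pixel_ratio(640, 320))             # 4.0
print(scaled_object_size(20, 640, 320))  # 10.0
```

A sign that spans 20 pixels at 640 spans only 10 pixels at 320, which is why small objects are the first to be missed.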

The table below compares the strategies on a Raspberry Pi 4:

Strategy	Model Size	FPS	mAP (COCO)
Original (FP32)	6.2 MB	8 FPS	37.3%
Quantized (INT8)	1.6 MB	22 FPS	36.1%
Pruned (30%) + Quantized	1.1 MB	28 FPS	35.8%
Pruned + Quantized + imgsz=320	1.1 MB	35 FPS	32.4%
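To read the trade-off more directly, the sketch below derives the speedup and the mAP drop of each strategy relative to the FP32 baseline. The numbers are copied from the table above; the helper name is hypothetical.

```python
# Values copied from the table above: (strategy, size in MB, FPS, mAP in %)
rows = [
    ("Original (FP32)", 6.2, 8, 37.3),
    ("Quantized (INT8)", 1.6, 22, 36.1),
    ("Pruned (30%) + Quantized", 1.1, 28, 35.8),
    ("Pruned + Quantized + imgsz=320", 1.1, 35, 32.4),
]


def relative_to_baseline(rows):
    """Return (strategy, speedup, mAP drop in points) against the first row."""
    _, _, base_fps, base_map = rows[0]
    return [(name, fps / base_fps, round(base_map - m, 1)) for name, _, fps, m in rows]


for name, speedup, map_drop in relative_to_baseline(rows):
    print(f"{name}: {speedup:.2f}x faster, {map_drop:.1f} points lower mAP")
```

Quantization and pruning together give 3.5x the baseline frame rate for 1.5 points of mAP, while the final step to imgsz=320 costs a further 3.4 points.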

Implementing Real-Time Inference

With the optimized model, implement the inference loop. Use OpenCV to capture frames from the camera and YOLO to detect objects.

import cv2
from ultralytics import YOLO

model = YOLO('optimized_model.engine')  # TensorRT
cap = cv2.VideoCapture(0)  # USB Camera

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame, imgsz=640, conf=0.5)

    # Draw bounding boxes
    for r in results:
        for box in r.boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            conf = box.conf[0].item()
            cls = int(box.cls[0].item())
            label = f'{model.names[cls]} {conf:.2f}'
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow('Real-Time Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

For devices with limited memory, process frames at reduced resolution and use a frame buffer to avoid FPS drops.
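One way to read "frame buffer" here is a background thread that keeps only the newest frame, so slow inference never works through a backlog of stale frames. The class below is a minimal sketch of that interpretation. Its name and interface are hypothetical, and it accepts any callable that returns (ok, frame), as cv2.VideoCapture.read does.

```python
import threading


class LatestFrameBuffer:
    """Keeps only the newest frame read by a background thread."""

    def __init__(self, read_frame):
        # read_frame: callable returning (ok, frame), e.g. cap.read
        self._read_frame = read_frame
        self._lock = threading.Lock()
        self._frame = None
        self._running = False
        self._thread = None

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return self

    def _loop(self):
        while self._running:
            ok, frame = self._read_frame()
            if not ok:
                break
            with self._lock:
                self._frame = frame  # Overwrite: older frames are dropped

    def latest(self):
        """Return the most recent frame, or None if nothing was read yet."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        if self._thread is not None:
            self._thread.join(timeout=1.0)
```

In the inference loop, you would create the buffer once with LatestFrameBuffer(cap.read).start(), call latest() instead of cap.read() on each iteration, and call stop() before releasing the camera.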

Performance Evaluation on the Edge Device

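After implementation, evaluate performance in terms of FPS, latency, and accuracy. To measure speed, time inference over 100 camera frames, reusing the model loaded in the previous section:

import time

cap = cv2.VideoCapture(0)  # Reopen the camera released by the previous loop
num_frames = 100
processed = 0
total_time = 0.0

for _ in range(num_frames):
    ret, frame = cap.read()
    if not ret:
        break

    start = time.perf_counter()
    results = model(frame)
    total_time += time.perf_counter() - start
    processed += 1

cap.release()
if processed:
    print(f'Average FPS: {processed / total_time:.2f}')
    print(f'Average latency: {1000 * total_time / processed:.1f} ms')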
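An average FPS figure can hide occasional slow frames, which matter for real-time use. The sketch below summarizes a list of per-frame inference times with mean latency, 95th-percentile latency, and throughput. The function name and the sample timings are hypothetical.

```python
import math
import statistics


def summarize_latencies(latencies_s):
    """Summarize per-frame inference times given in seconds."""
    ordered = sorted(latencies_s)
    # Nearest-rank 95th percentile
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "mean_ms": statistics.mean(ordered) * 1000,
        "p95_ms": ordered[p95_index] * 1000,
        "fps": len(ordered) / sum(ordered),  # Frames divided by total time
    }


# Hypothetical measurements: nine fast frames and one slow one
sample = [0.04] * 9 + [0.14]
stats = summarize_latencies(sample)
print(stats)
```

Here the throughput works out to about 20 FPS, but the 95th-percentile latency of about 140 ms shows the stall that the average alone would hide.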

For accuracy, compare detections with ground truths using metrics like mAP and recall. Ultralytics can automate this analysis: model.val(data='dataset.yaml') evaluates the model on the validation split and reports precision, recall, and mAP.
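The comparison between detections and ground truths rests on intersection over union (IoU). The sketch below computes IoU and a single-class precision and recall at one IoU threshold. It is a simplification with hypothetical function names and boxes: a full mAP computation also ranks detections by confidence and averages over classes and thresholds.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truths, iou_threshold=0.5):
    """Greedy one-to-one matching of detections to ground truths (single class)."""
    matched = set()
    true_positives = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue  # Each ground truth can be matched only once
            score = iou(det, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    return precision, recall


# Hypothetical boxes: two ground truths, one good detection, one false alarm
gts = [(10, 10, 50, 50), (100, 100, 140, 140)]
dets = [(12, 12, 52, 52), (200, 200, 240, 240)]
print(precision_recall(dets, gts))  # (0.5, 0.5)
```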

Conclusion

In this tutorial, you learned how to implement object detection with YOLOv8 on edge devices, from environment setup to optimization for real-time inference. Techniques like quantization, pruning, and resolution reduction allow efficient models to run on modest hardware, such as Raspberry Pi and Jetson Nano.

As next steps, explore fine-tuning with custom datasets for specific applications, such as surveillance or industrial automation. Keep up with Ultralytics updates for new YOLO versions and TensorRT integrations for even greater performance.
