YOLO Models for Real-Time Object Detection on Luxonis DepthAI
Introduction to YOLO Models
YOLO (You Only Look Once) is a family of real-time object detection models known for their speed and accuracy. Unlike traditional object detection methods that apply a model to an image at multiple locations and scales, YOLO models frame object detection as a regression problem. They predict bounding boxes and class probabilities directly from full images in a single evaluation, enabling fast and efficient object detection suitable for real-time applications. Initial YOLO models were used primarily for object detection, while newer versions support multiple heads for tasks such as keypoint detection and segmentation.
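To make the single-evaluation idea concrete, here is a small illustrative sketch (not part of any DepthAI API) of how many values a YOLOv3-style detection head produces: for each cell of an S×S grid and each of B anchors, it predicts 4 box coordinates, 1 objectness score, and C class probabilities, all in one forward pass.

```python
# Illustration only: size of a YOLOv3-style detection head output.
# For an S x S grid, B anchors per cell, and C classes, each cell
# predicts B * (4 box coords + 1 objectness + C class scores) values.
def yolo_head_output_size(grid: int, anchors: int, classes: int) -> int:
    """Number of values predicted by one YOLO detection head."""
    return grid * grid * anchors * (4 + 1 + classes)

# Example: a 13x13 grid with 3 anchors and the 80 COCO classes
print(yolo_head_output_size(13, 3, 80))  # 13 * 13 * 3 * 85 = 43095
```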
Getting Started with YOLO on DepthAI
YOLO Integration Overview
DepthAI supports parsing YOLO model outputs (including post-processing such as Non-Maximum Suppression) and converting them into the standard DepthAI message format (ImgDetections). This enables efficient YOLO model integration and processing on DepthAI devices.
There are two main nodes to use with YOLO models:
- YoloDetectionNetwork: Standard object detection using YOLO models.
- YoloSpatialDetectionNetwork: Combines object detection with spatial data (i.e., depth information), allowing 3D object localization.
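A minimal sketch of wiring YoloDetectionNetwork into a DepthAI (v2 API) pipeline might look like the following. The blob path, input size, and class count are placeholders for your own model, and running it requires a connected OAK device:

```python
import depthai as dai

pipeline = dai.Pipeline()

# Color camera feeding the network at its input resolution
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)   # must match the model's input size
cam.setInterleaved(False)

# YOLO detection network with on-device decoding (incl. NMS)
yolo = pipeline.create(dai.node.YoloDetectionNetwork)
yolo.setBlobPath("yolo-model.blob")  # placeholder: your compiled .blob
yolo.setNumClasses(80)               # placeholder: your class count
yolo.setCoordinateSize(4)
yolo.setConfidenceThreshold(0.5)
yolo.setIouThreshold(0.5)
cam.preview.link(yolo.input)

# Stream ImgDetections messages back to the host
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
yolo.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("detections")
    for d in q.get().detections:
        print(d.label, d.confidence, d.xmin, d.ymin, d.xmax, d.ymax)
```

For YoloSpatialDetectionNetwork the wiring is similar, with the addition of a stereo depth input; each detection then also carries spatial (X, Y, Z) coordinates.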
Example Implementations
To help you get started, explore the following example implementations:
- RGB & Tiny YOLO: Demonstrates how to use the Tiny YOLO model for object detection.
- RGB & Tiny YOLO with Spatial Data: Shows how to perform object detection with depth information using Spatial Tiny YOLO.
- RGB & YOLOv8 Nano: Illustrates the use of the lightweight YOLOv8 Nano model for high-performance detection in resource-constrained environments.
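The post-processing these examples rely on includes Non-Maximum Suppression (NMS), which discards duplicate boxes that overlap a higher-scoring detection. A simplified pure-Python sketch of the idea (not DepthAI's actual on-device implementation):

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping overlapping duplicates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 too much
```

On DepthAI this step is configured rather than written by hand, e.g. via the detection network's IoU threshold setting.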
YOLO Experiments with DepthAI
DepthAI supports various YOLO models for object detection using both on-device and on-host decoding methods. You can find several demos and examples in the OAK Examples repository, including:
- device-decoding: General object detection using YOLOv3, YOLOv3-tiny, YOLOv4, YOLOv4-tiny, and YOLOv5 with on-device decoding. Uses the DepthAI-API.
- car-detection: Car detection using YOLOv3-tiny and YOLOv4-tiny models with on-device decoding. Uses the DepthAI-SDK.
- host-decoding: Object detection using YOLOv5 with on-host decoding.
- yolox: Anchor-free object detection using YOLOX-tiny with on-host decoding.
- yolop: Vehicle detection, road segmentation, and lane segmentation using YOLOP on OAK with on-host decoding.
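With on-host decoding, the host turns the raw network output into boxes itself. As a simplified illustration (real implementations vectorize this over the whole output tensor and apply NMS afterwards), a single YOLOv3-style grid-cell prediction is decoded by applying a sigmoid to the center offsets and scaling the anchor by the exponentiated size terms:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, grid, input_size):
    """Decode one YOLOv3-style box prediction.

    (tx, ty, tw, th): raw network outputs for this cell/anchor
    (cx, cy): integer grid-cell coordinates
    (anchor_w, anchor_h): anchor size in input-image pixels
    grid: grid resolution (e.g. 13); input_size: network input (e.g. 416)
    Returns (x, y, w, h): box center and size in input-image pixels.
    """
    stride = input_size / grid
    x = (sigmoid(tx) + cx) * stride   # center x in pixels
    y = (sigmoid(ty) + cy) * stride   # center y in pixels
    w = anchor_w * math.exp(tw)       # width scaled from the anchor
    h = anchor_h * math.exp(th)       # height scaled from the anchor
    return x, y, w, h

# Raw outputs of 0 decode to the cell center with the anchor's size:
print(decode_cell(0, 0, 0, 0, cx=6, cy=6, anchor_w=116, anchor_h=90,
                  grid=13, input_size=416))  # (208.0, 208.0, 116.0, 90.0)
```

Anchor-free models such as YOLOX skip the anchor scaling step, which is part of what the yolox example demonstrates.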
Training and Customization
If you wish to train or fine-tune YOLO models for your specific needs, the following resources will guide you through the process:
- Training Guide: Provides step-by-step instructions for training YOLO models using your dataset.
- Model Zoo: Access a collection of pre-trained YOLO models that you can use directly or as a starting point for further training.
Model Conversion with tools.luxonis.com
Luxonis provides a powerful toolset at tools.luxonis.com that allows you to easily convert your trained YOLO models into formats compatible with DepthAI. This tool is particularly useful for converting YOLO models trained in PyTorch (.pt files) into the OpenVINO format, which can then be converted to a DepthAI .blob file.
Licenses
Each YOLO model version integrated into DepthAI may have its own licensing terms. Please review the respective licenses for the models you are using:
- YOLOv3: Released under the YOLOv3 License.
- YOLOv4: Released under the YOLOv4 License.
- YOLOv5: Released by Ultralytics under the YOLOv5 License (GNU Affero General Public License v3.0).
- YOLOv6: Released under the YOLOv6 License (GNU General Public License v3.0).
- YOLOv7: Released under the YOLOv7 License (GNU General Public License v3.0).
- YOLOv8: Released by Ultralytics under the YOLOv8 License (GNU Affero General Public License v3.0).
- YOLOX: Released under the YOLOX License (Apache License v2.0).
- YOLOP: Released under the YOLOP License (MIT License).
- GoldYOLO: Released under the GoldYOLO License (GNU General Public License v3.0).