Performance Optimization
This section is dedicated to optimizing and evaluating the performance of AI models when deployed on Luxonis devices, such as the OAK-D series. Optimizing AI model performance through quantization and hardware acceleration is key on the powerful yet resource-constrained Luxonis edge devices, ensuring responsive and scalable AI applications suitable for real-world use.

Understanding Performance Metrics
Latency, throughput, and accuracy are three distinct metrics used to assess different aspects of performance in various systems.

- Latency: refers to the delay or lag between the input (such as an image or video frame) and the corresponding output or response from a neural network.
- Throughput: refers to the amount of data or the number of tasks a computer vision system can process within a given period.
- Accuracy: measures how well a system or model performs in terms of correctness.
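To make the first two metrics concrete, below is a minimal sketch of measuring latency and throughput on-device with the DepthAI v2 Python API; the pipeline, stream name, and preview size are illustrative choices, not prescribed values.

Python

import time
import depthai as dai

# Minimal pipeline: stream color camera preview frames to the host.
pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("preview")
cam.preview.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue(name="preview", maxSize=4, blocking=False)
    frames, t0 = 0, time.monotonic()
    while True:
        frame = q.get()  # blocks until the next frame arrives
        # Latency: host receive time minus the device-side capture timestamp.
        latency_ms = (dai.Clock.now() - frame.getTimestamp()).total_seconds() * 1000
        # Throughput: frames received per second of wall-clock time.
        frames += 1
        elapsed = time.monotonic() - t0
        if elapsed >= 1.0:
            print(f"latency: {latency_ms:.1f} ms, throughput: {frames / elapsed:.1f} FPS")
            frames, t0 = 0, time.monotonic()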
Hardware Considerations
Currently, our devices are built on top of either the second or third generation of our Robotics Vision Core (RVC2 and RVC3). You can find more specifics in the hardware sections for RVC2 and RVC3.

RVC2 NN Performance
Model name | Input size | FPS | Latency [ms] |
---|---|---|---|
MobileOne S0 | 224x224 | 165.5 | 11.1 |
Resnet18 | 224x224 | 94.8 | 19.7 |
DeepLab V3 | 256x256 | 36.5 | 48.1 |
DeepLab V3 | 513x513 | 6.3 | 253.1 |
YoloV6n R2 | 416x416 | 65.5 | 29.3 |
YoloV6n R2 | 640x640 | 29.3 | 66.4 |
YoloV6t R2 | 416x416 | 35.8 | 54.1 |
YoloV6t R2 | 640x640 | 14.2 | 133.6 |
YoloV6m R2 | 416x416 | 8.6 | 190.2 |
YoloV7t | 416x416 | 46.7 | 37.6 |
YoloV7t | 640x640 | 17.8 | 97.0 |
YoloV8n | 416x416 | 31.3 | 56.9 |
YoloV8n | 640x640 | 14.3 | 123.6 |
YoloV8s | 416x416 | 15.2 | 111.9 |
YoloV8m | 416x416 | 6.0 | 273.8 |
To see the performance of more RVC2 NN models, please refer to this spreadsheet.
Model Optimization Techniques
Luxonis-Specific Optimizations
Number of SHAVES
When exporting a given model using, e.g., our tools, setting the correct number of SHAVES can increase performance. The SHAVES are vector processors in DepthAI/OAK. They are used for operations that the NCEs (neural compute engines) are not implemented to handle, and also for other tasks on the device, such as frame handling, image reformatting, and parts of the ISP. To read about setting the number of SHAVES in tools, please refer to here.
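As an illustration, when compiling a blob with the blobconverter package, the SHAVE count can be passed explicitly; the model name and count below are placeholders, not recommendations.

Python

import blobconverter

# Compile a model-zoo network into a .blob, allocating 6 SHAVE cores.
# Model name and SHAVE count are illustrative; match them to your pipeline.
blob_path = blobconverter.from_zoo(name="mobilenet-ssd", shaves=6)
print(blob_path)  # path to the compiled .blob file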
Lowering camera FPS to match NN FPS

Lowering the camera FPS so that it does not exceed the NN's capabilities typically gives the best latency, since the NN can start inference as soon as a new frame is available.
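For example, here is a sketch of capping the camera FPS to roughly match a model's measured throughput; the value is taken from the RVC2 table above and should be adjusted to your own model.

Python

import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
# Cap the camera at ~29 FPS so frames are not produced faster than, e.g.,
# YoloV6n R2 at 640x640 can consume them on RVC2 (see the table above).
cam.setFps(29)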
NN input queue size and blocking behavior

By default, queues in DepthAI are blocking: when the queue size (which you can set with the setQueueSize() method) is reached, any additional messages from the device will be blocked, and the library will wait until it can add new messages to the queue. When a queue is non-blocking, the library will instead discard the oldest message, add the new one to the queue, and continue its processing loop. If your network has higher latency and cannot process that many frames, setting the input queue of your neural network to non-blocking might improve performance.
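A minimal sketch of making the NN input queue non-blocking inside the pipeline definition follows; the blob path is a placeholder.

Python

import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("path/to/model.blob")  # placeholder path

# Drop the oldest queued frame instead of stalling upstream nodes
# whenever the NN cannot keep up with the camera.
nn.input.setBlocking(False)
nn.input.setQueueSize(2)

cam.preview.link(nn.input)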
Quantization (only applicable for RVC3 conversion)

To speed the models up, one can quantize them. Quantization refers to reducing the precision of the weights and activations in the model from high precision (e.g., 32-bit floating-point numbers) to lower precision (e.g., 8-bit integers). The goal is to achieve computational and memory efficiency without significantly compromising the model's performance. To learn more about quantizing a model, please refer to OpenVINO's documentation.

To quantize a model, one can use the Post-Training Optimization Tool (POT). We recommend default quantization for quick testing. For this, you will need some images, e.g. coco128.zip (download from here). However, we suggest using images from your training or validation set for the best quantization and lowest accuracy drop.

To install POT, use the following commands inside your Python environment:

Command Line
python -m pip install --upgrade pip
pip install openvino-dev==2022.1
Next, create a pot-config.json file:

JSON
{
  "model": {
    "model_name": "yolov6n",
    "model": "path/to/model.xml",
    "weights": "path/to/model.bin"
  },
  "engine": {
    "device": "CPU",
    "type": "simplified",
    "data_source": "/path/to/coco128/images/train2017/"
  },
  "compression": {
    "target_device": "VPU",
    "algorithms": [
      {
        "name": "DefaultQuantization",
        "params": {
          "stat_subset_size": 300
        }
      }
    ]
  }
}
Finally, run POT with the configuration file:

Command Line

pot -c pot-config.json -d
Profiling and Benchmarking
By enabling the info log level (or lower), depthai will print the usage of hardware resources, specifically CPU/RAM consumption, temperature, CMX slices, and SHAVE core allocation.

You can set the debugging level like this:

Command Line
DEPTHAI_LEVEL=info python3 script.py
DepthAI Pipeline Graph
An excellent tool for debugging and gaining insight into a DepthAI pipeline and its inner workings is the DepthAI Pipeline Graph. It visualizes DepthAI pipelines together with the FPS of each output.

To install it, run the following command:

Command Line
pip install git+https://github.com/luxonis/depthai_pipeline_graph.git
To run it with a main.py script, use this command:

Command Line
pipeline_graph "python main.py -cam"
Troubleshooting Common Performance Issues
If you set the depthai level to trace, depthai will log operation times for each node/process.

To visualize the network, you can use netron.app. You can investigate which operation could cause a bottleneck and then test your hypothesis by pruning the model before that operation. To prune a model, set the --output flag of the Model Optimizer (mo) to the given node. To read more about pruning, please see the documentation. After pruning the model, compile it, measure its latency, and compare it with the latency of the original (unpruned) model to see if your hypothesis was correct.
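As a hedged illustration, pruning at a chosen node with the OpenVINO Model Optimizer could look like the following; the model path and node name are placeholders.

Command Line

mo --input_model path/to/model.onnx --output <node_name>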