# Performance Optimization

This section is dedicated to optimizing and evaluating the performance of AI models when deployed on Luxonis devices, such as the
OAK-D series.

Optimizing AI model performance through quantization and hardware acceleration is key for powerful yet resource-constrained
Luxonis edge devices. This ensures responsive and scalable AI applications suitable for real-world use.

## Understanding Performance Metrics

Latency, throughput, and accuracy are three distinct metrics, each describing a different aspect of how a deployed model performs.

 * Latency: the delay between an input (such as an image or video frame) and the corresponding output or response from a
   neural network.
 * Throughput: the amount of data or the number of tasks (for example, frames) a computer vision system can process within a
   given period.
 * Accuracy: how correct the outputs of a system or model are.
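
Latency and throughput are related but not interchangeable: when processing stages overlap, throughput can be higher than 1 / latency. As a minimal sketch (the `summarize` helper and the timestamp lists are assumptions made for illustration, not part of the DepthAI API), both metrics can be computed from per-frame capture and result timestamps:

```python
from statistics import mean


def summarize(capture_ts, result_ts):
    """Return (mean latency in ms, throughput in FPS) from timestamps in seconds.

    capture_ts[i] is when frame i entered the system and result_ts[i] is when
    its result became available. Both lists are hypothetical measurements.
    """
    if len(capture_ts) != len(result_ts) or len(result_ts) < 2:
        raise ValueError("need at least two matching timestamps")
    latency_ms = mean((r - c) * 1000.0 for c, r in zip(capture_ts, result_ts))
    # Throughput counts the completed intervals between the first and last result.
    fps = (len(result_ts) - 1) / (result_ts[-1] - result_ts[0])
    return latency_ms, fps


# Made-up example: a frame arrives every 10 ms, each result appears 25 ms after capture.
capture = [i * 0.010 for i in range(5)]
result = [t + 0.025 for t in capture]
latency_ms, fps = summarize(capture, result)
print(f"latency = {latency_ms:.1f} ms, throughput = {fps:.1f} FPS")
```

In this made-up example the frames overlap in flight, so the throughput (about 100 FPS) is higher than the 40 FPS that 1 / latency would suggest.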

## Hardware Considerations

Currently, our devices are built on top of either the second or third generation of our Robotics Vision Core (RVC2 and RVC3). You
can find more specifics in hardware sections for [RVC2](https://docs.luxonis.com/hardware/platform/rvc/rvc2.md) and
[RVC3](https://docs.luxonis.com/hardware/platform/rvc/rvc3.md).

## RVC2 NN Performance

| Model name | Size | FPS | Latency [ms] |
| --- | --- | --- | --- |
| MobileOne S0 | 224x224 | 165.5 | 11.1 |
| Resnet18 | 224x224 | 94.8 | 19.7 |
| DeepLab V3 | 256x256 | 36.5 | 48.1 |
| DeepLab V3 | 513x513 | 6.3 | 253.1 |
| YoloV6n R2 | 416x416 | 65.5 | 29.3 |
| YoloV6n R2 | 640x640 | 29.3 | 66.4 |
| YoloV6t R2 | 416x416 | 35.8 | 54.1 |
| YoloV6t R2 | 640x640 | 14.2 | 133.6 |
| YoloV6m R2 | 416x416 | 8.6 | 190.2 |
| YoloV7t | 416x416 | 46.7 | 37.6 |
| YoloV7t | 640x640 | 17.8 | 97.0 |
| YoloV8n | 416x416 | 31.3 | 56.9 |
| YoloV8n | 640x640 | 14.3 | 123.6 |
| YoloV8s | 416x416 | 15.2 | 111.9 |
| YoloV8m | 416x416 | 6.0 | 273.8 |

> To see the performance of more RVC2 NN models, please refer to
> [this spreadsheet](https://docs.google.com/spreadsheets/d/1yMD4L3gNTkv9d-CqwHTYkn1_on9qDwjeDSUiW2x_H8k/edit?usp=sharing).

## Model Optimization Techniques

### Luxonis-Specific Optimizations

#### Number of SHAVES

When exporting a model, for example with our tools, setting the correct number of SHAVES can increase performance. SHAVES are
vector processors in DepthAI/OAK devices. They run the operations that the NCEs (Neural Compute Engines) do not implement, and
they are also used for other on-device tasks, such as image handling and reformatting and parts of the ISP.

To read about setting the number of SHAVES in our tools, please refer to the
[YOLO integration page](https://docs.luxonis.com/software/ai-inference/integrations/yolo.md).

#### Lowering camera FPS to match NN FPS

Lowering the camera FPS so that it does not exceed the FPS the NN can sustain typically gives the best latency: the NN can start
inference as soon as a new frame is available, instead of frames waiting in a queue.
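
As a minimal sketch of this rule, the helper below picks a camera FPS from a measured NN FPS. The function name `camera_fps_for` and the 30 FPS sensor limit are assumptions made for illustration; adjust them for your device.

```python
import math


def camera_fps_for(nn_fps, camera_max_fps=30.0):
    """Pick a camera FPS that does not exceed the measured NN throughput.

    nn_fps is a measured value (for example, from the RVC2 table above);
    camera_max_fps is an assumed sensor limit.
    """
    if nn_fps <= 0:
        raise ValueError("nn_fps must be positive")
    # Round down to a whole FPS so the camera never outpaces the NN.
    target = math.floor(nn_fps) if nn_fps >= 1 else nn_fps
    return float(min(camera_max_fps, target))


fps = camera_fps_for(14.3)  # YoloV8n at 640x640 in the RVC2 table
print(fps)

# With the DepthAI v2 Python API, the value would then be applied to the
# camera node, e.g. cam.setFps(fps) on a ColorCamera node (not executed here).
```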

#### NN input queue size and blocking behavior

By default, queues in DepthAI are blocking: once the queue size (which you can set with the setQueueSize() method) is reached,
any additional messages from the device are blocked, and the library waits until it can add new messages to the queue. With a
non-blocking queue, the library instead discards the oldest message, adds the new one to the queue, and continues its processing
loop. If your network has high latency and cannot keep up with the incoming frames, setting the input queue of your neural
network to non-blocking might improve performance.
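
The difference between the two modes can be illustrated with a toy simulation. This is a simplified model under stated assumptions (one new frame per tick, a consumer that takes one frame every few ticks), not the DepthAI implementation. The `simulate` function and its parameters are hypothetical.

```python
from collections import deque


def simulate(ticks, max_size, blocking, consume_every):
    """Return (age in ticks of each consumed frame, number of dropped frames)."""
    queue = deque()
    pending = None  # frame the producer is still waiting to enqueue (blocking mode)
    ages, dropped = [], 0
    for tick in range(ticks):
        # A frame is identified by the tick at which it was captured.
        frame = pending if pending is not None else tick
        pending = None
        if len(queue) < max_size:
            queue.append(frame)
        elif blocking:
            pending = frame  # queue is full: the producer waits and retries next tick
        else:
            queue.popleft()  # queue is full: discard the oldest message
            dropped += 1
            queue.append(frame)
        # The slow consumer takes one frame every `consume_every` ticks.
        if (tick + 1) % consume_every == 0 and queue:
            ages.append(tick - queue.popleft())
    return ages, dropped


blocking_ages, blocking_drops = simulate(12, max_size=2, blocking=True, consume_every=3)
fresh_ages, fresh_drops = simulate(12, max_size=2, blocking=False, consume_every=3)
print("blocking:    ", blocking_ages, blocking_drops)
print("non-blocking:", fresh_ages, fresh_drops)
```

In this model the blocking queue drops nothing, but the consumer receives increasingly old frames. The non-blocking queue drops frames and keeps the consumed frames fresh. In the DepthAI v2 Python API, the corresponding settings on a neural network input are setBlocking(False) and setQueueSize().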

### Quantization (only applicable for RVC3 conversion)

Quantizing a model can speed it up. Quantization refers to reducing the precision of the weights and activations in the model
from high precision (e.g., 32-bit floating-point numbers) to lower precision (e.g., 8-bit integers). The goal is to achieve
computational and memory efficiency without significantly compromising the model's accuracy. To learn more about quantizing a
model, please refer to OpenVINO's
[documentation](https://docs.openvino.ai/2022.3/pot_default_quantization_usage.html).
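
To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization of a few example weights. It is a generic illustration and not the algorithm POT uses, which also calibrates activations on sample data. The function names and weight values are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [x * scale for x in quantized]


weights = [0.82, -1.27, 0.003, 0.5]  # made-up float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 values:", q)
print("scale:", scale, "max round-trip error:", max_error)
```

Each value is stored in one byte instead of four, at the cost of a round-trip error of at most half a quantization step (scale / 2).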

To quantize a model, you can use the [Post-Training Optimization Tool
(POT)](https://docs.openvino.ai/latest/pot_docs_FrequentlyAskedQuestions.html). We recommend default quantization for quick
testing. For this, you will need some images, e.g. coco128.zip ([download from here](https://ultralytics.com/assets/coco128.zip)).
For the best quantization and the lowest accuracy drop, however, we suggest using images from your own training or validation set.

To install POT, use the following commands inside your Python environment:

```bash
python -m pip install --upgrade pip
pip install openvino-dev==2022.1
```

Next, define a pot-config.json file:

```json
{
    "model": {
        "model_name": "yolov6n",
        "model": "path/to/model.xml",
        "weights": "path/to/model.bin"
    },
    "engine": {
        "device": "CPU",
        "type": "simplified",
        "data_source": "/path/to/coco128/images/train2017/"
    },
    "compression": {
        "target_device": "VPU",
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "stat_subset_size": 300
                }
            }
        ]
    }
}
```

Make sure the paths to the XML file, the BIN file, and the dataset are correct.
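
If you prefer to generate the file from a script, the sketch below builds the same configuration as a Python dictionary and lists any configured paths that do not exist. The helper names `build_pot_config` and `missing_paths` are hypothetical, and the paths are the placeholders from the JSON above.

```python
import json
from pathlib import Path


def build_pot_config(xml_path, bin_path, data_dir, model_name="yolov6n", subset_size=300):
    """Build the POT configuration shown above as a Python dict."""
    return {
        "model": {
            "model_name": model_name,
            "model": str(xml_path),
            "weights": str(bin_path),
        },
        "engine": {
            "device": "CPU",
            "type": "simplified",
            "data_source": str(data_dir),
        },
        "compression": {
            "target_device": "VPU",
            "algorithms": [
                {"name": "DefaultQuantization", "params": {"stat_subset_size": subset_size}}
            ],
        },
    }


def missing_paths(config):
    """Return the configured paths that do not exist on disk."""
    candidates = [
        config["model"]["model"],
        config["model"]["weights"],
        config["engine"]["data_source"],
    ]
    return [p for p in candidates if not Path(p).exists()]


config = build_pot_config(
    "path/to/model.xml", "path/to/model.bin", "/path/to/coco128/images/train2017/"
)
print(json.dumps(config, indent=4))
print("missing paths:", missing_paths(config))
```

Once no paths are reported missing, the dictionary can be saved with `Path("pot-config.json").write_text(json.dumps(config, indent=4))`.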

Finally, call:

```bash
pot -c pot-config.json -d
```

This will create a directory with the results, storing your quantized XML and BIN.

## Profiling and Benchmarking

When the info log level (or lower) is enabled, depthai prints the usage of hardware resources, specifically CPU/RAM consumption,
temperature, CMX slices, and SHAVE core allocation.

You can set the log level like this:

```bash
DEPTHAI_LEVEL=info python3 script.py
```
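
When benchmarking several scripts, it can be convenient to set the variable from Python instead. The following is a minimal sketch: the `run_with_depthai_level` helper is hypothetical, and the child process only echoes the variable as a stand-in for your own script.

```python
import os
import subprocess
import sys


def run_with_depthai_level(command, level="info"):
    """Run a command with DEPTHAI_LEVEL set, leaving the parent environment untouched."""
    env = dict(os.environ, DEPTHAI_LEVEL=level)
    return subprocess.run(command, env=env, capture_output=True, text=True)


# Stand-in for ["python3", "script.py"]: the child only prints the variable it received.
child = [sys.executable, "-c", "import os; print(os.environ['DEPTHAI_LEVEL'])"]
result = run_with_depthai_level(child, level="trace")
print(result.stdout.strip())
```

Capturing the output this way also makes it possible to save the resource usage logs of each run for later comparison.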

### DepthAI Pipeline Graph

[DepthAI Pipeline Graph](https://github.com/luxonis/depthai_pipeline_graph) is an excellent tool for debugging a DepthAI pipeline
and gaining insight into its inner workings. It visualizes DepthAI pipelines together with their FPS outputs.

To install it, run the following command:

```bash
pip install git+https://github.com/luxonis/depthai_pipeline_graph.git
```

Next, to visualize the DepthAI pipeline defined inside a main.py script, use this command:

```bash
pipeline_graph "python main.py -cam"
```

For more examples, please visit the [DepthAI Pipeline Graph repository](https://github.com/luxonis/depthai_pipeline_graph#run).

## Troubleshooting Common Performance Issues

If you set the depthai log level to trace (DEPTHAI_LEVEL=trace), depthai will log operation times for each node/process.

To visualize the network, you can use [netron.app](https://netron.app). You can then investigate which operation might cause a
bottleneck and test your hypothesis by pruning the model before this operation. To prune a model, set the --output flag of the
[model optimizer (mo)](https://docs.luxonis.com/software/ai-inference/conversion.md) to the given node. To read more about
pruning, please see the
[documentation](https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model.html#model-cutting).
After pruning the model, compile it, measure its latency, and compare it with the latency of the original (unpruned) model to
see if your hypothesis was correct.
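
The comparison itself is simple arithmetic. As a sketch, the hypothetical helper `removed_share` below estimates how much of the original latency is spent in the part of the model that pruning removed. The two latency values are made up for illustration.

```python
def removed_share(original_ms, pruned_ms):
    """Fraction of the original latency spent in the layers removed by pruning."""
    if original_ms <= 0 or not 0 <= pruned_ms <= original_ms:
        raise ValueError("expected 0 <= pruned_ms <= original_ms and original_ms > 0")
    return (original_ms - pruned_ms) / original_ms


# Made-up measurements: 100 ms for the full model, 60 ms for the pruned one.
share = removed_share(100.0, 60.0)
print(f"{share:.0%} of the latency comes from the removed part")
```

A large share supports the hypothesis that the suspected operation (or something after it) is the bottleneck. A small share suggests looking earlier in the network.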
