# Manual Conversion with OpenVINO (RVC2 & RVC3)

## Overview

RVC2 and RVC3 conversion is based on the OpenVINO toolkit. The ModelConverter Docker images come with all the necessary tools preinstalled.
Install the ModelConverter CLI with:

```bash
pip install modelconv
```

and run:

```bash
modelconverter shell <platform>
```

Here, `<platform>` is the target platform you are converting for: either `rvc2` or `rvc3`.

This is equivalent to starting a new Docker container from the `luxonis/modelconverter-<platform>:latest` image and running it as an
interactive terminal session (`-it`) with the `--rm` flag, which removes the container automatically once the session exits:

```bash
docker run --rm -it \
    -v $(pwd)/shared_with_container:/app/shared_with_container/ \
    luxonis/modelconverter-<platform>:latest
```

Alternatively, you can install the OpenVINO toolkit yourself by running:

```bash
pip install openvino-dev==2022.3
```

In the following sections, we explain the conversion process step-by-step.

## Simplify model (Optional)

To obtain a model with optimal performance, we recommend simplifying the model before running the conversion steps. For
an `.onnx` model, you can run the simplification as follows:

```bash
pip install onnxsim
onnxsim <path to .onnx model> <path to simplified .onnx model>
```

## Compile OpenVINO IR

The model is first converted from its original format to the OpenVINO Intermediate Representation (IR) format. The IR consists of two
files: an `.xml` file that encodes the network topology and a `.bin` file that stores the model's weights and biases. The OpenVINO [Model Optimizer
(v2022.3.0)](https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) performs this
conversion and supports the following source model formats:

 * ONNX
 * TensorFlow
 * PyTorch
 * PaddlePaddle
 * MXNet
 * Kaldi
 * Caffe

To convert the model to IR, run:

```bash
mo --input_model <path to the (un-)simplified source model> --compress_to_fp16
```

We recommend using the `--compress_to_fp16` parameter to obtain optimal performance on our devices. You can find
additional details [here](https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_FP16_Compression.html).

> Consult `mo --help` for the full list of conversion options. If you plan to use your model in the Luxonis ecosystem, we
> recommend setting the flags so that the converted model expects un-normalized BGR input. That is, set the
> `--reverse_input_channels`, `--mean_values`, and `--scale_values` flags if the original model expects RGB input or
> normalized values.
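
As an illustration of how these flags combine, the following sketch assembles an `mo` argument list in Python. It is only an example under stated assumptions: the helper `build_mo_command`, the file name `model.onnx`, and the chosen mean and scale values (an RGB-trained model expecting inputs in [0,1]) are hypothetical and not part of ModelConverter or OpenVINO.

```python
from typing import List, Sequence


def build_mo_command(
    model_path: str,
    mean_values: Sequence[float],
    scale_values: Sequence[float],
    reverse_input_channels: bool = True,
) -> List[str]:
    """Assemble an `mo` argument list (hypothetical helper, for illustration)."""

    def fmt(values: Sequence[float]) -> str:
        # Format values as a bracketed, comma-separated list, e.g. [0,0,0].
        return "[" + ",".join(f"{v:g}" for v in values) + "]"

    cmd = ["mo", "--input_model", model_path, "--compress_to_fp16"]
    if reverse_input_channels:
        # Only needed when the source model was trained on RGB images.
        cmd.append("--reverse_input_channels")
    cmd += ["--mean_values", fmt(mean_values), "--scale_values", fmt(scale_values)]
    return cmd


# Hypothetical RGB-trained model that expects inputs in [0, 1].
command = build_mo_command("model.onnx", [0, 0, 0], [255, 255, 255])
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` in an environment where `mo` is installed. Passing the arguments as a list avoids shell quoting issues with the square brackets.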

## Quantize (RVC3 only)

When converting for RVC3, you must quantize the model with the [OpenVINO Post-Training Optimization Toolkit
(POT)](https://docs.openvino.ai/2022.3/pot_introduction.html) before compiling it to BLOB. See the following
[example](https://docs.openvino.ai/2022.3/pot_configs_examples_README.html) for guidance.

## Compile BLOB

Once the model has been transformed into OpenVINO's IR format, the next step is to compile it for inference on the MYRIAD device and
convert it to the BLOB format. The [OpenVINO Compile Tool
(v2022.3.0)](https://docs.openvino.ai/2022.3/openvino_inference_engine_tools_compile_tool_README.html) is used for this step.

To convert the model from IR to BLOB, run:

```bash
# The .bin file must be in the same directory as the .xml file.
compile_tool -d MYRIAD -m <path to .xml model>
```

Note that the Compile Tool is part of the OpenVINO toolkit. If you installed the toolkit manually, the tool's location depends on your
installation path. Typically, it is found in the `.../tools/compile_tool` directory of your OpenVINO installation. You can run it as:

```bash
cd .../tools/compile_tool
./compile_tool -d MYRIAD -m ...
```

> Consult `compile_tool -h` for the full list of conversion options.

## Advanced

### Model Optimizer

#### Mean and Scale Values

Input images are normalized for the model through the `--mean_values` and `--scale_values` flags. By default, frames from
the Camera node use the U8 data type, with values in the range [0,255].

However, models are typically trained with normalized frames within the range of [-1,1] or [0,1]. To ensure accurate inference
results, frames need to be normalized beforehand.

Although creating a custom model that normalizes frames before inference is an option ([example
here](https://github.com/luxonis/oak-examples/tree/main/tutorials/custom-models/generate_model)), it is more efficient to include
this normalization directly within the model itself by setting these flags during the Model Optimizer step.

Here are some common normalization options (assuming that the initial input is in the range of [0,255]):

 * For required input with values between 0 and 1, use mean=0 and scale=255, computed as ([0,255] - 0) / 255 = [0,1].
 * For required input with values between -1 and 1, use mean=127.5 and scale=127.5, computed as ([0,255] - 127.5) / 127.5 =
   [-1,1].
 * For required input with values between -0.5 and 0.5, use mean=127.5 and scale=255, computed as ([0,255] - 127.5) / 255 =
   [-0.5,0.5].
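
The arithmetic in the list above can be checked with a short Python sketch. The `normalize` helper is a hypothetical name used only for illustration; it applies the same `(value - mean) / scale` formula to the two endpoints of the [0,255] input range.

```python
def normalize(value: float, mean: float, scale: float) -> float:
    """Apply the (value - mean) / scale transform described by the flags."""
    return (value - mean) / scale


# (mean, scale) pairs from the list above, keyed by the target range.
options = {
    "[0, 1]": (0.0, 255.0),
    "[-1, 1]": (127.5, 127.5),
    "[-0.5, 0.5]": (127.5, 255.0),
}

for target, (mean, scale) in options.items():
    low = normalize(0, mean, scale)     # darkest possible U8 value
    high = normalize(255, mean, scale)  # brightest possible U8 value
    print(f"target {target}: 0 -> {low}, 255 -> {high}")
```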

For more information, refer to [OpenVINO's
documentation](https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_Additional_Optimization_Use_Cases.html#specifying-mean-and-scale-values).

#### Model Layout

The model layout can be defined using the `--layout` parameter. For example:

```bash
--layout NCHW
```

In this example, the letters stand for:

 * N - batch size
 * C - channels
 * H - height
 * W - width

If the image layout does not match the model layout, DepthAI logs a corresponding warning, for example:
`[NeuralNetwork(0)] [warning] Input image (416x416) does not match NN (3x416)`.

You can switch between the interleaved (HWC) and planar (CHW) layouts through the API when requesting the output:

```python
import depthai as dai

SIZE = (416, 416)  # example (width, height); set this to your model's input size

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.Camera).build()
# BGR888i is interleaved (HWC); use BGR888p for planar (CHW)
output = cam.requestOutput(size=SIZE, type=dai.ImgFrame.Type.BGR888i)
```
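
To make the difference between the two layouts concrete, here is a minimal pure-Python sketch that reorders a flat interleaved (HWC) buffer into a planar (CHW) one. The function name `interleaved_to_planar` and the tiny 1x2 sample image are assumptions made for illustration; this is not how DepthAI performs the conversion internally.

```python
from typing import List


def interleaved_to_planar(
    pixels: List[int], height: int, width: int, channels: int = 3
) -> List[int]:
    """Reorder a flat HWC (interleaved) buffer into a flat CHW (planar) buffer."""
    planar = []
    for c in range(channels):
        for y in range(height):
            for x in range(width):
                # In HWC, the channels of one pixel sit next to each other.
                planar.append(pixels[(y * width + x) * channels + c])
    return planar


# A 1x2 BGR image: two pixels, each stored as (B, G, R).
interleaved = [10, 20, 30, 11, 21, 31]
planar = interleaved_to_planar(interleaved, height=1, width=2)
print(planar)  # all B values first, then all G values, then all R values
```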

You can find further details in [OpenVINO's
documentation](https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_Additional_Optimization_Use_Cases.html#specifying-layout).

#### Color Order

Neural network models are commonly trained using images in RGB color order. The Camera node, by default, outputs frames in BGR
format. A mismatch between the color order of the input frames and the order the model was trained on can lead to inaccurate
predictions. To address this, use the `--reverse_input_channels` flag.
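
Conceptually, reversing the input channels swaps the first and last channel of every pixel. The sketch below shows this on two sample pixels in plain Python; the `reverse_channels` helper is hypothetical and only mimics the effect of the flag on the host, it is not the OpenVINO implementation.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def reverse_channels(pixels: List[Pixel]) -> List[Pixel]:
    """Reverse the channel order of each pixel (BGR <-> RGB)."""
    return [pixel[::-1] for pixel in pixels]


# A pure-blue and a pure-red pixel, written in BGR order.
bgr_frame = [(255, 0, 0), (0, 0, 255)]
rgb_frame = reverse_channels(bgr_frame)
print(rgb_frame)
```

Without the reversal, an RGB-trained model would read the blue pixel as red and the red pixel as blue.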

Alternatively, you can switch the camera output to RGB via the API, which eliminates the need for the flag:

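```python
import depthai as dai

SIZE = (416, 416)  # example (width, height); set this to your model's input size

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.Camera).build()
# Request planar RGB frames instead of the default BGR order
output = cam.requestOutput(size=SIZE, type=dai.ImgFrame.Type.RGB888p)
```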

You can find further details in [OpenVINO's
documentation](https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_Additional_Optimization_Use_Cases.html#reversing-input-channels).

### Compile Tool

#### Input Layer Precision

Using `-ip U8` adds a U8->FP16 conversion layer to all input layers of the model, which is typically the desired
configuration. However, in specific scenarios, such as when working with data other than frames, the input must be passed in FP16
precision directly. In such cases, use `-ip FP16`.
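
The numeric side of the U8->FP16 conversion can be checked on the host. The sketch below uses Python's `struct` module (format code `e`, IEEE 754 half precision) as a stand-in for the on-device conversion layer, which is an assumption made purely for illustration. It checks that every U8 value is exactly representable in FP16.

```python
import struct


def to_fp16(value: float) -> float:
    """Round-trip a value through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Check that each U8 value (0..255) survives the conversion to FP16 unchanged.
lossless = all(to_fp16(float(v)) == float(v) for v in range(256))
print(lossless)
```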

#### Shaves

Increasing the number of SHAVEs during compilation can enhance the model's speed, although the relationship between SHAVE cores
and performance is not linear. The firmware will provide a warning suggesting an optimal number of SHAVE cores, which is typically
half of the available cores.
