Manual Conversion with OpenVINO (RVC2 & RVC3)
Overview
RVC2 and RVC3 conversion is based on the OpenVINO toolkit. The ModelConverter Docker images come with all the necessary tools pre-installed. Install the ModelConverter CLI with:
Command Line
pip install modelconv
Then open an interactive conversion shell for your target platform:
Command Line
modelconverter shell <platform>
platform stands for the target platform you aim to convert for, so either rvc2 or rvc3. This is equivalent to starting a new Docker container from the luxonis/modelconverter-<platform>:latest image and running it as an interactive terminal session (-it) with the --rm flag to ensure the container is automatically removed once the session is exited:
Command Line
docker run --rm -it \
    -v $(pwd)/shared_with_container:/app/shared_with_container/ \
    luxonis/modelconverter-<platform>:latest
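Note that the shared_with_container folder in your working directory is mounted into the container, so any files you need inside it (such as the model you want to convert) should be placed there first. A minimal sketch, where model.onnx is a placeholder for your own model file:
Command Line
mkdir -p shared_with_container
cp model.onnx shared_with_container/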
Alternatively, if you prefer a local setup instead of Docker, you can install the OpenVINO development tools (which provide the Model Optimizer used in the steps below):
Command Line
pip install openvino-dev==2022.3
Simplify model (Optional)
In order to obtain a model with optimal performance, we recommend running model simplification prior to the conversion steps. For an .onnx model, you can run the simplification as:
Command Line
pip install onnxsim
onnxsim <path to .onnx model> <path to simplified .onnx model>
Compile OpenVINO IR
The model is first converted from its original format to the OpenVINO Intermediate Representation (IR) format. It consists of two files: one encoding the network topology (.xml file) and one storing the model's weights and biases (.bin file). The OpenVINO Model Optimizer (v2022.3.0) is used for this conversion, and the following source model formats are supported:
- ONNX
- TensorFlow
- PyTorch (via export to ONNX)
- PaddlePaddle
- MXNet
- Kaldi
- Caffe
To convert the source model to IR, run:
Command Line
mo --input_model <path to the (un-)simplified source model> --compress_to_fp16
We recommend including the --compress_to_fp16 parameter to obtain optimal performance on our devices. You can find additional details here. Consult mo --help for the full list of conversion options. If you plan to use your model in the Luxonis ecosystem, we recommend setting the flags so that the model expects un-normalized BGR input. Therefore, be sure to set the --reverse_input_channels, --mean_values, and --scale_values flags if the model expects RGB input or normalization.
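For illustration, assuming a model.onnx that was trained on RGB images normalized to the [0,1] range, the full conversion command could look like this (the file name and values are placeholders for your own model):
Command Line
mo --input_model model.onnx \
   --reverse_input_channels \
   --mean_values [0,0,0] \
   --scale_values [255,255,255] \
   --compress_to_fp16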
Quantize (RVC3 only)
If converting for RVC3, you must perform model quantization using the OpenVINO Post-Training Optimization Toolkit (POT) prior to compiling to BLOB. See the following example for guidance.
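As a rough sketch (not the full example referenced above), POT's simplified mode can quantize an IR model using a folder of representative images; the paths below are placeholders and the exact flags may differ between OpenVINO versions:
Command Line
pot -q default -m model.xml -w model.bin \
    --engine simplified --data-source <path to calibration images> \
    --output-dir quantized/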
Compile BLOB
Once the model has been transformed into OpenVINO's IR format, the next step is to compile it for inference on a MYRIAD device and convert it to the BLOB format. The OpenVINO Compile Tool (v2022.3.0) is used for this job. To convert the model from IR to BLOB, run:
Command Line
compile_tool -d MYRIAD -m <path to .xml model (make sure the .bin file is in the same directory)>
If you installed OpenVINO locally, the Compile Tool is located in the .../tools/compile_tool directory of your OpenVINO installation. You can run it as:
Command Line
cd .../tools/compile_tool
./compile_tool -d MYRIAD -m ...
Consult compile_tool -h for the full list of conversion options.
Advanced
Model Optimizer
Mean and Scale Values
The normalization of input images for the model is achieved through the --mean_values and --scale_values flags. By default, frames from the Camera node are in the U8 data type, ranging from [0,255]. However, models are typically trained with normalized frames within the range of [-1,1] or [0,1]. To ensure accurate inference results, frames need to be normalized beforehand. Although creating a custom model that normalizes frames before inference is an option (example here), it is more efficient to include this normalization directly within the model itself using these flags during the Model Optimizer step. Here are some common normalization options (assuming that the initial input is in the range of [0,255]):
- For required input with values between 0 and 1, use mean=0 and scale=255, computed as ([0,255] - 0) / 255 = [0,1].
- For required input with values between -1 and 1, use mean=127.5 and scale=127.5, computed as ([0,255] - 127.5) / 127.5 = [-1,1].
- For required input with values between -0.5 and 0.5, use mean=127.5 and scale=255, computed as ([0,255] - 127.5) / 255 = [-0.5,0.5].
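Per-channel values are also accepted. As a sketch, a model trained with the common ImageNet normalization (mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225] applied after scaling to [0,1]) would use those values multiplied by 255:
Command Line
--mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375]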
Model Layout
The model layout can be defined using the --layout parameter. For example:
Command Line
--layout NCHW
where:
- N - batch size
- C - channels
- H - height
- W - width
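For instance, to state explicitly that the converted model expects planar (channel-first) input, the flag can be combined with the conversion command from above (model.onnx is a placeholder):
Command Line
mo --input_model model.onnx --layout NCHW --compress_to_fp16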
If the layout of the input frames does not match the layout expected by the model, you will get a warning similar to:
[NeuralNetwork(0)] [warning] Input image (416x416) does not match NN (3x416)
You have the option to switch between the Interleaved / HWC and Planar / CHW layout through the API when requesting the output:
Python
import depthai as dai
pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.Camera).build()
output = cam.requestOutput(
    size=SIZE, type=dai.ImgFrame.Type.BGR888i  # or BGR888p (i stands for interleaved, and p stands for planar)
)
Color Order
Neural network models are commonly trained using images in RGB color order. The Camera node, by default, outputs frames in the BGR format. Mismatching the color order between the input frames and the trained model can lead to inaccurate predictions. To address this, the --reverse_input_channels flag is used during conversion. Moreover, there is an option to switch the camera output to RGB via the API, eliminating the need for the flag:
Python
import depthai as dai
pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.Camera).build()
output = cam.requestOutput(
    size=SIZE, type=dai.ImgFrame.Type.RGB888p
)
Compile Tool
Input Layer Precision
Using -ip U8 will incorporate a U8->FP16 conversion layer on all input layers of the model, which is typically the desired configuration. However, in specific scenarios, such as when working with data other than frames, using FP16 precision directly is necessary. In such cases, you can opt for -ip FP16.
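For example, compiling a model whose input is raw FP16 data rather than camera frames could look like this (a sketch; model.xml is a placeholder):
Command Line
compile_tool -d MYRIAD -m model.xml -ip FP16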