ON THIS PAGE

  • Inference
  • Overview
  • Installation
  • Inference Pipeline
  • Camera
  • Model and Parser(s)
  • Automatic Setup
  • Manual Setup
  • Queue(s)
  • Single-Headed
  • Multi-Headed
  • Results
  • Examples
  • Troubleshooting
  • Limit the Number of Model Shaves
  • Changing Parser Parameters
  • Different Visualization and Model Input Sizes
  • Model Location When Downloaded from HubAI
  • Further Reading
  • Concurrent Model Execution on RVC4
  • Resource allocation on HTP
  • Control knobs
  • Practical guidance

Inference

Overview

Models converted for RVC Platforms can be deployed on OAK devices to perform inference. The following section guides you through setting up a simple inference pipeline for a desired AI model. We utilize DepthAI to build the inference pipeline as a sequence of nodes of two kinds: Built-in nodes, which run on the device, and Host nodes, which run on the host machine. Nodes of both kinds can be connected interchangeably. Built-in nodes are stable, optimized, and ensure efficient performance on Luxonis devices, while Host nodes offer greater flexibility and can be customized to meet specific use cases. Please check out the DepthAI Nodes library for our in-house collection of Python host nodes.
The inference pipeline can be defined manually, node by node. However, we also offer a certain degree of automation of pipeline creation based on the relevant NN Archive (for example, automatically connecting the neural network with a specific host node responsible for decoding its outputs). Please see below for more information.

Installation

Creation of an inference pipeline requires the DepthAI (v3) library. Usage of our custom host nodes (e.g. for model output decoding) requires the DepthAI Nodes library. You can install them using pip:
Command Line
pip install --pre depthai --force-reinstall
pip install depthai-nodes

Inference Pipeline

We present here a simple inference pipeline template. It consists of four main sections that we describe in more detail below:
  • Camera;
  • Model and Parser(s);
  • Queue(s);
  • Results.
Python
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork

model = "..." # NN Archive or HubAI model identifier

# Create pipeline
with dai.Pipeline() as pipeline:

    # Camera
    camera = pipeline.create(dai.node.Camera).build()

    # Model and Parser(s)
    nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
        camera, model
    )

    # Queue(s)
    parser_output_queue = nn_with_parser.out.createOutputQueue()

    # Start pipeline
    pipeline.start()

    while pipeline.isRunning():

        # Results
        ...

Camera

The inference pipeline starts with the Camera node. It is the source of the image frames that inference is run on. The node can be added to a pipeline as follows:
Python
camera_node = pipeline.create(dai.node.Camera).build()

Model and Parser(s)

Inference consists of two steps. First, the model makes predictions on the input data. Second, a postprocessing node, also known as a parser, is utilized to process the model output(s). This step is optional; if skipped, the raw model output is returned. You can find more information about the available parsers in the DepthAI Nodes library.
A model is set up using the NeuralNetwork node. A model-parser pair can be established:
  • Automatically, using the ParsingNeuralNetwork node wrapper; or
  • Manually, initializing them as independent nodes and linking them together.
The former automatically links the model outputs with the appropriate parsers as defined in the relevant NN Archive. This abstracts away all the configuration details and is thus the preferred way of interacting with parsers. The created nodes (whether wrapped or independent) can be used the same way as standard DepthAI nodes, by linking them to other nodes or by creating output queues from them.

Automatic Setup

The ParsingNeuralNetwork node can be instantiated directly from:
(1) NN Archive;
Python
# Set up NN Archive
nn_archive = dai.NNArchive(<path/to/NNArchiveName.tar.xz>)

# Set up model (with parser(s)) and link it to camera output
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera_node, nn_archive
)
(2) HubAI, by specifying the model identifier (a unique identifier of a model on the HubAI platform; more information can be found in the Model Upload/Download section).
Python
# Set up the HubAI model identifier
model = "..."

# Set up model with parser(s)
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera_node, model
)
When initialized, the pipeline automatically detects the platform of the connected device and sets up both the model and the relevant parser(s). Moreover, it sets up the camera node and links it to the model input.

Manual Setup

The model and the parser can also be instantiated as independent nodes. First, import a DepthAI Nodes parser of interest or implement a parser of your own.
Python
from depthai_nodes.node import <ParserNode>
# OR:
class ParserNode(dai.node.ThreadedHostNode):
    def __init__(self) -> None:
        super().__init__()
        self.input = self.createInput()
        self.out = self.createOutput()

    def build(self) -> "ParserNode":
        return self

    def run(self) -> None:
        # Process messages for as long as the pipeline is running
        while self.isRunning():
            nn_out_raw = self.input.get()  # raw NeuralNetwork output
            nn_out_processed = ...  # custom post-processing
            self.out.send(nn_out_processed)
Second, initialize the model and the parser as individual nodes by calling the create() method on the pipeline:
Python
model = pipeline.create(dai.node.NeuralNetwork)
parser = pipeline.create(<ParserNode>)
The nodes are initialized using the default parameters and can be further configured according to your needs either:
  • at initialization, by passing the parameter values as arguments to the create() method: parser = pipeline.create(<ParserNode>, <ParameterName>=<ParameterValue>, ...). If configuring multiple parameters, you can arrange them into a dict and pass it as an argument to the parser's build() method: parser = pipeline.create(<ParserNode>).build(config_dict);
  • after initialization, by using the setter methods: parser.<SetterMethodName>(<ParameterValue>). You can find all the setter methods available for a specific parser on the DepthAI Nodes API Reference page. These options are illustrated in the sketch after this list.
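A minimal sketch of these options is shown below. The parser class DetectionParser and the conf_threshold parameter are placeholders chosen for illustration; the actual class names, constructor parameters, and setter methods of each parser are listed on the DepthAI Nodes API Reference page.
Python
import depthai as dai
from depthai_nodes.node import DetectionParser  # placeholder parser class, for illustration only

with dai.Pipeline() as pipeline:
    # (1) Configure at initialization: pass parameter values to create() ...
    parser = pipeline.create(DetectionParser, conf_threshold=0.5)  # parameter name is illustrative

    # ... or collect multiple parameters in a dict and pass it to build()
    parser = pipeline.create(DetectionParser).build({"conf_threshold": 0.5})

    # (2) Configure after initialization using a setter method
    parser = pipeline.create(DetectionParser)
    parser.setConfThreshold(0.5)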
Third, set the model executable (i.e. the .blob for RVC2, or the .dlc file for RVC4):
Python
model.setModelPath(<path/to/model_executable>)
Last, prepare the camera stream and link the independent nodes to constitute a pipeline:
Python
width, height = ... # model input size
camera_stream = camera.requestOutput(size=(width, height))
camera_stream.link(model.input)
model.out.link(parser.input)

Queue(s)

Queues are used to obtain data from specific nodes of the pipeline. To obtain the image frame that is fed to the model, you can use the passthrough queue:
Python
frame_queue = nn_with_parser.passthrough.createOutputQueue()
To obtain the (parsed) model output, you can use the output queue(s). The definition depends on the number of model heads:

Single-Headed

Python
parser_output_queue = nn_with_parser.out.createOutputQueue()

Multi-Headed

Python
head0_parser_output_queue = nn_with_parser.getOutput(0).createOutputQueue()
head1_parser_output_queue = nn_with_parser.getOutput(1).createOutputQueue()
...

Results

After the pipeline is started with pipeline.start(), outputs can be obtained from the defined queue(s). You can obtain the input frame and parsed model outputs as:
Python
while pipeline.isRunning():

    # Get Camera Output
    frame_queue_output = frame_queue.get()
    frame = frame_queue_output.getCvFrame()
    ...

    # Get Parsed Output(s)
    parser_output = parser_output_queue.get()
    ...
The parsed model outputs are returned as parser-specific message objects. Please read the DepthAI Nodes API reference to learn more about the relevant formats and how to utilize them for your use case.
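As an illustration, a detection-type parser output could be consumed roughly as follows. This sketch assumes the returned message exposes a detections list whose items carry label and confidence attributes; the exact fields depend on the parser, so check the API reference for the message type your parser returns.
Python
# Assumed structure of a detection-type message; field names may differ per parser
parser_output = parser_output_queue.get()
for detection in parser_output.detections:
    print(detection.label, detection.confidence)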

Examples

Please consult the OAK Examples page.

Troubleshooting

Below are some common issues and their solutions.

Limit the Number of Model Shaves

Some models are compiled for more shaves than the device supports. This often occurs on older devices like the OAK-D Lite. If the number of shaves in the compiled model exceeds what the device supports, the pipeline will fail with a RuntimeError similar to:
Command Line
NeuralNetwork: Blob compiled for ... shaves, but only ... are available in current configuration
To fix this, you can either:
  • Re-export the model with a matching number of shaves when using .blob format.
  • Limit the shaves in code to match the device’s available shaves when using the .superblob format with NNArchive.
The latter can be done as follows:
Python
nn_archive = dai.NNArchive(...)
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    ..., nn_archive
)
# Limit the number of shaves to match the device
nn_with_parser.setNNArchive(
    nn_archive, numShaves=<Number>
)

Changing Parser Parameters

To modify parser parameters, you first need to access the parser object.
  • Pipelines with a separate parser node: Simply access the parser node directly.
  • Pipelines with a ParsingNeuralNetwork node: In this case, the parser is integrated with the AI model. Retrieve it by calling the .getParser() method on the ParsingNeuralNetwork node.
Once you have the parser, update its parameters using the relevant set methods. Example:
Python
parser.setConfThreshold(0.5)
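For a pipeline built with the ParsingNeuralNetwork node, the same change could look roughly as follows; the index argument passed to getParser() is an assumption for selecting a head on multi-headed models.
Python
# Retrieve the parser integrated in the ParsingNeuralNetwork node
parser = nn_with_parser.getParser(0)  # head index assumed; may be omitted for single-headed models
parser.setConfThreshold(0.5)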

Different Visualization and Model Input Sizes

You can use separate image sizes for model input and for visualization. Use the ImageManip node to resize the image before sending it to the model, while keeping the original resolution for display. Example:
Python
cam = pipeline.create(dai.node.Camera).build()

# Request specific image size for capture
cam_out = cam.requestOutput(size=(<width1>, <height1>))

# Create and configure resize node for model input
resize_node = pipeline.create(dai.node.ImageManip)
resize_node.initialConfig.setOutputSize(<width2>, <height2>)
cam_out.link(resize_node.inputImage)

# Define model with resized input
nn_with_parser: ParsingNeuralNetwork = pipeline.create(ParsingNeuralNetwork).build(
    resize_node.out, ...
)

# Visualize using original resolution
video_queue = cam_out.createOutputQueue() # high-res stream
detection_queue = nn_with_parser.out.createOutputQueue() # detections on the low-res stream
...

Model Location When Downloaded from HubAI

When you download a model from HubAI, it's stored in the .depthai_cached_models folder at the project root. This cache contains all models from previous runs. If a model is already cached, it's loaded locally instead of being re-downloaded. To force a fresh download, you can use the useCached=False parameter when downloading the model. Example:
Python
nn_archive = dai.NNArchive(dai.getModelFromZoo(model_description, useCached=False))
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    ..., nn_archive
)
Alternatively, you can delete the .depthai_cached_models folder and re-run the pipeline.
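For example, on Linux or macOS the cache can be cleared from the project root as follows:
Command Line
rm -rf .depthai_cached_models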

Further Reading

The following sections provide additional information and insights into the inference process on the RVC4 platform and how NN models are executed on Qualcomm's Hexagon Tensor Processor (HTP).

Concurrent Model Execution on RVC4

When running multiple models concurrently on the Hexagon Tensor Processor (HTP) of a Qualcomm SoC, there are some important considerations to keep in mind regarding resource allocation and scheduling. The HTP shares compute and on-chip memory resources dynamically. According to Qualcomm, there is no direct way to prioritize one model over another on the HTP, and this is not user-tunable. You should treat the HTP as a black-box scheduler and pick the threading strategy that empirically works best.

Resource allocation on HTP

  • HTP compute cores and V-TCM memory are shared elastically across all concurrent SNPE sessions.
  • The internal scheduler uses a round-robin strategy; the split of resources between models is not fixed and can vary from frame to frame.

Control knobs

  • Resource steering is not supported. There is no per-model priority, core-affinity, or quota API. The only global lever is the --perf_profile flag, which affects power/performance trade-offs at the SoC level.

Practical guidance

  1. Assume latency and throughput will fluctuate when new/concurrent sessions start or stop.
  2. Use SNPE timing logs, such as layer-wise profiling, to measure end-to-end latency instead of guessing resource shares.
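As a complement to SNPE profiling, a rough host-side check of how result timing fluctuates can be done by timing the parsed-output queue. This is a minimal sketch based on the pipeline from the Inference Pipeline section above; it only measures the interval between results arriving on the host, not on-device latency.
Python
import time

prev = time.monotonic()
while pipeline.isRunning():
    parser_output = parser_output_queue.get()  # blocks until the next parsed result arrives
    now = time.monotonic()
    # Inter-result interval on the host; expect fluctuations as other sessions start or stop
    print(f"time since previous result: {(now - prev) * 1000:.1f} ms")
    prev = now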