Inference
Overview
Models converted for RVC Platforms can be deployed on OAK devices to perform inference. The following section guides you through setting up a simple inference pipeline for a desired AI model. We utilize DepthAI to build the inference pipeline as a sequence of:
- Built-in nodes (run directly on Luxonis devices), and
- Host nodes (run on the host).
If the model of choice is not converted for a desired RVC platform, please refer to the Conversion section.
Installation
Creating an inference pipeline requires the DepthAI (v3) library. Using our custom host nodes (e.g. for model output decoding) additionally requires the DepthAI Nodes library. You can install them using pip:
Command Line
pip install --pre depthai --force-reinstall
pip install depthai-nodes
Inference Pipeline
We present here a simple inference pipeline template. It consists of four main sections that we describe in more detail below:
- Camera;
- Model and Parser(s);
- Queue(s);
- Results.
Python
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork

model = "..." # NN Archive or HubAI model identifier

# Create pipeline
with dai.Pipeline() as pipeline:

    # Camera
    camera = pipeline.create(dai.node.Camera).build()

    # Model and Parser(s)
    nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
        camera, model
    )

    # Queue(s)
    parser_output_queue = nn_with_parser.out.createOutputQueue()

    # Start pipeline
    pipeline.start()

    while pipeline.isRunning():

        # Results
        ...
Aside from defining the HubAI model identifier, the template above should work out of the box. Beware, however, that some OAK devices have internal FPS limitations (e.g. OAK-D Lite). You can set the FPS limit as pipeline.create(ParsingNeuralNetwork).build(..., fps=<limit>).
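For example, a minimal sketch limiting the template above to 10 FPS (the specific value is only illustrative):
Python
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera, model, fps=10  # illustrative limit; adjust to your device
)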
Camera
The inference pipeline starts with the Camera node. It is the source of the image frames that get inferenced on. The node can be added to a pipeline as follows:
Python
camera_node = pipeline.create(dai.node.Camera).build()
Model and Parser(s)
Inference consists of two steps. First, the model makes predictions on the input data. Second, a postprocessing node, also known as a parser, is utilized to process the model output(s). This step is optional and the raw model output is returned if skipped. You can find more information about the available parsers in the DepthAI Nodes library.
A model is set up using the NeuralNetwork node. A model-parser pair can be established:
- Automatically, using the ParsingNeuralNetwork node wrapper; or
- Manually, initializing them as independent nodes and linking them together.
Automatic Setup
The ParsingNeuralNetwork node can be instantiated directly from:
(1) NN Archive;
Python
# Set up NN Archive
nn_archive = dai.NNArchive(<path/to/NNArchiveName.tar.xz>)

# Set up model (with parser(s)) and link it to camera output
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera_node, nn_archive
)
(2) HubAI model identifier.
Python
# Set up the HubAI model identifier
model = "..."

# Set up model with parser(s)
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera_node, model
)
The automatic setup is available only for native DepthAI Nodes parsers. If using a custom parser, please refer to the Manual Setup instructions below.
Manual Setup
The model and the parser can also be instantiated as independent nodes.
First, import a DepthAI Nodes parser of interest or implement a parser of your own.
Python
from depthai_nodes.node import <ParserNode>
# OR:
class ParserNode(dai.node.ThreadedHostNode):
    def __init__(self) -> None:
        super().__init__()
        self.input = self.createInput()
        self.out = self.createOutput()

    def build(self) -> "ParserNode":
        return self

    def run(self) -> None:
        # process messages for as long as the pipeline is running
        while self.isRunning():
            nn_out_raw = self.input.get()
            nn_out_processed = ...  # custom post-processing
            self.out.send(nn_out_processed)
Then, instantiate the model and the parser using the create() method on the pipeline:
Python
model = pipeline.create(dai.node.NeuralNetwork)
parser = pipeline.create(<ParserNode>)
The parser configuration can be set in two ways:
- At initialization, by passing the parameter values as arguments to the create() method: parser = pipeline.create(<ParserNode>, <ParameterName>=<ParameterValue>, ...). If configuring multiple parameters, you can arrange them into a dict and pass it as an argument to the parser's build() method: parser = pipeline.create(<ParserNode>).build(config_dict);
- After initialization, by using the setter methods: parser.<SetterMethodName>(<ParameterValue>). You can find all the setter methods available for a specific parser on the DepthAI Nodes API Reference page.
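As a consolidated sketch (the parameter and setter names below are hypothetical placeholders; look up the actual names for your parser in the DepthAI Nodes API Reference):
Python
# Hypothetical parameter/setter names, for illustration only
parser = pipeline.create(<ParserNode>, conf_threshold=0.5)
# OR pass several parameters at once to build()
parser = pipeline.create(<ParserNode>).build({"conf_threshold": 0.5})
# OR adjust the configuration after initialization via the corresponding setter
parser.setConfThreshold(0.5)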
Next, set the path to the model executable (the .blob file for RVC2, or the .dlc file for RVC4):
Python
model.setModelPath(<path/to/model_executable>)
Finally, request a camera output of the model's input size and link the camera, model, and parser nodes together:
Python
width, height = ... # model input size
camera_stream = camera_node.requestOutput(size=(width, height))
camera_stream.link(model.input)
model.out.link(parser.input)
If interested in building more advanced parsers—similar to our native ones that automatically process NN Archives for setup—check out the parsers section of the DepthAI Nodes library. There, you can explore how we've implemented them in practice.
Queue(s)
Queues are used to obtain data from specific nodes of the pipeline. To obtain the image frame that gets input to the model, you can use the passthrough queue:
Python
frame_queue = nn_with_parser.passthrough.createOutputQueue()
To obtain the parsed model output(s), use the parser output queue(s):
Single-Headed
Python
parser_output_queue = nn_with_parser.out.createOutputQueue()
Multi-Headed
Python
head0_parser_output_queue = nn_with_parser.getOutput(0).createOutputQueue()
head1_parser_output_queue = nn_with_parser.getOutput(1).createOutputQueue()
...
Results
After the pipeline is started with pipeline.start(), outputs can be obtained from the defined queue(s). You can obtain the input frame and the parsed model outputs as:
Python
while pipeline.isRunning():

    # Get Camera Output
    frame_queue_output = frame_queue.get()
    frame = frame_queue_output.getCvFrame()
    ...

    # Get Parsed Output(s)
    parser_output = parser_output_queue.get()
    ...
The parsed model outputs are returned as either:
- generic DepthAI messages, or
- custom-written DepthAI Nodes messages.
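For example, a minimal sketch of reading an object-detection output, assuming the parser returns a generic dai.ImgDetections message (the field names follow the DepthAI API):
Python
parser_output = parser_output_queue.get()  # assumed to be a dai.ImgDetections message
for detection in parser_output.detections:
    # each detection carries a label, a confidence score, and a normalized bounding box
    print(detection.label, detection.confidence,
          detection.xmin, detection.ymin, detection.xmax, detection.ymax)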
Examples
Please consult the OAK Examples page.
Troubleshooting
Below are some common issues and their solutions.
Limit the Number of Model Shaves
Some models are compiled for more shaves than the device supports. This often occurs on older devices like the OAK-D Lite. If the number of shaves in the compiled model exceeds what the device supports, the pipeline will fail with a RuntimeError similar to:
Command Line
NeuralNetwork: Blob compiled for ... shaves, but only ... are available in current configuration
There are two ways to fix this:
- Re-export the model with a matching number of shaves when using the .blob format.
- Limit the shaves in code to match the device's available shaves when using the .superblob format with NNArchive, as shown below.
Python
nn_archive = dai.NNArchive(...)
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    ..., nn_archive
)
# Limit the number of shaves to match the device
nn_with_parser.setNNArchive(
    nn_archive, numShaves=<Number>
)
Changing Parser Parameters
To modify parser parameters, you first need to access the parser object.
- Pipelines with a separate parser node: simply access the parser node directly.
- Pipelines with a ParsingNeuralNetwork node: in this case, the parser is integrated with the AI model. Retrieve it by calling the .getParser() method on the ParsingNeuralNetwork node.
Once you have access to the parser object, use its setter methods, for example:
Python
parser.setConfThreshold(0.5)
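In the ParsingNeuralNetwork case, a minimal sketch of retrieving the parser first (via the getParser() method mentioned above) and then applying the same setter:
Python
parser = nn_with_parser.getParser()  # retrieve the integrated parser
parser.setConfThreshold(0.5)         # e.g. adjust the confidence threshold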
Different Visualization and Model Input Sizes
You can use separate image sizes for model input and for visualization. Use the ImageManip node to resize the image before sending it to the model, while keeping the original resolution for display.
Example:
Python
cam = pipeline.create(dai.node.Camera).build()

# Request specific image size for capture
cam_out = cam.requestOutput(size=(<width1>, <height1>))

# Create and configure resize node for model input
resize_node = pipeline.create(dai.node.ImageManip)
resize_node.initialConfig.setOutputSize(<width2>, <height2>)
cam_out.link(resize_node.inputImage)

# Define model with resized input
nn_with_parser: ParsingNeuralNetwork = pipeline.create(ParsingNeuralNetwork).build(
    resize_node.out, ...
)

# Visualize Using Original Resolution
video_queue = cam_out.createOutputQueue() # high-res stream
detection_queue = nn_with_parser.out.createOutputQueue() # detections on the low-res stream
...
Model Location When Downloaded from HubAI
When you download a model from HubAI, it’s stored in the .depthai_cached_models folder at the project root. This cache contains all models from previous runs. If a model is already cached, it’s loaded locally instead of being re-downloaded. To force a fresh download, you can use the useCached=False parameter when downloading the model. Example:
Python
nn_archive = dai.NNArchive(dai.getModelFromZoo(model_description, useCached=False))
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    ..., nn_archive
)
Alternatively, you can delete the .depthai_cached_models folder and re-run the pipeline.
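For example, on Linux/macOS the cache can be removed from the project root with:
Command Line
rm -rf .depthai_cached_models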
Further Reading
The following sections provide additional information and insights into the inference process on the RVC4 platform and how NN models are executed on Qualcomm's Hexagon Tensor Processor (HTP).
Concurrent Model Execution on RVC4
When running multiple models concurrently on the Hexagon Tensor Processor (HTP) of a Qualcomm SoC, there are some important considerations to keep in mind regarding resource allocation and scheduling. The HTP shares compute and on-chip memory resources dynamically. According to Qualcomm, there is no direct way to prioritise one model over another on the HTP, and this is not user-tunable. You should treat the HTP as a black-box scheduler and pick the threading strategy that empirically works best.
Resource allocation on HTP
- HTP compute cores and V-TCM memory are shared elastically across all concurrent SNPE sessions.
- The internal scheduler uses a round-robin strategy; the split of resources between models is not fixed and can vary from frame to frame.
Control knobs
- Resource steering is not supported. There is no per-model priority, core-affinity, or quota API. The only global lever is the --perf_profile flag, which affects power/performance trade-offs at the SoC level.
Practical guidance
- Assume latency and throughput will fluctuate when new/concurrent sessions start or stop.
- Use SNPE timing logs, such as layer-wise profiling, to measure end-to-end latency instead of guessing resource shares.