# Inference

## Overview

Models converted for RVC platforms can be deployed on OAK devices to perform inference. This section guides you through
setting up a simple inference pipeline for a desired AI model. We use DepthAI to build the inference pipeline as a sequence
of:

 * [Built-in nodes](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes.md) (run directly on Luxonis devices),
   and
 * [Host nodes](https://docs.luxonis.com/software-v3/depthai/depthai-components/host_nodes.md) (run on the host).

Nodes of both kinds can be connected interchangeably. Built-in nodes are stable, optimized, and ensure efficient
performance on Luxonis devices, while host nodes offer greater flexibility and can be customized to meet specific use cases.
Please check out the [DepthAI Nodes](https://docs.luxonis.com/software-v3/ai-inference/inference/depthai-nodes.md) library for our
in-house collection of Python host nodes.

The inference pipeline can be defined manually, node by node. However, we also offer a degree of automation in pipeline
creation based on the relevant NN Archive (for example, automatically connecting the neural network to the host node
responsible for decoding its outputs). Please see below for more information.

> If the model of choice is not converted for a desired RVC platform, please refer to the
> [Conversion](https://docs.luxonis.com/software-v3/ai-inference/conversion.md)
> section.

## Installation

Creating an inference pipeline requires the DepthAI (v3) library. Using our custom host nodes (e.g. for model output decoding)
additionally requires the DepthAI Nodes library. You can install both using pip:

```bash
pip install depthai --force-reinstall
pip install depthai-nodes
```
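
To verify the installation, you can print the installed versions. This is a quick sanity check, assuming both packages expose `__version__` (which recent releases do):

```python
import depthai as dai
import depthai_nodes

print(dai.__version__)            # should report a 3.x release for DepthAI v3
print(depthai_nodes.__version__)
```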

## Inference Pipeline

We present here a simple inference pipeline template. It consists of four main sections that we describe in more detail below:

 * Camera,
 * Model and Parser(s),
 * Queue(s),
 * Results.

```python
import depthai as dai
from depthai_nodes.node import ParsingNeuralNetwork

model = "..." # NN Archive or HubAI model identifier

# Create pipeline
with dai.Pipeline() as pipeline:

    # Camera
    camera = pipeline.create(dai.node.Camera).build()

    # Model and Parser(s)
    nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
        camera, model
    )

    # Queue(s)
    parser_output_queue = nn_with_parser.out.createOutputQueue()

    # Start pipeline
    pipeline.start()

    while pipeline.isRunning():

        # Results
        ...
```

> Aside from defining the **HubAI model identifier**, the template above should work out of the box. Beware, however, that
> some OAK devices have internal FPS limitations (e.g. OAK-D Lite). You can set the FPS limit as
> `pipeline.create(ParsingNeuralNetwork).build(..., fps=<limit>)`.
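
As a concrete illustration of the note above (the value `10` is an arbitrary example limit):

```python
# Cap the inference rate on FPS-limited devices (e.g. OAK-D Lite);
# 10 FPS is an arbitrary example value
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera, model, fps=10
)
```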

### Camera

The inference pipeline starts with the [Camera](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes/camera.md)
node. It is the source of the image frames on which inference is performed. The node can be added to a pipeline as follows:

```python
camera_node = pipeline.create(dai.node.Camera).build()
```

### Model and Parser(s)

Inference consists of two steps. First, the model makes predictions on the input data. Second, a postprocessing node, also known
as a parser, processes the model output(s). This second step is optional; if it is skipped, the raw model output is returned.
You can find more information about the available parsers in the [DepthAI Nodes](https://github.com/luxonis/depthai-nodes)
library.

A model is set up using the
[NeuralNetwork](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes/neural_network.md) node. A model-parser pair
can be established:

 * Automatically, using the ParsingNeuralNetwork node, or
 * Manually, initializing them as independent nodes and linking them together.

The former automatically links the model outputs with the appropriate parsers as defined in the relevant [NN
Archive](https://docs.luxonis.com/software-v3/ai-inference/nn-archive.md). This abstracts away all the configuration details and
is thus the preferred way of interacting with parsers. The created nodes (independent or not) can be used the same way as
standard DepthAI nodes: link them to other nodes or create output queues on them.

#### Automatic Setup

The ParsingNeuralNetwork node extends the standard NeuralNetwork node by adding automatic parsing capabilities for model outputs.
It can be imported from the depthai_nodes package as:

```python
from depthai_nodes.node import ParsingNeuralNetwork
```

and instantiated directly from either:

(1) an NN Archive object, or

```python
# Set up NN Archive
nn_archive = dai.NNArchive(<path/to/NNArchiveName.tar.xz>)

# Set up model (with parser(s)) and link it to camera output
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera_node, nn_archive
)
```

(2) a HubAI model identifier (the unique identifier of a model on the HubAI platform; find more information in the
[Model Upload/Download](https://docs.luxonis.com/cloud/hubai/model-registry/upload-download.md) section):

```python
# Set up the HubAI model identifier
model = "..."

# Set up model with parser(s)
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    camera_node, model
)
```

When built, the node automatically detects the platform of the connected device and sets up both the model and the
relevant parser(s). Moreover, it requests an appropriately sized camera output and links it to the model input.

> If you plan to use a private HubAI model, make sure to configure your
> **Luxonis Hub API Key**
> . Instructions are available on the
> [API Key Good Practices](https://docs.luxonis.com/software-v3/oak-apps/apikey-good-practices.md)
> page. Once set up correctly, the API key will be applied automatically to authenticate your requests with the HubAI platform.

#### Manual Setup

The model and the parser can also be instantiated as independent nodes.

First, import a DepthAI Nodes parser of interest or implement a parser of your own.

```python
from depthai_nodes.node import <ParserNode>
# OR:
class ParserNode(dai.node.ThreadedHostNode):
    def __init__(self) -> None:
        super().__init__()
        self.input = self.createInput()
        self.out = self.createOutput()

    def build(self) -> "ParserNode":
        return self

    def run(self) -> None:
        while self.isRunning():
            nn_out_raw = self.input.get()  # raw NNData message from the model
            nn_out_processed = ...  # custom post-processing
            self.out.send(nn_out_processed)
```

Second, initialize the model and the parser as individual nodes by calling the create() method on the pipeline:

```python
model = pipeline.create(dai.node.NeuralNetwork)
parser = pipeline.create(<ParserNode>)
```

The nodes are initialized using the default parameters and can be further configured according to your needs either (both
approaches are sketched below):

 * at initialization, by passing the parameter values as arguments to the `create()` method:
   `parser = pipeline.create(<ParserNode>, <ParameterName>=<ParameterValue>, ...)`. If configuring multiple parameters, you can
   arrange them into a dict and pass it as an argument to the parser's `build()` method:
   `parser = pipeline.create(<ParserNode>).build(config_dict)`;
 * after initialization, by using the setter methods: `parser.<SetterMethodName>(<ParameterValue>)`.
   You can find all the setter methods available for a specific parser on the [DepthAI Nodes API
   Reference](https://docs.luxonis.com/software-v3/ai-inference/inference/depthai-nodes.md) page.
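
A minimal sketch of both styles. `DetectionParser`, its `conf_threshold` keyword, and the `setConfidenceThreshold()` setter are plausible examples rather than verified names, so consult the API reference for the parser you actually use:

```python
from depthai_nodes.node import DetectionParser  # example parser; pick one matching your model

# Style 1: configure at initialization (keyword name is illustrative)
parser = pipeline.create(DetectionParser, conf_threshold=0.5)

# Style 2: configure after initialization via a setter method (name is illustrative)
parser = pipeline.create(DetectionParser)
parser.setConfidenceThreshold(0.5)
```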

Third, set the model executable (i.e. the .blob for RVC2, or the .dlc file for RVC4):

```python
model.setModelPath(<path/to/model_executable>)
```

Last, prepare the camera stream and link the independent nodes to constitute a pipeline:

```python
width, height = ... # model input size
camera_stream = camera.requestOutput(size=(width, height))
camera_stream.link(model.input)
model.out.link(parser.input)
```

> If interested in building more advanced parsers—similar to our native ones that automatically process
> **NN Archives**
> for setup—check out the
> [parsers](https://github.com/luxonis/depthai-nodes/tree/main/depthai_nodes/node/parsers)
> section of the
> [DepthAI Nodes](https://github.com/luxonis/depthai-nodes)
> library. There, you can explore how we've implemented them in practice.

### Queue(s)

Queues are used to obtain data from specific nodes of the pipeline. To obtain the image frame that gets input to the model, you
can use the passthrough queue:

```python
frame_queue = nn_with_parser.passthrough.createOutputQueue()
```

To obtain the (parsed) model output, you can use the output queue(s). The definition depends on the number of model heads:

#### Single-Headed

```python
parser_output_queue = nn_with_parser.out.createOutputQueue()
```

#### Multi-Headed

```python
head0_parser_output_queue = nn_with_parser.getOutput(0).createOutputQueue()
head1_parser_output_queue = nn_with_parser.getOutput(1).createOutputQueue()
...
```

### Results

After the pipeline is started with pipeline.start(), outputs can be obtained from the defined queue(s). You can obtain the input
frame and parsed model outputs as:

```python
while pipeline.isRunning():

    # Get Camera Output
    frame_queue_output = frame_queue.get()
    frame = frame_queue_output.getCvFrame()
    ...

    # Get Parsed Output(s)
    parser_output = parser_output_queue.get()
    ...
```

The parsed model outputs are returned as:

 * generic [DepthAI messages](https://docs.luxonis.com/software-v3/depthai/depthai-components/messages.md), or
 * custom-written [DepthAI Nodes messages](https://github.com/luxonis/depthai-nodes/blob/main/depthai_nodes/message/README.md)

Please read the [DepthAI Nodes API reference](https://docs.luxonis.com/software-v3/ai-inference/inference/depthai-nodes.md) to
learn more about the relevant formats and how to utilize them for your use case.
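
For instance, a detection-type message can be consumed along these lines (a sketch assuming a detection model; the attributes shown follow the common `ImgDetections`-style interface):

```python
while pipeline.isRunning():
    detections_message = parser_output_queue.get()
    for detection in detections_message.detections:
        # each detection carries a label index and a confidence score
        print(detection.label, detection.confidence)
```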

## Examples

Please consult the [OAK Examples](https://docs.luxonis.com/software-v3/ai-inference/inference/oak-examples.md) page.

## Troubleshooting

Below are some common issues and their solutions.

### Setting Model SHAVEs

SHAVEs are the compute cores in the RVC2 VPU that run neural networks. Sometimes, a model is built to use a different number of
SHAVEs than what the target device supports.

If the model is compiled to use more SHAVEs than the device actually has, the pipeline will fail with a RuntimeError similar to:

```bash
NeuralNetwork: Blob compiled for ... shaves, but only ... are available in current configuration
```

This often occurs on older devices like the OAK-D Lite.

Conversely, if the model is compiled to use fewer SHAVEs than are available on the device, it will still run, but you may see a
warning like:

```bash
[14442C103180EECF00] [2.1] [4.736] [NeuralNetwork(2)] [warning] Network compiled for 8 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
```

In this case, the model is underutilizing the available compute resources, and recompiling it to better match the device's SHAVE
count can improve performance.

To fix the number of utilized SHAVEs, you can either:

 * if the model was exported with the legacy
   [Blobconverter](https://docs.luxonis.com/software-v3/ai-inference/conversion/rvc-conversion/online/blobconverter.md),
   recompile the model with a matching number of SHAVEs, or
 * if the model was exported within [HubAI](https://docs.luxonis.com/cloud/hubai/model-registry/detailed-conversion.md),
   recompilation is not needed; you can set the number of SHAVEs at pipeline initialization to match the device:

```python
nn_archive = dai.NNArchive(...)
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    ..., nn_archive
)
# Set the number of SHAVEs
nn_with_parser.setNNArchive(
    nn_archive, numShaves=<Number>
)
```

> The old
> `SHAVE`
> configuration method is no longer supported in
> **DepthAI v3**
> .
> Avoid using:
> ```python
> nn = pipeline.create(dai.node.NeuralNetwork)
> nn.setNumShaves(6)
> ```

### Changing Parser Parameters

To modify parser parameters, you first need to access the parser object.

 * Pipelines with a separate parser node: Simply access the parser node directly.
 * Pipelines with a ParsingNeuralNetwork node: In this case, the parser is integrated with the AI model. Retrieve it by calling
   the .getParser() method on the ParsingNeuralNetwork node.

Once you have the parser, update its parameters using the relevant setter methods. Example:

```python
parser = nn_with_parser.getParser()  # only needed for ParsingNeuralNetwork pipelines
parser.setConfThreshold(0.5)
```

### Different Visualization and Model Input Sizes

You can use separate image sizes for model input and for visualization. Use the ImageManip node to resize the image before sending
it to the model, while keeping the original resolution for display.

Example:

```python
cam = pipeline.create(dai.node.Camera).build()

# Request specific image size for capture
cam_out = cam.requestOutput(size=(<width1>, <height1>))

# Create and configure resize node for model input
resize_node = pipeline.create(dai.node.ImageManip)
resize_node.initialConfig.setOutputSize(<width2>, <height2>)
cam_out.link(resize_node.inputImage)

# Define model with resized input
nn_with_parser: ParsingNeuralNetwork = pipeline.create(ParsingNeuralNetwork).build(
    resize_node.out, ...
)

# Visualize Using Original Resolution
video_queue = cam_out.createOutputQueue() # high-res stream
detection_queue = nn_with_parser.out.createOutputQueue() # detections on the low-res stream
...
```

### Model Location When Downloaded from HubAI

When you download a model from HubAI, it’s stored in the .depthai_cached_models folder at the project root. This cache contains
all models from previous runs. If a model is already cached, it’s loaded locally instead of being re-downloaded. To force a fresh
download, you can use the useCached=False parameter when downloading the model. Example:

```python
nn_archive = dai.NNArchive(dai.getModelFromZoo(model_description, useCached=False))
nn_with_parser = pipeline.create(ParsingNeuralNetwork).build(
    ..., nn_archive
)
```

Alternatively, you can delete the .depthai_cached_models folder and re-run the pipeline.

## Further Reading

The following section provides additional information about the inference process on the RVC4 platform and how NN models
are executed on Qualcomm's Hexagon Tensor Processor (HTP).

### Concurrent Model Execution on RVC4

When running multiple models concurrently on the Hexagon Tensor Processor (HTP) of a Qualcomm SoC, there are some important
considerations to keep in mind regarding resource allocation and scheduling. The HTP shares compute and on-chip memory resources
dynamically. According to Qualcomm, there is no direct way to prioritize one model over another on the HTP, and this is not
user-tunable. You should treat the HTP as a black-box scheduler and pick the threading strategy that empirically works best.

#### Resource allocation on HTP

 * HTP compute cores and VTCM (vector tightly coupled memory) are shared elastically across all concurrent SNPE sessions.
 * The internal scheduler uses a round-robin strategy; the split of resources between multiple models is not fixed and can
   vary frame to frame.

#### Control knobs

 * Resource steering is not supported. There is no per-model priority, core-affinity, or quota API. The only global lever is the
   `--perf_profile` flag, which affects power/performance trade-offs at the SoC level.

#### Practical guidance

 1. Assume latency and throughput will fluctuate when concurrent sessions start or stop.
 2. Use SNPE timing logs, such as layer-wise profiling, to measure end-to-end latency instead of guessing resource shares.
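
As a host-side complement to SNPE profiling, you can estimate end-to-end latency from message timestamps. This is a sketch that assumes the device-clock pattern used in DepthAI's latency-measurement examples (`dai.Clock.now()` and per-message `getTimestamp()`):

```python
import depthai as dai

# Elapsed time between frame capture (device clock) and message arrival on the host
msg = parser_output_queue.get()
latency_ms = (dai.Clock.now() - msg.getTimestamp()).total_seconds() * 1000
print(f"End-to-end latency: {latency_ms:.1f} ms")
```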
