NNComponent

NNComponent abstracts sourcing and decoding AI models, creates the DepthAI API nodes for neural inferencing and object tracking, and sets up MultiStage pipelines. It also supports Roboflow integration.

DepthAI API nodes

For neural inference, NNComponent will create the appropriate DepthAI API node for the model (a generic NeuralNetwork node, or a YOLO/MobileNet (Spatial) Detection Network node).

If the tracker argument is set and the model is YOLO/MobileNet-SSD based, this component will also create an ObjectTracker node and connect the two nodes together.

Usage

from depthai_sdk import OakCamera, ResizeMode

with OakCamera(recording='cars-tracking-above-01') as oak:
    color = oak.create_camera('color')
    # Create a vehicle detection NN and enable object tracking
    nn = oak.create_nn('vehicle-detection-0202', color, tracker=True)
    nn.config_nn(resize_mode=ResizeMode.STRETCH)

    # Visualize tracker results and the passthrough frames used for inference
    oak.visualize([nn.out.tracker, nn.out.passthrough], fps=True)
    oak.start(blocking=True)

Component outputs

  • main - Default output. Streams NN results and high-res frames that were downscaled and used for inferencing. Produces DetectionPacket or TwoStagePacket (if it’s a 2nd-stage NNComponent).

  • passthrough - Streams NN results and passthrough frames (frames used for inferencing). Produces DetectionPacket or TwoStagePacket (if it’s a 2nd-stage NNComponent).

  • spatials - Streams depth and bounding box mappings (SpatialDetectionNetwork.boundingBoxMapping). Produces SpatialBbMappingPacket.

  • twostage_crops - Streams 2nd-stage cropped frames to the host. Produces FramePacket.

  • tracker - Streams ObjectTracker’s tracklets and high-res frames that were downscaled and used for inferencing. Produces TrackerPacket.

  • nn_data - Streams NN raw output. Produces NNDataPacket.
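
Instead of visualizing an output, you can also consume it with a callback. Below is a minimal hedged sketch; the on_detections function is a user-defined example, and the packet.detections / packet.frame attributes are assumed here:

from depthai_sdk import OakCamera
from depthai_sdk.classes import DetectionPacket

def on_detections(packet: DetectionPacket):
    # packet.frame is the high-res frame, packet.img_detections holds the raw NN results
    print(f'Got {len(packet.detections)} detections')

with OakCamera() as oak:
    color = oak.create_camera('color')
    nn = oak.create_nn('vehicle-detection-0202', color)

    oak.callback(nn.out.main, callback=on_detections)
    oak.start(blocking=True)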

Decoding outputs

NNComponent allows users to define their own decoding functions. The decoded results should be returned as one of the standardized output classes (Detections, SemanticSegmentation, ImgLandmarks, InstanceSegmentation; see the Reference below):

Note

This feature is still in development and is not guaranteed to work correctly in all cases.

Example usage:

import numpy as np
from depthai import NNData

from depthai_sdk import OakCamera
from depthai_sdk.classes import Detections, DetectionPacket
from depthai_sdk.visualize import Visualizer

def decode(nn_data: NNData):
    # Each result row: [image_id, label, confidence, xmin, ymin, xmax, ymax]
    layer = nn_data.getFirstLayerFp16()
    results = np.array(layer).reshape((1, 1, -1, 7))
    dets = Detections(nn_data)

    for result in results[0][0]:
        if result[2] > 0.5:  # confidence threshold
            dets.add(result[1], result[2], result[3:])

    return dets


def callback(packet: DetectionPacket, visualizer: Visualizer):
    detections: Detections = packet.img_detections
    ...


with OakCamera() as oak:
    color = oak.create_camera('color')

    nn = oak.create_nn(..., color, decode_fn=decode)

    oak.visualize(nn, callback=callback)
    oak.start(blocking=True)

Reference

class depthai_sdk.components.NNComponent(device, pipeline, model, input, nn_type=None, decode_fn=None, tracker=False, spatial=None, replay=None, args=None)
get_name()
get_labels()
config_multistage_nn(debug=False, labels=None, scale_bb=None, num_frame_pool=None)

Configures the MultiStage NN pipeline. Available only if the input to this NNComponent is a detection NNComponent. A usage sketch follows the parameter list below.

Parameters
  • debug (bool, default False) – Debug script node

  • labels (List[int], optional) – Crop & run inference only on objects with these labels

  • scale_bb (Tuple[int, int], optional) – Scale detection bounding boxes (x, y) before cropping the frame. In %.

  • num_frame_pool (int, optional) – Number of frames to pool for inference. If None, will use the default value.

Return type

None
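
For illustration, a hedged sketch of a two-stage pipeline where the second-stage NNComponent is configured via config_multistage_nn(); the model names and the scale_bb value are illustrative choices, not requirements:

from depthai_sdk import OakCamera

with OakCamera() as oak:
    color = oak.create_camera('color')
    # 1st stage: face detection
    det_nn = oak.create_nn('face-detection-retail-0004', color)
    # 2nd stage: runs on crops produced from 1st-stage detections
    emotion_nn = oak.create_nn('emotions-recognition-retail-0003', input=det_nn)
    # Enlarge detection bounding boxes by 10% in x/y before cropping
    emotion_nn.config_multistage_nn(scale_bb=(10, 10))

    oak.visualize(emotion_nn)
    oak.start(blocking=True)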

config_tracker(tracker_type=None, track_labels=None, assignment_policy=None, max_obj=None, threshold=None, apply_tracking_filter=None, forget_after_n_frames=None, calculate_speed=None)

Configure Object Tracker node (if it’s enabled).

Parameters
  • tracker_type (dai.TrackerType, optional) – Set object tracker type

  • track_labels (List[int], optional) – Set detection labels to track

  • assignment_policy (dai.TrackerIdAssignmentPolicy, optional) – Set object tracker ID assignment policy

  • max_obj (int, optional) – Set max objects to track. Max 60.

  • threshold (float, optional) – Specify tracker threshold. Default: 0.0

  • apply_tracking_filter (bool, optional) – Set whether to apply Kalman filter to the tracked objects. Done on the host.

  • forget_after_n_frames (int, optional) – Set after how many frames without a detection a tracked object should be forgotten.

  • calculate_speed (bool, optional) – Set whether to calculate object speed. Done on the host.

Return type

None
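
A hedged example of configuring the tracker on the NNComponent from the Usage section; the parameter values below are illustrative:

import depthai as dai

nn.config_tracker(
    tracker_type=dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM,
    track_labels=[0],                  # only track objects with label 0
    assignment_policy=dai.TrackerIdAssignmentPolicy.SMALLEST_ID,
    max_obj=10,
    threshold=0.5,
    apply_tracking_filter=True,        # Kalman filter, computed on the host
    forget_after_n_frames=30,
    calculate_speed=True,              # speed estimation on the host
)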

config_yolo_from_metadata(metadata)

Configures (Spatial) Yolo Detection Network node with a dictionary. Calls config_yolo().

Parameters

metadata (Dict) –

Return type

None

config_yolo(num_classes, coordinate_size, anchors, masks, iou_threshold, conf_threshold=None)

Configures (Spatial) Yolo Detection Network node.

Parameters
  • num_classes (int) –

  • coordinate_size (int) –

  • anchors (List[float]) –

  • masks (Dict[str, List[int]]) –

  • iou_threshold (float) –

  • conf_threshold (Optional[float]) –

Return type

None
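
For illustration, both ways of configuring the YOLO decoder are sketched below. The dictionary keys are assumed to mirror the NN_specific_metadata block of a typical YOLO model JSON, and the values shown are for a hypothetical 80-class tiny-YOLO-style model:

metadata = {
    'classes': 80,
    'coordinates': 4,
    'anchors': [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319],
    'anchor_masks': {'side26': [1, 2, 3], 'side13': [3, 4, 5]},
    'iou_threshold': 0.5,
    'confidence_threshold': 0.5,
}
nn.config_yolo_from_metadata(metadata)

# Equivalent explicit call:
nn.config_yolo(
    num_classes=80,
    coordinate_size=4,
    anchors=[10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319],
    masks={'side26': [1, 2, 3], 'side13': [3, 4, 5]},
    iou_threshold=0.5,
    conf_threshold=0.5,
)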

config_nn(conf_threshold=None, resize_mode=None)

Configures the Detection Network node.

Parameters
  • conf_threshold (float, optional) – Confidence threshold for the detections (0..1]

  • resize_mode (ResizeMode or str, optional) – Change aspect ratio resizing mode to either STRETCH, CROP, or LETTERBOX.

Return type

None
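
A short sketch, assuming nn is the NNComponent from the Usage section; the threshold value is illustrative:

from depthai_sdk import ResizeMode

nn.config_nn(
    conf_threshold=0.4,                 # drop detections below 40% confidence
    resize_mode=ResizeMode.LETTERBOX,   # pad instead of stretching or cropping
)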

config_spatial(bb_scale_factor=None, lower_threshold=None, upper_threshold=None, calc_algo=None)

Configures the Spatial Detection Network node.

Parameters
  • bb_scale_factor (float, optional) – Specifies scale factor for detected bounding boxes (0..1]

  • lower_threshold (int, optional) – Specifies lower threshold in depth units (millimeter by default) for depth values which will be used to calculate spatial data

  • upper_threshold (int, optional) – Specifies upper threshold in depth units (millimeter by default) for depth values which will be used to calculate spatial data

  • calc_algo (dai.SpatialLocationCalculatorAlgorithm, optional) – Specifies spatial location calculator algorithm: Average/Min/Max

Return type

None
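
A hedged sketch of a spatial detection setup followed by config_spatial(); the model name and threshold values are illustrative:

import depthai as dai
from depthai_sdk import OakCamera

with OakCamera() as oak:
    color = oak.create_camera('color')
    stereo = oak.create_stereo('400p')
    nn = oak.create_nn('mobilenet-ssd', color, spatial=stereo)

    nn.config_spatial(
        bb_scale_factor=0.5,             # use the central 50% of each bbox for depth calculation
        lower_threshold=300,             # ignore depth closer than 30 cm
        upper_threshold=10000,           # ignore depth further than 10 m
        calc_algo=dai.SpatialLocationCalculatorAlgorithm.AVERAGE,
    )

    oak.visualize(nn.out.main)
    oak.start(blocking=True)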

get_bbox()
Return type

depthai_sdk.visualize.bbox.BoundingBox

class Out(nn_component)
class MainOut(component)

Default output. Streams NN results and high-res frames that were downscaled and used for inferencing. Produces DetectionPacket or TwoStagePacket (if it’s a 2nd-stage NNComponent).

class PassThroughOut(component)
class ImgManipOut(component)
class InputOut(component)
class SpatialOut(component)
class TwoStageOut(component)
class TrackerOut(component)
class EncodedOut(component)
class NnDataOut(component)
is_spatial()
Return type

bool

is_tracker()
Return type

bool

is_yolo()
Return type

bool

is_mobile_net()
Return type

bool

is_detector()

Returns True if the model is one of the two currently supported object detector families (YOLO or MobileNet-SSD).

Return type

bool

is_multi_stage()

General (standardized) NN outputs, to be used for higher-level abstractions (e.g. automatic visualization of results). “SDK supported NN models” will have to have a standard NN output, so either dai.ImgDetections or one of the outputs below. If the latter, the model’s JSON config will include handler.py logic for decoding to the standard NN output. These will be integrated into depthai-core, bonus points for on-device decoding of some popular models.

class depthai_sdk.classes.nn_results.Detection(img_detection: Union[NoneType, depthai.ImgDetection, depthai.SpatialImgDetection], label_str: str, confidence: float, color: Tuple[int, int, int], bbox: depthai_sdk.visualize.bbox.BoundingBox, angle: Union[int, NoneType], ts: Union[datetime.timedelta, NoneType])
img_detection: Union[None, depthai.ImgDetection, depthai.SpatialImgDetection]
label_str: str
confidence: float
color: Tuple[int, int, int]
bbox: depthai_sdk.visualize.bbox.BoundingBox
angle: Optional[int]
ts: Optional[datetime.timedelta]
property top_left
property bottom_right
class depthai_sdk.classes.nn_results.TrackingDetection(img_detection: Union[NoneType, depthai.ImgDetection, depthai.SpatialImgDetection], label_str: str, confidence: float, color: Tuple[int, int, int], bbox: depthai_sdk.visualize.bbox.BoundingBox, angle: Union[int, NoneType], ts: Union[datetime.timedelta, NoneType], tracklet: depthai.Tracklet, filtered_2d: depthai_sdk.visualize.bbox.BoundingBox, filtered_3d: depthai.Point3f, speed: Union[float, NoneType])
tracklet: depthai.Tracklet
filtered_2d: depthai_sdk.visualize.bbox.BoundingBox
filtered_3d: depthai.Point3f
speed: Optional[float]
property speed_kmph
property speed_mph
class depthai_sdk.classes.nn_results.TwoStageDetection(img_detection: Union[NoneType, depthai.ImgDetection, depthai.SpatialImgDetection], label_str: str, confidence: float, color: Tuple[int, int, int], bbox: depthai_sdk.visualize.bbox.BoundingBox, angle: Union[int, NoneType], ts: Union[datetime.timedelta, NoneType], nn_data: depthai.NNData)
nn_data: depthai.NNData
class depthai_sdk.classes.nn_results.GenericNNOutput(nn_data)

Generic NN output, to be used for higher-level abstractions (e.g. automatic visualization of results).

getTimestamp()
Return type

datetime.timedelta

getSequenceNum()
Return type

int

class depthai_sdk.classes.nn_results.ExtendedImgDetection(angle: int)
class depthai_sdk.classes.nn_results.Detections(nn_data, is_rotated=False)

Detection results containing bounding boxes, labels and confidences. Optionally can contain rotation angles.

class depthai_sdk.classes.nn_results.SemanticSegmentation(nn_data, mask)

Semantic segmentation results, with a mask for each class.

Examples: DeeplabV3, Lanenet, road-segmentation-adas-0001.

mask: List[numpy.ndarray]
class depthai_sdk.classes.nn_results.ImgLandmarks(nn_data, landmarks=None, landmarks_indices=None, pairs=None, colors=None)

Landmarks results, with a list of landmarks and pairs of landmarks to draw lines between.

Examples: human-pose-estimation-0001, openpose2, facial-landmarks-68, landmarks-regression-retail-0009.

class depthai_sdk.classes.nn_results.InstanceSegmentation(nn_data, masks, labels)

Instance segmentation results, with a mask for each instance.

masks: List[numpy.ndarray]
labels: List[int]
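
As an illustration of these result classes, a decode_fn can return SemanticSegmentation instead of Detections. A minimal hedged sketch; the number of classes and the output shape depend on the specific model and are assumptions here:

import numpy as np
from depthai import NNData
from depthai_sdk.classes.nn_results import SemanticSegmentation

def decode_segmentation(nn_data: NNData):
    # Assumed: the model outputs per-pixel class scores in its first layer,
    # reshaped here to (num_classes, height, width).
    scores = np.array(nn_data.getFirstLayerFp16()).reshape((2, 256, 256))
    class_map = np.argmax(scores, axis=0).astype(np.uint8)
    # One binary mask per class, matching SemanticSegmentation.mask: List[np.ndarray]
    masks = [(class_map == i).astype(np.uint8) for i in range(scores.shape[0])]
    return SemanticSegmentation(nn_data, masks)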