Python API

Instructions for installing, upgrading, and using the DepthAI Python API.

Supported Platforms

The DepthAI API Python module is prebuilt for Ubuntu, macOS, and Windows. For other operating systems and/or Python versions, DepthAI can be built from source.

Installing system dependencies

A couple of basic system dependencies are required to run the DepthAI library. Most of them are likely already installed on your system, but in case they are not, we have prepared an install script that will make sure all dependencies are installed:

curl -fL http://docs.luxonis.com/_static/install_dependencies.sh | bash

If using Windows, please use this batch script for dependency installation.

Enabling the USB device (only on Linux)

Since DepthAI is a USB device, on systems that use the udev tool you need to add a udev rule to make the device accessible.

The following commands add a new udev rule to your system and reload the rules:

echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Install from PyPi

Our packages are distributed via PyPI. To install the package in your environment, use:

python3 -m pip install depthai

For other installation options, see the Other installation methods section below.

Test installation

Our depthai repository on GitHub contains many helpful examples and prepared neural networks you can use to make your prototyping faster. It also includes a test script, maintained by our contributors, that should help you verify that your setup is correct.

First, clone the depthai repository and install its dependencies

git clone https://github.com/luxonis/depthai.git
cd depthai
python3 -m pip install -r requirements.txt

Now, run the demo script from within depthai to make sure everything is working:

python3 depthai_demo.py

If all goes well, a small window will open showing the video stream with overlays drawn over any objects whose class exists in the example 20-class object detector (class list here).

Preparing the MyriadX blob file and its config

As you can see in the example, basic usage of the Device.create_pipeline() method consists of specifying the desired output streams and the AI section, where you specify the MyriadX blob and its config.

In this section, we’ll describe how to obtain both blob_file and blob_file_config.

Obtaining MyriadX blob

Since we’re utilizing the MyriadX VPU, your model needs to be compiled (more accurately: optimized and converted) into a MyriadX blob file, which will be sent to the device and executed.

The easiest way to obtain this blob is to use our online BlobConverter app. It has all the tools needed for compilation, so you don’t need to set up anything - and you can even download a blob for a model from the OpenVINO model zoo.

If you’d like, you can also compile the blob yourself. You’ll need to install the OpenVINO toolkit, then use the Model Optimizer and the Myriad Compiler to obtain the MyriadX blob. We’ve documented example usage of these compilers here.

Creating Blob configuration file

If a config file is not provided, or output_format is set to raw, no decoding is done on the device and the user must do it manually on the host side.

Currently, Mobilenet-SSD and (tiny-)YOLO-v3 based networks can be decoded on the device. For that, a config file with network-specific parameters is required.

Example for tiny-yolo-v3 network:

{
    "NN_config":
    {
        "output_format" : "detection",
        "NN_family" : "YOLO",
        "NN_specific_metadata" :
        {
            "classes" : 80,
            "coordinates" : 4,
            "anchors" : [10,14, 23,27, 37,58, 81,82, 135,169, 344,319],
            "anchor_masks" :
            {
                "side26" : [1,2,3],
                "side13" : [3,4,5]
            },
            "iou_threshold" : 0.5,
            "confidence_threshold" : 0.5
        }
    },
    "mappings":
    {
        "labels":
        [
            "person",
            "bicycle",
            "car",
            "..."
        ]
    }
}
  • NN_config - configuration for the network
    • output_format
      • "detection" - decoding done on device, the received packet is in Detections format

      • "raw" - decoding done on host

    • NN_family - “YOLO” or “mobilenet”

    • NN_specific_metadata - only for “YOLO”
      • classes - number of classes

      • coordinates - number of coordinates

      • anchors - anchors for YOLO network

      • anchor_masks - anchor mask for each output layer: 26x26, 13x13 (+ 52x52 for full YOLO-v3)

      • iou_threshold - intersection over union threshold for detected object

      • confidence_threshold - score confidence threshold for detected object

  • mappings.labels - used by the depthai_demo.py script to decode labels from IDs

Example decoding when output_format is set to detection:

nnet_packets, data_packets = p.get_available_nnet_and_data_packets()

for nnet_packet in nnet_packets:
  in_layers = nnet_packet.getInputLayersInfo()

  input_width  = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.W)
  input_height = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.H)

  detections = nnet_packet.getDetectedObjects()
  objects = list()

  for detection in detections:
      detection_dict = detection.get_dict()
      # scale normalized coordinates to image coordinates
      detection_dict["x_min"] = int(detection_dict["x_min"] * input_width)
      detection_dict["y_min"] = int(detection_dict["y_min"] * input_height)
      detection_dict["x_max"] = int(detection_dict["x_max"] * input_width)
      detection_dict["y_max"] = int(detection_dict["y_max"] * input_height)
      objects.append(detection_dict)

print(objects)

Example of decoding for full YOLO-v3 and tiny-YOLO-v3 on host and device is here.

Example of decoding for Mobilenet-based networks on host and device is here.
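
As a quick reference for the raw path, below is a minimal host-side decoding sketch for a Mobilenet-SSD network whose output_format is set to raw. It assumes the standard OpenVINO DetectionOutput layout ([1, 1, N, 7] rows of image_id, label, confidence, x_min, y_min, x_max, y_max), a 300x300 network input, and that p is the pipeline returned by Device.create_pipeline():

CONF_THRESHOLD = 0.5          # assumed score cutoff - tune for your model
INPUT_W, INPUT_H = 300, 300   # assumed Mobilenet-SSD input size

nnet_packets, data_packets = p.get_available_nnet_and_data_packets()

for nnet_packet in nnet_packets:
    # raw DetectionOutput tensor: [1, 1, N, 7] float16, one row per candidate
    raw = nnet_packet.get_tensor(0)
    for image_id, label, confidence, x_min, y_min, x_max, y_max in raw[0, 0]:
        if image_id < 0:
            break  # -1 marks the end of valid detections
        if confidence < CONF_THRESHOLD:
            continue
        bbox = (int(x_min * INPUT_W), int(y_min * INPUT_H),
                int(x_max * INPUT_W), int(y_max * INPUT_H))
        print(int(label), float(confidence), bbox)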

Other installation methods

To get the latest, not-yet-released features from our source code, you can compile the depthai package manually.

Dependencies to build from source

  • CMake > 3.2.0

  • Generation tool (Ninja, make, …)

  • C/C++ compiler

  • libusb1 development package

Ubuntu, Raspberry Pi OS, … (Debian based systems)

On Debian-based systems (Raspberry Pi OS, Ubuntu, …) these can be acquired by running:

sudo apt-get -y install cmake libusb-1.0-0-dev build-essential

macOS (Mac OS X)

Assuming a stock macOS install, the depthai-python library needs the following dependencies:

  • HomeBrew (If it’s not installed already)

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
    
  • Python, libusb, CMake, wget

    brew install coreutils python3 cmake libusb wget
    

And now you’re ready to clone depthai-python from GitHub and build it for macOS.

Install using GitHub commit

Pip allows installing packages from specific commits, even if they are not yet released on PyPI.

To do so, use the command below - and be sure to replace the <commit_sha> with the correct commit hash from here

python3 -m pip install git+https://github.com/luxonis/depthai-python.git@<commit_sha>

Using/Testing a Specific Branch/PR

From time to time, it may be of interest to use a specific branch. This may occur, for example, because we have listened to your feature request and implemented a quick fix in a branch, or because you want early access to a feature that is soaking in our develop branch for stability purposes before being merged into main.

When working in the depthai repository, using a branch can be accomplished with the following commands. For this example, the branch we will try out is develop (the branch we use to soak new features before merging them into main):

Prior to running the following, you can either clone the repository independently (to avoid overwriting any of your local changes) or simply do a git pull first.

git checkout develop
python3 -m pip install -U pip
python3 -m pip install -r requirements.txt

Install from source

If desired, you can also install the package from the source code itself - this will allow you to make changes to the API and see them live in action.

To do so, first download the repository and then add the package to your Python interpreter in development mode:

git clone https://github.com/luxonis/depthai-python.git
cd depthai-python
git submodule update --init --recursive
python3 setup.py develop  # you may need to add sudo if using system interpreter instead of virtual environment

If you want to use a branch other than the default (main), e.g. develop, you can do so by typing:

git checkout develop  # replace the "develop" with a desired branch name
git submodule update --recursive
python3 setup.py develop

Or, if you want to check out a specific commit, type:

git checkout <commit_sha>
git submodule update --recursive
python3 setup.py develop

API Reference

class Device

Represents the DepthAI device with the methods to interact with it.

Warning

Please be aware that all methods except get_available_streams() require create_pipeline() to be run first.

Example

import depthai
device = depthai.Device('', False)
pipeline = device.create_pipeline(config={
    'streams': ['previewout', 'metaout'],
    'ai': {
        "blob_file": "/path/to/model.blob",
        "blob_file_config": "/path/to/config.json",
    },
})
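
A minimal host loop built on top of this pipeline might look like the sketch below. It assumes OpenCV is installed, that previewout frames arrive as a planar [3, H, W] array (as handled in the depthai demo script), and that deleting the pipeline and device objects at the end releases the device:

import cv2

while True:
    nnet_packets, data_packets = pipeline.get_available_nnet_and_data_packets()

    for packet in data_packets:
        if packet.stream_name == 'previewout':
            data = packet.getData()  # assumed planar [3, H, W] frame
            frame = cv2.merge([data[0], data[1], data[2]])  # to H x W x 3 for imshow
            cv2.imshow('previewout', frame)

    if cv2.waitKey(1) == ord('q'):
        break

del pipeline  # assumed cleanup, as done in the demo script
del device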

Methods

__init__(device_id: str, usb2_mode: bool) → Device

Standard and recommended way to set up the object.

device_id represents the USB port id that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty - '' - it will look for the device on all ports. This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.

usb2_mode, being True/False, makes the DepthAI communicate using the USB2 protocol instead of USB3. This lowers the throughput of the pipeline, but allows the use of USB cables longer than 1 m.

__init__(cmd_file: str, device_id: str) → Device

Development and debug way to initialize the DepthAI device.

cmd_file is a path to the firmware .cmd file that will be loaded onto the device for boot.

device_id represents the USB port id that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty - '' - it will look for the device on all ports. This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.

create_pipeline(config: dict) → depthai.CNNPipeline

Initializes a DepthAI Pipeline, returning the created CNNPipeline if successful and None otherwise.

config(dict) - A dict of pipeline configuration settings. Example key/values for the config:

{
    # Possible streams:
    #   'color' - 4K color camera preview
    #   'left' - left mono camera preview
    #   'right' - right mono camera preview
    #   'rectified_left' - rectified left camera preview
    #   'rectified_right' - rectified right camera preview
    #   'previewout' - neural network input preview
    #   'metaout' - CNN output tensors
    #   'depth' - the raw depth map, disparity converted to real life distance
    #   'disparity' - disparity map, the disparity between left and right cameras, in pixels
    #   'disparity_color' - disparity map colorized
    #   'meta_d2h' - device metadata stream
    #   'video' - H.264/H.265 encoded color camera frames
    #   'jpegout' - JPEG encoded color camera frames
    #   'object_tracker' - Object tracker results
    'streams': [
        'left',  # if left is used, it must be in the first position
        'right',
        {'name': 'previewout', 'max_fps': 12.0},  # streams can be specified as objects with additional params
        'metaout',
        # depth-related streams
        {'name': 'depth', 'max_fps': 12.0},
        {'name': 'disparity', 'max_fps': 12.0},
        {'name': 'disparity_color', 'max_fps': 12.0},
    ],
    'depth':
    {
        'calibration_file': consts.resource_paths.calib_fpath,
        'left_mesh_file': consts.resource_paths.left_mesh_fpath,
        'right_mesh_file': consts.resource_paths.right_mesh_fpath,
        'padding_factor': 0.3,
        'depth_limit_m': 10.0, # In meters, for filtering purpose during x,y,z calc
        'median_kernel_size': 7,  # Disparity / depth median filter kernel size (N x N) . 0 = filtering disabled
        'lr_check': True,  # Enable stereo 'Left-Right check' feature.
        'warp_rectify':
        {
            'use_mesh' : True, # if False, will use homography
            'mirror_frame': True, # if False, the disparity will be mirrored instead
            'edge_fill_color': 0, # gray 0..255, or -1 to replicate pixel values
        },
    },
    'ai':
    {
        'blob_file': blob_file,
        'blob_file_config': blob_file_config,
        'blob_file2': blob_file2,
        'blob_file_config2': blob_file_config2,
        'calc_dist_to_bb': True, # depth calculation on CNN models with bounding box output
        'keep_aspect_ratio': False, # Keep aspect ratio, don't use full RGB FOV for NN
        'camera_input': "left", # 'rgb', 'left', 'right', 'left_right', 'rectified_left', 'rectified_right', 'rectified_left_right'
        'shaves' : 7,  # 1 - 14 Number of shaves used by NN.
        'cmx_slices' : 7,  # 1 - 14 Number of cmx slices used by NN.
        'NN_engines' : 2,  # 1 - 2 Number of NN_engines used by NN.
    },
    # object tracker
    'ot':
    {
        'max_tracklets'        : 20, #maximum 20 is supported
        'confidence_threshold' : 0.5, #object is tracked only for detections over this threshold
    },
    'board_config':
    {
        'swap_left_and_right_cameras': True, # Swap the Left and Right cameras.
        'left_fov_deg': 71.86, # Horizontal field of view (HFOV) for the stereo cameras in [deg].
        'rgb_fov_deg': 68.7938, # Horizontal field of view (HFOV) for the RGB camera in [deg]
        'left_to_right_distance_cm': 9.0, # Left/Right camera baseline in [cm]
        'left_to_rgb_distance_cm': 2.0, # Distance the RGB camera is from the Left camera.
        'store_to_eeprom': False, # Store the calibration and board_config (fov, baselines, swap-lr) in the EEPROM onboard
        'clear_eeprom': False, # Invalidate the calib and board_config from EEPROM
        'override_eeprom': False, # Use the calib and board_config from host, ignoring the EEPROM data if programmed
    },
    'camera':
    {
        'rgb':
        {
            # 3840x2160, 1920x1080
            # only UHD/1080p/30 fps supported for now
            'resolution_h': 3040, # possible - 1080, 2160, 3040
            'fps': 30,
        },
        'mono':
        {
            # 1280x720, 1280x800, 640x400 (binning enabled)
            'resolution_h': 800, # possible - 400, 720, 800
            'fps': 30,
        },
    },
    'app':
    {
        'sync_video_meta_streams': False,  # Synchronize 'previewout' and 'metaout' streams
        'sync_sequence_numbers'  : False,  # Synchronize sequence numbers for all packets. Experimental
        'usb_chunk_KiB' : 64, # USB transfer chunk on device. Higher (up to megabytes) may improve throughput, or 0 to disable chunking
    },
    #'video_config':
    #{
    #    'rateCtrlMode': 'cbr', # Options: cbr / vbr
    #    'profile': 'h265_main', # Options: 'h264_baseline' / 'h264_main' / 'h264_high' / 'h265_main' / 'mjpeg'
    #    'bitrate': 8000000, # When using CBR (H264/H265 only)
    #    'maxBitrate': 8000000, # When using CBR (H264/H265 only)
    #    'keyframeFrequency': 30, # (H264/H265 only)
    #    'numBFrames': 0, # (H264/H265 only)
    #    'quality': 80 # (0 - 100%) When using VBR or MJPEG profile
    #}
    #'video_config':
    #{
    #    'profile': 'mjpeg',
    #    'quality': 95
    #}
}
get_available_streams() → List[str]

Return a list of all streams supported by the DepthAI library.

>>> device.get_available_streams()
['meta_d2h', 'color', 'left', 'right', 'rectified_left', 'rectified_right', 'disparity', 'depth', 'metaout', 'previewout', 'jpegout', 'video', 'object_tracker']
get_nn_to_depth_bbox_mapping() → dict

Returns a dict that allows matching the CNN output with the disparity info.

Since the RGB camera has a 4K resolution and the neural networks accept only images with a specific resolution (like 300x300), the original image is cropped to meet the neural network requirements. On the other hand, the disparity frames are in the full resolution available on the mono cameras.

To be able to determine where the CNN previewout image is on the disparity frame, this method should be used as it specifies the offsets and dimensions to use.

>>> device.get_nn_to_depth_bbox_mapping()
{'max_h': 681, 'max_w': 681, 'off_x': 299, 'off_y': 59}
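
For illustration, a normalized detection (see Detection below) could be mapped onto the depth/disparity frame roughly as follows. This is a sketch that assumes the returned offsets and dimensions describe where the cropped CNN region sits inside the mono frame:

nn_to_depth = device.get_nn_to_depth_bbox_mapping()

def nn_to_depth_coords(detection, mapping):
    # scale the normalized CNN coordinates by the crop size,
    # then shift by the crop offsets reported by the device
    x_min = int(detection.x_min * mapping['max_w']) + mapping['off_x']
    y_min = int(detection.y_min * mapping['max_h']) + mapping['off_y']
    x_max = int(detection.x_max * mapping['max_w']) + mapping['off_x']
    y_max = int(detection.y_max * mapping['max_h']) + mapping['off_y']
    return x_min, y_min, x_max, y_max
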
request_af_mode()

Set the 4K RGB camera autofocus mode to one of the available AutofocusMode

request_af_trigger()

Manually send trigger action to AutoFocus on 4k RGB camera

request_jpeg()

Capture a JPEG frame from the RGB camera and send it to jpegout stream. The frame is in full available resolution, not cropped to meet the CNN input dimensions.
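
For example, the captured frame could be read back from the jpegout stream roughly as sketched below, assuming 'jpegout' was included in the pipeline streams and OpenCV is available (in practice you would keep polling until the frame arrives):

import cv2

device.request_jpeg()  # ask the device to push one frame to 'jpegout'

nnet_packets, data_packets = pipeline.get_available_nnet_and_data_packets()
for packet in data_packets:
    if packet.stream_name == 'jpegout':
        # the packet data holds the encoded JPEG byte stream
        frame = cv2.imdecode(packet.getData(), cv2.IMREAD_COLOR)
        cv2.imwrite('capture.jpg', frame)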

send_disparity_confidence_threshold(confidence: int)

Sends the disparity confidence threshold for the StereoSGBM algorithm. If the confidence of a disparity value is below the threshold, the value is marked as invalid disparity and treated as background.
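
Example usage (a sketch; the value is assumed to be in a 0-255 range, as used by the depthai demo script):

device.send_disparity_confidence_threshold(200)  # assumed 0-255 range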

get_right_homography()

Warning

Return a 3x3 homography matrix used to rectify the right stereo camera image.

get_left_homography()

Warning

Return a 3x3 homography matrix used to rectify the left stereo camera image.

get_left_intrinsic()

Warning

Return a 3x3 intrinsic calibration matrix of the left stereo camera.

get_right_intrinsic()

Warning

Return a 3x3 intrinsic calibration matrix of the right stereo camera.

get_rotation()

Warning

Return a 3x3 rotation matrix representing the rotation of the right stereo camera w.r.t. the left stereo camera.

get_translation()

Warning

Return a 3x1 vector representing the position of the right stereo camera center w.r.t. the left stereo camera center.

class AutofocusMode

An enum with all autofocus modes available

Members

AF_MODE_AUTO

This mode sets the Autofocus to a manual mode, where you need to call Device.request_af_trigger() to start focusing procedure.

AF_MODE_CONTINUOUS_PICTURE

This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is standing still while capturing. Focusing procedure is done as fast as possible.

This is the default mode the DepthAI operates in.

AF_MODE_CONTINUOUS_VIDEO

This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is trying to capture a smooth video stream. The focusing procedure is slower and avoids focus overshoots.

AF_MODE_EDOF

This mode disables the autofocus. EDOF stands for Enhanced Depth of Field and is a digital focus.

AF_MODE_MACRO

It’s the same operating mode as AF_MODE_AUTO
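
For example, switching to manual autofocus and triggering a single focus sweep might look like this (a sketch using the Device methods documented above, assuming request_af_mode() accepts an AutofocusMode value):

import depthai

device.request_af_mode(depthai.AutofocusMode.AF_MODE_AUTO)
device.request_af_trigger()  # start one focusing procedure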

class CNNPipeline

Pipeline object through which the device sends its results to the host.

Methods

get_available_data_packets() → List[depthai.DataPacket]

Returns only data packets produced by the device itself, without CNN results

get_available_nnet_and_data_packets() → tuple[List[depthai.NNetPacket], List[depthai.DataPacket]]

Return both neural network results and data produced by device

class NNetPacket

Neural network results packet. It’s not a single result, but a batch of results with additional metadata attached.

For any neural network inference output, NNetPacket.get_tensor() can be used. For the specific cases of Mobilenet-SSD and (tiny-)YOLO-v3, decoding can be done in the firmware; the decoded objects can then be accessed through getDetectedObjects() in addition to the raw output, making the results of these commonly used networks easily accessible. See the blob config file section for more details about the different neural network output formats and how to choose between them.

Methods

getMetadata() → depthai.FrameMetadata

Returns metadata object containing all proprietary data related to this packet

get_tensor(name: Union[int, str]) → numpy.ndarray

Warning

Works only when output_format is set to raw in the blob config file.

Returns a shaped numpy array for the specific network output tensor, based on the neural network’s output layer information.

For example: in case of Mobilenet-SSD it returns a [1, 1, 100, 7] shaped array, where numpy.dtype is float16.

Example of usage:

nnetpacket.get_tensor(0)
# or
nnetpacket.get_tensor('detection_out')
__getitem__(name: Union[int, str]) → numpy.ndarray

Same as get_tensor()

Example of usage for Mobilenet-SSD:

nnetpacket[0]
# or
nnetpacket['detection_out']
getOutputsList() → list

Returns all the output tensors in a list for the network.

getOutputsDict() → dict

Returns all the output tensors in a dictionary for the network. The key is the name of the output layer, the value is the shaped numpy array.

getOutputLayersInfo() → depthai.TensorInfo

Returns information about the output layers of the network.

getInputLayersInfo() → depthai.TensorInfo

Returns information about the input layers of the network.

getDetectedObjects() → depthai.Detections

Warning

Works only when output_format is set to detection in the blob config file and with detection networks (Mobilenet-SSD, (tiny-)YOLO-v3 based networks).

Returns the detected objects in Detections format. The network output is decoded on the device side.

class TensorInfo

Descriptor of the input/output layers/tensors of the network.

When the network is loaded, the tensor info is automatically printed.

Attributes

name: str

Name of the tensor.

dimensions: list

Shape of tensor array. E.g. : [1, 1, 100, 7]

strides: list

Strides of tensor array.

data_type: string

Data type of tensor. E.g. : float16

offset: int

Offset in the raw output array.

element_size: int

Size in bytes of one element in the array.

index: int

Index of the tensor. E.g. : in case of multiple inputs/outputs in the network it marks the order of input/output.

Methods

get_dict() → dict

Returns TensorInfo in a dictionary where the key is the name of the attribute.

get_dimension() → int

Returns the specific dimension of the tensor

tensor_info.get_dimension(depthai.TensorInfo.Dimension.WIDTH)  # returns width of tensor
class Detections

Container of neural network results decoded on device side.

Example of accessing detections

Assuming the detected objects are stored in detections object.

  • Number of detections

    detections.size()
    # or
    len(detections)
    
  • Accessing the nth detection

    detections[0]
    detections[1]  # ...
    
  • Iterating through all detections

    for detection in detections:
        print(detection.get_dict())  # e.g. handle each detection here
    
class Detection

Detected object descriptor.

Attributes

label: int

Label id of the detected object.

confidence: float

Confidence score of the detected object in interval [0, 1].

x_min: float

Top left X coordinate of the detected bounding box. Normalized, in interval [0, 1].

y_min: float

Top left Y coordinate of the detected bounding box. Normalized, in interval [0, 1].

x_max: float

Bottom right X coordinate of the detected bounding box. Normalized, in interval [0, 1].

y_max: float

Bottom right Y coordinate of the detected bounding box. Normalized, in interval [0, 1].

depth_x: float

Distance to detected bounding box on X axis. Only when depth calculation is enabled (stereo cameras are present on board).

depth_y: float

Distance to detected bounding box on Y axis. Only when depth calculation is enabled (stereo cameras are present on board).

depth_z: float

Distance to detected bounding box on Z axis. Only when depth calculation is enabled (stereo cameras are present on board).

Methods

get_dict() → dict

Returns the detected object in a dictionary where the key is the name of the attribute.

class Dimension

Dimension descriptor of tensor shape. Mostly meaningful for input tensors, since not all neural network models respect the semantics of Dimension for output tensors.

Values

W / WIDTH

Width

H / HEIGHT

Height

C / CHANNEL

Number of channels

N / NUMBER

Number of inferences

B / BATCH

Batch of inferences

class DataPacket

DepthAI data packet, containing information generated on the device. Unlike NNetPacket, it contains a single “result” with source stream info

Attributes

stream_name: str

Returns the packet’s source stream name. Used to determine the origin of the packet so that it can be handled appropriately.

Methods

getData() → numpy.ndarray

Returns the data as a NumPy array, which can be further transformed or displayed using OpenCV imshow.

Used with streams that return frames, e.g. previewout, left, right, or encoded data, e.g. video, jpegout.

getDataAsStr() → str

Returns the data as a string that can be parsed further.

Used with streams that return non-array results, e.g. meta_d2h, which returns a JSON object.
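
For instance, a meta_d2h packet obtained from the polling loop could be parsed like this (a sketch; the exact fields in the JSON depend on the firmware version):

import json

for packet in data_packets:
    if packet.stream_name == 'meta_d2h':
        meta = json.loads(packet.getDataAsStr())  # device metadata as a JSON object
        print(meta)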

getMetadata() → FrameMetadata

Returns metadata object containing all proprietary data related to this packet

getObjectTracker() → ObjectTracker

Warning

Works only with packets from object_tracker stream

Returns the ObjectTracker object attached to this packet

size() → int

Returns packet data size

class FrameMetadata

Metadata object attached to the packets sent via pipeline.

Methods

getCameraName() → str

Returns the name of the camera that produced the frame.

getCategory() → int

Returns the type of the packet, whether it’s a regular frame or arrived from taking a still

getFrameBytesPP() → int

Returns number of bytes per pixel in the packet’s frame

getFrameHeight() → int

Returns the height of the packet’s frame

getFrameWidth() → int

Returns the width of the packet’s frame

getFrameType() → int

Returns the type of the data that this packet contains.

getInstanceNum() → int

Returns the camera id that is the source of the current packet

getSequenceNum() → int

A sequence number is assigned to each frame produced by the camera. It can be used to verify that frames were captured at the same time - e.g. if frames from the left and right cameras have the same sequence number, you can assume they were taken at the same time.
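
A simple way to pair left and right frames by sequence number could look like the sketch below; it keeps the most recent frame per stream and reports a pair once the numbers match:

latest = {}  # stream name -> (sequence number, packet)

for packet in data_packets:
    if packet.stream_name in ('left', 'right'):
        seq = packet.getMetadata().getSequenceNum()
        latest[packet.stream_name] = (seq, packet)

if 'left' in latest and 'right' in latest and latest['left'][0] == latest['right'][0]:
    left_packet, right_packet = latest['left'][1], latest['right'][1]
    # both frames were captured at the same time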

getStride() → int

Returns the number of bytes until the next row of pixels in the packet’s frame.

getTimestamp() → float

When a packet is created, it is assigned a creation timestamp, which can be obtained using this method.

class ObjectTracker

Object representing the current state of the tracker, obtained by calling the DataPacket.getObjectTracker() method on a packet from the object_tracker stream.

Methods

getNrTracklets() → int

Return the number of available tracklets

getTracklet(tracklet_nr: int) → Tracklet

Returns the tracklet with the specified tracklet_nr. To check how many tracklets there are, use the getNrTracklets() method.

class Tracklet

A Tracklet represents a single tracked object and is produced by the ObjectTracker class. To obtain one, call the ObjectTracker.getTracklet() method.

Methods

getId() → int

Return the tracklet id

getLabel() → int

Return the tracklet label, i.e. the class id returned by the neural network. Used to identify the class of the recognized object.

getStatus() → str

Return the tracklet status - either NEW, TRACKED, or LOST.

getLeftCoord() → int

Return the left coordinate of the bounding box of a tracked object

getRightCoord() → int

Return the right coordinate of the bounding box of a tracked object

getTopCoord() → int

Return the top coordinate of the bounding box of a tracked object

getBottomCoord() → int

Return the bottom coordinate of the bounding box of a tracked object
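
Putting ObjectTracker and Tracklet together, results from the object_tracker stream could be read like this (a sketch; label_map is a hypothetical list mapping label ids to class names):

label_map = ["person", "bicycle", "car"]  # hypothetical id -> name mapping

for packet in data_packets:
    if packet.stream_name == 'object_tracker':
        tracker = packet.getObjectTracker()
        for i in range(tracker.getNrTracklets()):
            tracklet = tracker.getTracklet(i)
            print(
                tracklet.getId(),
                tracklet.getStatus(),  # NEW / TRACKED / LOST
                label_map[tracklet.getLabel()],
                (tracklet.getLeftCoord(), tracklet.getTopCoord(),
                 tracklet.getRightCoord(), tracklet.getBottomCoord()),
            )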

Got questions?

We’re always happy to help with code or other questions you might have.