DepthAI Python API

Instructions for installing, upgrading, and using the DepthAI Python API.

Supported Platforms

The DepthAI Python module is prebuilt for Ubuntu, macOS, and Windows. For other operating systems and/or Python versions, DepthAI can be built from source.

Installing system dependencies

A couple of basic system dependencies are required to run the DepthAI library. Most systems will already have them installed, but in case they are not, we prepared an install script that makes sure all dependencies are installed:

curl -fL | bash

If using Windows, please use this batch script to install the dependencies.

Enabling the USB device (only on Linux)

Since DepthAI is a USB device, on systems that use the udev tool you need to add a udev rule to make the device accessible.

The following commands will add a new udev rule to your system:

echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Install from PyPi

Our packages are distributed via PyPI. To install the latest release in your environment, use:

python3 -m pip install depthai

For other installation options, see Other Installation Options.

Test installation

We have a depthai repository on GitHub that contains many helpful examples and prepared neural networks you can use to speed up your prototyping. It also includes a test script, maintained by our contributors, that helps you verify that your setup is correct.

First, clone the depthai repository and install its dependencies

git clone https://github.com/luxonis/depthai.git
cd depthai
python3 -m pip install -r requirements.txt

Now, run the demo script from within the depthai directory to make sure everything is working:

python3 depthai_demo.py
If all goes well, a small window displays the video stream with overlays for any items whose class exists in the example 20-class object detector (class list here).

API Reference


Represents the DepthAI device with the methods to interact with it.

NOTE: Please be aware that all methods except get_available_streams require create_pipeline to be run first.


import depthai
device = depthai.Device('', False)
pipeline = device.create_pipeline(config={
    'streams': ['previewout', 'metaout'],
    'ai': {
        "blob_file": "/path/to/model.blob",
        "blob_file_config": "/path/to/config.json",
    },
})

  • __init__(device_id: str, usb2_mode: bool) -> Device

    The standard and recommended way to set up the object.

    device_id represents the USB port ID that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty - '' - it will look for the device on all ports. This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.

    usb2_mode, being True/False, allows the DepthAI to communicate using the USB2 protocol instead of USB3. This lowers the throughput of the pipeline, but allows using USB cables longer than 1m for the connection.

  • __init__(cmd_file: str, device_id: str) -> Device

    A development and debug way to initialize the DepthAI device.

    cmd_file is a path to the firmware .cmd file that will be loaded onto the device for boot.

    device_id represents the USB port ID that the device is connected to. If set to a specific value (e.g. "1"), it will look for the device on that specific USB port, whereas if left empty - '' - it will look for the device on all ports. This is useful when more than one DepthAI device is connected and you want to specify which one to use in the code.

  • create_pipeline(config: dict) -> CNNPipeline

    Initializes a DepthAI Pipeline, returning the created CNNPipeline if successful and None otherwise.

    config(dict) - A dict of pipeline configuration settings. Example key/values for the config:

          # Possible streams:
          #   'color' - 4K color camera preview
          #   'left' - left mono camera preview
          #   'right' - right mono camera preview
          #   'rectified_left' - rectified left camera preview
          #   'rectified_right' - rectified right camera preview
          #   'previewout' - neural network input preview
          #   'metaout' - CNN output tensors
          #   'depth' - the raw depth map, disparity converted to real life distance
          #   'disparity' - disparity map, the disparity between left and right cameras, in pixels
          #   'disparity_color' - disparity map colorized
          #   'meta_d2h' - device metadata stream
          #   'video' - H.264/H.265 encoded color camera frames
          #   'jpegout' - JPEG encoded color camera frames
          #   'object_tracker' - Object tracker results
          'streams': [
              'left',  # if left is used, it must be in the first position
              {'name': 'previewout', 'max_fps': 12.0},  # streams can be specified as objects with additional params
              # depth-related streams
              {'name': 'depth', 'max_fps': 12.0},
              {'name': 'disparity', 'max_fps': 12.0},
              {'name': 'disparity_color', 'max_fps': 12.0},
          ],
          'depth': {
              'calibration_file': consts.resource_paths.calib_fpath,
              'padding_factor': 0.3,
              'depth_limit_m': 10.0,  # In meters, for filtering purposes during x,y,z calc
              'confidence_threshold': 0.5,  # Depth is calculated for bounding boxes with confidence higher than this number
          },
          'ai': {
              'blob_file': blob_file,  # MyriadX CNN blob file path
              'blob_file_config': blob_file_config,  # Configuration file for CNN output tensor mapping on host side
              'calc_dist_to_bb': True,  # if True, will include depth information in the CNN output tensor
              'keep_aspect_ratio': not args['full_fov_nn'],
          },
          'ot': {  # object tracker
              'max_tracklets': 20,  # maximum 20 is supported
              'confidence_threshold': 0.5,  # object is tracked only for detections over this threshold
          },
          'board_config': {
              'swap_left_and_right_cameras': args['swap_lr'],  # True for 1097 (RPi Compute) and 1098OBC (USB w/onboard cameras)
              'left_fov_deg': args['field_of_view'],  # Same on 1097 and 1098OBC
              'rgb_fov_deg': args['rgb_field_of_view'],
              'left_to_right_distance_cm': args['baseline'],  # Distance between stereo cameras
              'left_to_rgb_distance_cm': args['rgb_baseline'],  # Currently unused
              'store_to_eeprom': args['store_eeprom'],
              'clear_eeprom': args['clear_eeprom'],
              'override_eeprom': args['override_eeprom'],
          },
          'video_config': {
          #    'rateCtrlMode': 'cbr',
          #    'profile': 'h265_main',  # Options: 'h264_baseline' / 'h264_main' / 'h264_high' / 'h265_main'
          #    'bitrate': 8000000,  # When using CBR
          #    'maxBitrate': 8000000,  # When using CBR
          #    'keyframeFrequency': 30,
          #    'numBFrames': 0,
          #    'quality': 80  # (0 - 100%) When using VBR
          },
  • get_available_streams() -> List[str]

    Return a list of all streams supported by the DepthAI library.

      >>> device.get_available_streams()
      ['meta_d2h', 'color', 'left', 'right', 'rectified_left', 'rectified_right', 'disparity', 'depth', 'metaout', 'previewout', 'jpegout', 'video', 'object_tracker']
  • get_nn_to_depth_bbox_mapping() -> dict

    Returns a dict that allows matching the CNN output with the disparity info.

    Since the RGB camera has a 4K resolution and neural networks accept only images with a specific resolution (like 300x300), the original image is cropped to meet the neural network requirements. On the other hand, the disparity frames are produced at the full resolution available on the mono cameras.

    To determine where the CNN previewout image lies on the disparity frame, use this method - it specifies the offsets and dimensions to apply.

      >>> device.get_nn_to_depth_bbox_mapping()
      {'max_h': 681, 'max_w': 681, 'off_x': 299, 'off_y': 59}
  • request_jpeg()

    Captures a JPEG frame from the RGB camera and sends it to the jpegout stream. The frame is in the full available resolution, not cropped to the CNN input dimensions.

  • send_disparity_confidence_threshold()

    Sends the disparity confidence threshold for the StereoSGBM algorithm. If the confidence of a disparity value is below the threshold, the value is marked as invalid disparity and treated as background.

  • get_right_homography()

    Return a 3x3 homography matrix used to rectify the right stereo camera image.

  • get_left_homography()

    Return a 3x3 homography matrix used to rectify the left stereo camera image.

  • get_left_intrinsic()

    Return a 3x3 intrinsic calibration matrix of the left stereo camera.

  • get_rotation()

    Return a 3x3 rotation matrix representing the rotation of the right stereo camera w.r.t left stereo camera.

  • get_translation()

    Return a 3x1 vector representing the position of the right stereo camera center w.r.t the left stereo camera center.
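As an illustration of the get_nn_to_depth_bbox_mapping values above, the following hypothetical helper (not part of the DepthAI API) projects a normalized CNN bounding box onto the full disparity frame using the returned offsets and dimensions:

```python
def nn_bbox_to_disparity_coords(x_min, y_min, x_max, y_max, mapping):
    """Map a normalized [0, 1] CNN bounding box onto the disparity frame.

    `mapping` is the dict returned by Device.get_nn_to_depth_bbox_mapping():
    the box is first scaled to the cropped region size (max_w, max_h), then
    shifted by the crop offsets (off_x, off_y).
    """
    left   = int(x_min * mapping['max_w']) + mapping['off_x']
    top    = int(y_min * mapping['max_h']) + mapping['off_y']
    right  = int(x_max * mapping['max_w']) + mapping['off_x']
    bottom = int(y_max * mapping['max_h']) + mapping['off_y']
    return left, top, right, bottom

# Using the example mapping from above
mapping = {'max_h': 681, 'max_w': 681, 'off_x': 299, 'off_y': 59}
print(nn_bbox_to_disparity_coords(0.0, 0.0, 1.0, 1.0, mapping))  # (299, 59, 980, 740)
```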


An enum with all autofocus modes available


  • AF_MODE_CONTINUOUS_PICTURE This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is standing still while capturing. The focusing procedure is done as fast as possible.

    This is the default mode the DepthAI operates in.

  • AF_MODE_CONTINUOUS_VIDEO This mode adjusts the focus continually to provide the best in-focus image stream and should be used when the camera is trying to capture a smooth video stream. The focusing procedure is slower and avoids focus overshoots.
  • AF_MODE_EDOF This mode disables the autofocus. EDOF stands for Enhanced Depth of Field and is a digital focus.


A pipeline object through which the device sends its results to the host. Created using depthai.Device.create_pipeline.



For any neural network, the inference output can be obtained with get_tensor. For the specific cases of Mobilenet-SSD and YOLO-v3, decoding can be done in the firmware. Decoded objects can then be accessed through getDetectedObjects, in addition to the raw output, to make the results of these commonly used networks easily accessible. See the blob config file section for more details about the different neural network output formats and how to choose between them.

Neural network results packet. It’s not a single result, but a batch of results with additional metadata attached


  • get_tensor(Union[int, str]) -> numpy.ndarray

    Can be used ONLY when output_format in the blob config file is set to raw. It returns a shaped numpy array for the specified network output tensor, based on the neural network's output layer information.

    For example: in case of Mobilenet-SSD it returns a [1, 1, 100, 7] shaped array, where numpy.dtype is float16.

    Example of usage: nnetpacket.get_tensor(0) or nnetpacket.get_tensor('detection_out')

  • __getitem__(Union[int, str]) -> numpy.ndarray

    Same as get_tensor.

    Example of usage for Mobilenet-SSD:

    nnetpacket[0] or nnetpacket['detection_out'], where 'detection_out' is the name of output layer in case of Mobilenet-SSD

  • getOutputsList() -> list

    Returns all the output tensors in a list for the network.

  • getOutputsDict() -> dict

    Returns all the output tensors in a dictionary for the network. The key is the name of the output layer, the value is the shaped numpy array.
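When output_format is raw, decoding happens on the host. As an illustration for Mobilenet-SSD, whose raw [1, 1, 100, 7] output stores one candidate per row as (image_id, label, confidence, x_min, y_min, x_max, y_max), a minimal host-side decoder could look like this (a sketch, not library code; with a live pipeline the tensor would come from nnetpacket.get_tensor(0)):

```python
import numpy as np

def decode_mobilenet_ssd(tensor, confidence_threshold=0.5):
    """Decode a raw [1, 1, N, 7] Mobilenet-SSD output tensor.

    Each row is (image_id, label, confidence, x_min, y_min, x_max, y_max),
    with coordinates normalized to [0, 1]. A row with image_id == -1 marks
    the end of valid detections.
    """
    detections = []
    for row in tensor[0, 0]:
        image_id, label, confidence = row[0], row[1], row[2]
        if image_id == -1:  # end-of-detections marker
            break
        if confidence < confidence_threshold:
            continue
        detections.append({
            'label': int(label),
            'confidence': float(confidence),
            'bbox': (float(row[3]), float(row[4]), float(row[5]), float(row[6])),
        })
    return detections
```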


Descriptor of the input/output layers/tensors of the network.

When the network is loaded, the tensor info is automatically printed.

It can be printed at runtime using print(nnetpacket.getInputLayersInfo()) or print(nnetpacket.getOutputLayersInfo()).


  • name -> string

    Name of the tensor.

  • dimensions -> list

    Shape of tensor array. E.g. : [1, 1, 100, 7]

  • strides -> list

    Strides of tensor array.

  • data_type -> string

    Data type of tensor. E.g. : float16

  • offset -> int

    Offset in the raw output array.

  • element_size -> int

    Size in bytes of one element in the array.

  • index -> int

    Index of the tensor. E.g. : in case of multiple inputs/outputs in the network it marks the order of input/output.


  • get_dict() -> dict

    Returns TensorInfo in a dictionary where the key is the name of attribute.

  • get_dimension(Dimension) -> int

    Returns the specific dimension of the tensor, for example: tensor_info.get_dimension(depthai.TensorInfo.Dimension.WIDTH) returns the WIDTH of tensor.


Container of neural network results decoded on device side.

Example of accessing detections

Assuming the detected objects are stored in a detections object.

  • Number of detections

    detections.size() or len(detections)

  • Accessing the x-th detection

    detections[x]

  • Iterating through all detections

    for detection in detections:
        # handle the detection
        ...


Detected object descriptor.


  • label -> int

    Label id of the detected object.

  • confidence -> float

    Confidence score of the detected object in interval [0, 1].

  • x_min -> float

    Top left X coordinate of the detected bounding box. Normalized, in interval [0, 1].

  • y_min -> float

    Top left Y coordinate of the detected bounding box. Normalized, in interval [0, 1].

  • x_max -> float

    Bottom right X coordinate of the detected bounding box. Normalized, in interval [0, 1].

  • y_max -> float

    Bottom right Y coordinate of the detected bounding box. Normalized, in interval [0, 1].

  • depth_x -> float

    Distance to detected bounding box on X axis. Only when depth calculation is enabled (stereo cameras are present on board).

  • depth_y -> float

    Distance to detected bounding box on Y axis. Only when depth calculation is enabled (stereo cameras are present on board).

  • depth_z -> float

    Distance to detected bounding box on Z axis. Only when depth calculation is enabled (stereo cameras are present on board).


  • get_dict() -> dict

    Returns detected object in a dictionary where the key is the name of attribute.


Dimension descriptor of tensor shape.


  • Union[W, WIDTH] -> Width
  • Union[H, HEIGHT] -> Height
  • Union[C, CHANNEL] -> Number of channels
  • Union[N, B, NUMBER, BATCH] -> Number/Batch of inferences

Note: Dimension is mostly meaningful for input tensors since not all neural network models respect the semantics of Dimension for output tensor. E.g. Width might not mean Width.


DepthAI data packet, containing information generated on the device. Unlike NNetPacket, it contains a single “result” with source stream info


  • stream_name: str

    Returns the packet's source stream, used to determine the origin of the packet and apply the proper handling based on this value.


  • getData() -> numpy.ndarray

    Returns the data as a NumPy array, which you can e.g. display using OpenCV's imshow.

    Used with streams that return frames, e.g. previewout, left, right, or encoded data, e.g. video, jpegout.

  • getDataAsStr() -> str

    Returns the data as a string, which can be parsed further.

    Used with streams that return non-array results, e.g. meta_d2h, which returns a JSON object.

  • getObjectTracker() -> ObjectTracker

    Returns the result as an ObjectTracker instance; used only with packets from the object_tracker stream.

  • size() -> int

    Returns packet data size
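For example, previewout frames arrive as planar [3, height, width] arrays, so they are typically rearranged to the interleaved [height, width, 3] layout that OpenCV expects. The conversion itself is plain NumPy (with a live pipeline, the data would come from packet.getData()):

```python
import numpy as np

def planar_to_interleaved(data):
    """Convert a planar [3, H, W] frame to interleaved [H, W, 3].

    Equivalent to splitting the three planes and merging them with
    cv2.merge, as done in the demo code.
    """
    return np.transpose(data, (1, 2, 0))

# Hypothetical usage with a live pipeline:
# frame = planar_to_interleaved(packet.getData())
# cv2.imshow(packet.stream_name, frame)
```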


Metadata object attached to the packets sent via pipeline.


  • getCameraName() -> str

    Returns the name of the camera that produced the frame.

  • getCategory() -> int

    Returns the type of the packet - whether it's a regular frame or one that arrived from taking a still.

  • getFrameBytesPP() -> int

    Returns number of bytes per pixel in the packet’s frame

  • getFrameWidth() -> int

    Returns the width of the packet’s frame

  • getFrameType() -> int

    Returns the type of the data that this packet contains.

  • getInstanceNum() -> int

    Returns the camera id that is the source of the current packet

  • getSequenceNum() -> int

    A sequence number is assigned to each frame produced by the camera. It can be used to verify that frames were captured at the same time - e.g. if frames from the left and right cameras have the same sequence number, you can assume they were taken at the same time.

  • getStride() -> int

    Specifies number of bytes till the next row of pixels in the packet’s frame

  • getTimestamp() -> float

    When a packet is created, it is assigned a creation timestamp, which can be obtained using this method.
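As a sketch of the sequence-number matching described under getSequenceNum, the helper below pairs left and right frames; it uses plain (sequence_number, frame) tuples in place of real packets (with a live pipeline, the numbers would come from the packet metadata):

```python
def pair_by_sequence(left_frames, right_frames):
    """Pair left/right frames that share a sequence number.

    Each argument is an iterable of (sequence_number, frame) tuples; only
    sequence numbers present on both sides produce a (seq, left, right)
    triple, so unmatched frames are dropped.
    """
    right_by_seq = dict(right_frames)
    return [(seq, frame, right_by_seq[seq])
            for seq, frame in left_frames
            if seq in right_by_seq]
```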


An object representing the current state of the tracker, obtained by calling the getObjectTracker method on a packet from the object_tracker stream.



A Tracklet represents a single tracked object and is produced by the ObjectTracker class. To obtain one, call the getTracklet method.


  • getId() -> int

    Return the tracklet id

  • getLabel() -> int

    Return the tracklet label, i.e. the result returned by the neural network. Used to identify the class of recognized objects.

  • getStatus() -> str

    Return the tracklet status - either NEW, TRACKED, or LOST.

  • getLeftCoord() -> int

    Return the left coordinate of the bounding box of a tracked object

  • getRightCoord() -> int

    Return the right coordinate of the bounding box of a tracked object

  • getTopCoord() -> int

    Return the top coordinate of the bounding box of a tracked object

  • getBottomCoord() -> int

    Return the bottom coordinate of the bounding box of a tracked object
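As a sketch of typical tracklet filtering, the helper below keeps only actively tracked objects; it operates on plain dicts standing in for Tracklet objects (with a live pipeline, the values would come from getId, getLabel, and the status method):

```python
def active_tracklets(tracklets):
    """Keep only tracklets whose status is NEW or TRACKED.

    `tracklets` is a list of plain dicts with 'id', 'label' and 'status'
    keys, standing in for Tracklet objects; 'LOST' tracklets are dropped.
    """
    return [t for t in tracklets if t['status'] in ('NEW', 'TRACKED')]
```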

Preparing the MyriadX blob file and its config

As you can see in this example, basic usage of the create_pipeline method consists of specifying the desired output streams and the AI section, where you specify the MyriadX blob and its config.

In this section, we’ll describe how to obtain both blob_file and blob_file_config.

Obtaining MyriadX blob

Since we're utilizing the MyriadX VPU, your model needs to be compiled (or, more accurately, optimized and converted) into a MyriadX blob file, which will be sent to the device and executed.

The easiest way to obtain this blob is to use our online BlobConverter app. It has all the tools needed for compilation, so you don't need to set up anything - and you can even download a blob for a model from the OpenVINO model zoo.

If you'd like, you can also compile the blob yourself. You'll need to install the OpenVINO toolkit, then use the Model Optimizer and Myriad Compiler to obtain the MyriadX blob. We've documented example usage of these compilers here.

Creating Blob configuration file

If a config file is not provided, no decoding is done on the device and output_format is set to raw. The decoding must then be done on the host side, by the user.

Currently there is support for decoding Mobilenet-SSD and (tiny-)YOLO-v3 based networks on the device. For that, a config file with network-specific parameters is required.

Example for tiny-yolo-v3 network:

        {
            "NN_config":
            {
                "output_format" : "detection",
                "NN_family" : "YOLO",
                "NN_specific_metadata" :
                {
                    "classes" : 80,
                    "coordinates" : 4,
                    "anchors" : [10,14, 23,27, 37,58, 81,82, 135,169, 344,319],
                    "anchor_masks" :
                    {
                        "side26" : [1,2,3],
                        "side13" : [3,4,5]
                    },
                    "iou_threshold" : 0.5,
                    "confidence_threshold" : 0.5
                }
            }
        }
  • NN_config - configuration for the network
    • output_format
      • "detection" - decoding done on device, the received packet is in Detections format
      • "raw" - decoding done on host
    • NN_family - "YOLO" or "mobilenet"
    • NN_specific_metadata - only for "YOLO"
      • classes - number of classes
      • coordinates - number of coordinates
      • anchors - anchors for YOLO network
      • anchor_masks - anchor mask for each output layer : 26x26, 13x13 (+ 52x52 for full YOLO-v3)
      • iou_threshold - intersection over union threshold for detected object
      • confidence_threshold - score confidence threshold for detected object
  • mappings
    • labels - label mapping for detected object ID
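For completeness, a hypothetical mappings section might look like the following; the labels shown are placeholders and must match the classes the network was trained on:

```json
{
    "mappings":
    {
        "labels" : ["person", "bicycle", "car"]
    }
}
```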

Example decoding for tiny-yolo-v3, yolo-v3, mobilenet-ssd when output_format is set to detection:

# `pipeline` is the CNNPipeline returned by device.create_pipeline()
nnet_packets, data_packets = pipeline.get_available_nnet_and_data_packets(blocking=True)

for nnet_packet in nnet_packets:
    in_layers = nnet_packet.getInputLayersInfo()  # get input layer information
    # print(in_layers)  # print input layer info for debugging
    input_width  = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.W)  # width of input image
    input_height = in_layers[0].get_dimension(depthai.TensorInfo.Dimension.H)  # height of input image

    detections = nnet_packet.getDetectedObjects()  # get detection container
    objects = list()  # create empty list of decoded objects

    for detection in detections:
        detection_dict = detection.get_dict()
        # scale normalized coordinates to image coordinates
        detection_dict["x_min"] = int(detection_dict["x_min"] * input_width)
        detection_dict["y_min"] = int(detection_dict["y_min"] * input_height)
        detection_dict["x_max"] = int(detection_dict["x_max"] * input_width)
        detection_dict["y_max"] = int(detection_dict["y_max"] * input_height)
        objects.append(detection_dict)

Example of decoding for full yolo-v3 and tiny-yolo-v3 on host and device

Example of decoding for mobilenet based networks on host and device

Other installation methods

To get the latest, not-yet-released features from our source code, you can compile the depthai package manually.

Dependencies to build from source

  • CMake > 3.2.0
  • Generation tool (Ninja, make, …)
  • C/C++ compiler
  • libusb1 development package

Ubuntu, Raspberry Pi OS, … (Debian based systems)

On Debian-based systems (Raspberry Pi OS, Ubuntu, …) these can be acquired by running:

sudo apt-get -y install cmake libusb-1.0-0-dev build-essential

macOS (Mac OS X)

Assuming a stock macOS install, the depthai-python library needs the following dependencies:

  • HomeBrew (If it’s not installed already)
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • Python, libusb, CMake, wget
    brew install coreutils python3 cmake libusb wget

    And now you're ready to clone depthai-python from GitHub and build it for macOS.


Install using GitHub commit

Pip allows users to install packages from specific commits, even if they are not yet released on PyPI.

To do so, use the command below - and be sure to replace the <commit_sha> with the correct commit hash from here

python3 -m pip install git+https://github.com/luxonis/depthai-python.git@<commit_sha>

Using/Testing a Specific Branch/PR

From time to time, it may be of interest to use a specific branch. This may occur, for example, because we have listened to your feature request and implemented a quick implementation in a branch. Or it could be to get early access to a feature that is soaking in our develop branch for stability purposes before being merged into main.

So when working in the depthai repository, using a branch can be accomplished with the following commands. For this example, the branch that we will try out is develop (which is the branch we use to soak new features before merging them into main):

Prior to running the following, you can either clone the repository independently (to avoid overwriting any of your local changes) or simply do a git pull first.

git checkout develop
python3 -m pip install -U pip
python3 -m pip install -r requirements.txt

Install from source

If desired, you can also install the package from the source code itself - this will allow you to make changes to the API and see them live in action.

To do so, first download the repository and then add the package to your Python interpreter in development mode:

git clone https://github.com/luxonis/depthai-python.git
cd depthai-python
git submodule update --init --recursive
python3 setup.py develop  # you may need to add sudo if using the system interpreter instead of a virtual environment

If you want to use a branch other than the default (main), e.g. develop, you can do so by typing:

git checkout develop  # replace "develop" with the desired branch name
git submodule update --recursive
python3 setup.py develop

Or, if you want to checkout a specific commit, type

git checkout <commit_sha>
git submodule update --recursive
python3 setup.py develop