StereoDepth

The StereoDepth node calculates disparity/depth from two mono cameras.

How to place it

pipeline = dai.Pipeline()
stereo = pipeline.create(dai.node.StereoDepth)
dai::Pipeline pipeline;
auto stereo = pipeline.create<dai::node::StereoDepth>();

Inputs and Outputs

               ┌───────────────────┐
               │                   │ confidenceMap
               │                   ├─────────────►
               │                   │rectifiedLeft
               │                   ├─────────────►
left           │                   │   syncedLeft
──────────────►│-------------------├─────────────►
               │                   │        depth
               │                   ├─────────────►
               │    StereoDepth    │    disparity
               │                   ├─────────────►
right          │                   │   syncedRight
──────────────►│-------------------├─────────────►
               │                   │rectifiedRight
               │                   ├─────────────►
inputConfig    │                   │     outConfig
──────────────►│-------------------├─────────────►
               └───────────────────┘

Internal block diagram of StereoDepth node

../../../_images/depth_diagram.jpeg

On the diagram, red rectangles are firmware settings that are configurable via the API. Gray rectangles are settings that are not yet exposed to the API. We plan to expose as much configurability as possible, but please let us know if you would like to see any of these settings configurable sooner.

If you click on the image, you will be redirected to the webapp. Some blocks have notes that provide additional technical information.

Currently configurable blocks

Left-Right Check or LR-Check is used to remove incorrectly calculated disparity pixels due to occlusions at object borders (Left and Right camera views are slightly different).

  1. Computes disparity by matching in R->L direction

  2. Computes disparity by matching in L->R direction

  3. Combines the results from 1 and 2, running on Shave: each pixel's disparity d = disparity_LR(x,y) is compared with disparity_RL(x-d,y). If the difference exceeds a threshold, the pixel at (x,y) in the final disparity map is invalidated.
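The combine step can be sketched in plain Python (a simplified illustration only; the actual combination runs in firmware on Shave, and the helper name lr_check is hypothetical):

```python
def lr_check(disp_lr, disp_rl, threshold):
    """Invalidate pixels where the L->R and R->L disparities disagree.

    disp_lr, disp_rl: 2D lists (rows) of integer disparities.
    Returns a disparity map with inconsistent pixels set to 0 (invalid).
    """
    height, width = len(disp_lr), len(disp_lr[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disp_lr[y][x]
            if x - d < 0:
                continue  # matching pixel falls outside the frame: leave invalid
            if abs(d - disp_rl[y][x - d]) <= threshold:
                out[y][x] = d  # consistent in both directions: keep it
    return out

print(lr_check([[2, 2, 2]], [[2, 2, 2]], threshold=1))  # [[0, 0, 2]]
```

The leftmost pixels stay invalid because a disparity of 2 would map them outside the right-to-left frame, which mirrors the occlusion behavior described above.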

You can use debugDispLrCheckIt1 and debugDispLrCheckIt2 debug outputs for debugging/fine-tuning purposes.

Extended disparity mode allows detecting objects at closer distances for a given baseline. It increases the maximum disparity search from 96 to 191 levels, so the range becomes [0..190]. This cuts the minimum perceivable distance roughly in half, since the minimum distance is now focal_length * base_line_dist / 190 instead of focal_length * base_line_dist / 95.

  1. Computes disparity on the original size images (e.g. 1280x720)

  2. Computes disparity on 2x downscaled images (e.g. 640x360)

  3. Combines the two level disparities on Shave, effectively covering a total disparity range of 191 pixels (in relation to the original resolution).

You can use debugExtDispLrCheckIt1 and debugExtDispLrCheckIt2 debug outputs for debugging/fine-tuning purposes.

Subpixel mode improves the precision and is especially useful for long range measurements. It also helps for better estimating surface normals.

Besides the integer disparity output, the stereo engine is programmed to dump the cost volume to memory, i.e. 96 levels (disparities) per pixel. Software interpolation is then done on Shave, resulting in a final disparity with 3 fractional bits. This gives significantly more granular depth steps (8 additional steps between the integer-pixel depth steps) and, theoretically, longer-distance depth viewing, as the maximum depth is no longer limited by a feature being a full integer pixel-step apart, but rather 1/8 of a pixel. In this mode, the stereo cameras perform: 94 depth steps * 8 subpixel depth steps + 2 (min/max values) = 754 depth steps. Note that Subpixel and Extended Disparity are not yet supported simultaneously.
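The depth-step arithmetic quoted above can be checked directly:

```python
# Figures quoted in the paragraph above
integer_steps = 94       # integer disparity steps between the min/max values
subpixel_per_step = 8    # 3 fractional bits -> 2**3 = 8 sub-steps per integer step
total_steps = integer_steps * subpixel_per_step + 2  # plus the min/max values
print(total_steps)  # 754
```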

For comparison of normal disparity vs. subpixel disparity images, click here.

Mesh files are generated using the camera intrinsics, distortion coefficients, and rectification rotations. These files help overcome the distortions of the cameras, increasing accuracy, and are especially helpful when wide-FOV lenses are used.

Note

Currently, mesh files are generated only for stereo cameras, on the host, during calibration. The generated mesh files are stored in depthai/resources, from where users can load them to the device. This process will be moved to the device in upcoming releases.

void dai::node::StereoDepth::loadMeshFiles(const std::string &pathLeft, const std::string &pathRight)

Specify local filesystem paths to the mesh calibration files for ‘left’ and ‘right’ inputs.

When a mesh calibration is set, it overrides the camera intrinsics/extrinsics matrices. Mesh format: a sequence of (y,x) points as ‘float’ with coordinates from the input image to be mapped in the output. The mesh can be subsampled, configured by setMeshStep.

With a 1280x800 resolution and the default (16,16) step, the required mesh size is:

width: 1280 / 16 + 1 = 81

height: 800 / 16 + 1 = 51
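The mesh-size rule above can be expressed as a small helper (the name mesh_size is hypothetical, for illustration only):

```python
def mesh_size(img_width, img_height, step_w=16, step_h=16):
    # One mesh point every `step` pixels in each direction, plus one for the far edge
    return img_width // step_w + 1, img_height // step_h + 1

print(mesh_size(1280, 800))  # (81, 51)
```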

void dai::node::StereoDepth::loadMeshData(const std::vector<std::uint8_t> &dataLeft, const std::vector<std::uint8_t> &dataRight)

Specify mesh calibration data for ‘left’ and ‘right’ inputs, as vectors of bytes. See loadMeshFiles for the expected data format

void dai::node::StereoDepth::setMeshStep(int width, int height)

Set the distance between mesh points. Default: (16, 16)

  • Confidence threshold: The stereo depth algorithm searches for the matching feature from a right camera point in the left image (along the 96 disparity levels). During this process it computes a cost for each disparity level, chooses the minimal cost between two disparities, and uses it to compute the confidence at each pixel. The stereo node outputs disparity/depth pixels only where the depth confidence is below the confidence threshold (a lower confidence value means better depth accuracy). Note: this threshold only applies to Normal stereo mode as of now.

  • LR check threshold: Disparity is considered for the output when the difference between LR and RL disparities is smaller than the LR check threshold.

void dai::StereoDepthConfig::setConfidenceThreshold(int confThr)

Confidence threshold for disparity calculation

Parameters
  • confThr: Confidence threshold value 0..255

void dai::StereoDepthConfig::setLeftRightCheckThreshold(int threshold)

Parameters
  • threshold: Set threshold for left-right, right-left disparity map combine, 0..255

Current limitations

  • Median filtering is disabled when subpixel mode is set to 4 or 5 bits.

Stereo depth FPS

Stereo depth mode       FPS for 720P
Standard mode           150
Left-Right Check        60
Subpixel Disparity      30
Extended Disparity      60
Subpixel + LR check     15
Extended + LR check     30

Usage

pipeline = dai.Pipeline()
stereo = pipeline.create(dai.node.StereoDepth)

# Better handling for occlusions (set to True to enable):
stereo.setLeftRightCheck(False)
# Closer-in minimum depth, disparity range is doubled (set to True to enable):
stereo.setExtendedDisparity(False)
# Better accuracy for longer distance, fractional disparity 32-levels (set to True to enable):
stereo.setSubpixel(False)

# Define and configure MonoCamera nodes beforehand
left.out.link(stereo.left)
right.out.link(stereo.right)
dai::Pipeline pipeline;
auto stereo = pipeline.create<dai::node::StereoDepth>();

// Better handling for occlusions (set to true to enable):
stereo->setLeftRightCheck(false);
// Closer-in minimum depth, disparity range is doubled (set to true to enable):
stereo->setExtendedDisparity(false);
// Better accuracy for longer distance, fractional disparity 32-levels (set to true to enable):
stereo->setSubpixel(false);

// Define and configure MonoCamera nodes beforehand
left->out.link(stereo->left);
right->out.link(stereo->right);

Reference

class depthai.node.StereoDepth

StereoDepth node. Compute stereo disparity and depth from left-right image pair.

class Connection

Connection between an Input and Output

class Id

Node identificator. Unique for every node on a single Pipeline

Properties

alias of depthai.StereoDepthProperties

property confidenceMap

Outputs ImgFrame message that carries a RAW8 confidence map. Lower values mean higher confidence in the calculated disparity value. RGB alignment, left-right check or any post-processing (e.g. median filter) is not performed on the confidence map.

property debugDispCostDump

Outputs ImgFrame message that carries cost dump of disparity map. Useful for debugging/fine tuning.

property debugDispLrCheckIt1

Outputs ImgFrame message that carries left-right check first iteration (before combining with second iteration) disparity map. Useful for debugging/fine tuning.

property debugDispLrCheckIt2

Outputs ImgFrame message that carries left-right check second iteration (before combining with first iteration) disparity map. Useful for debugging/fine tuning.

property debugExtDispLrCheckIt1

Outputs ImgFrame message that carries extended left-right check first iteration (downscaled frame, before combining with second iteration) disparity map. Useful for debugging/fine tuning.

property debugExtDispLrCheckIt2

Outputs ImgFrame message that carries extended left-right check second iteration (downscaled frame, before combining with first iteration) disparity map. Useful for debugging/fine tuning.

property depth

Outputs ImgFrame message that carries RAW16 encoded (0..65535) depth data in millimeters.

Non-determined / invalid depth values are set to 0

property disparity

Outputs ImgFrame message that carries RAW8 / RAW16 encoded disparity data: RAW8 encoded (0..95) for standard mode; RAW8 encoded (0..190) for extended disparity mode; RAW16 encoded (0..3040) for subpixel disparity mode (32 subpixel levels on top of standard mode).

getAssetManager(*args, **kwargs)

Overloaded function.

  1. getAssetManager(self: depthai.Node) -> depthai.AssetManager

Get node AssetManager as a const reference

  2. getAssetManager(self: depthai.Node) -> depthai.AssetManager

Get node AssetManager as a const reference

getInputRefs(*args, **kwargs)

Overloaded function.

  1. getInputRefs(self: depthai.Node) -> List[depthai.Node.Input]

Retrieves reference to node inputs

  2. getInputRefs(self: depthai.Node) -> List[depthai.Node.Input]

Retrieves reference to node inputs

getInputs(self: depthai.Node) → List[depthai.Node.Input]

Retrieves all nodes inputs

getMaxDisparity(self: depthai.node.StereoDepth) → float

Useful for normalization of the disparity map.

Returns

Maximum disparity value that the node can return

getName(self: depthai.Node) → str

Retrieves nodes name

getOutputRefs(*args, **kwargs)

Overloaded function.

  1. getOutputRefs(self: depthai.Node) -> List[depthai.Node.Output]

Retrieves reference to node outputs

  2. getOutputRefs(self: depthai.Node) -> List[depthai.Node.Output]

Retrieves reference to node outputs

getOutputs(self: depthai.Node) → List[depthai.Node.Output]

Retrieves all nodes outputs

getParentPipeline(*args, **kwargs)

Overloaded function.

  1. getParentPipeline(self: depthai.Node) -> depthai.Pipeline

  2. getParentPipeline(self: depthai.Node) -> depthai.Pipeline

property id

Id of node

property initialConfig

Initial config to use for StereoDepth.

property inputConfig

Input StereoDepthConfig message with ability to modify parameters in runtime. Default queue is non-blocking with size 4.

property left

Input for left ImgFrame of left-right pair

Default queue is non-blocking with size 8

loadCalibrationData(self: depthai.node.StereoDepth, arg0: List[int]) → None
loadCalibrationFile(self: depthai.node.StereoDepth, arg0: str) → None
loadMeshData(self: depthai.node.StereoDepth, dataLeft: List[int], dataRight: List[int]) → None

Specify mesh calibration data for ‘left’ and ‘right’ inputs, as vectors of bytes. See loadMeshFiles for the expected data format

loadMeshFiles(self: depthai.node.StereoDepth, pathLeft: str, pathRight: str) → None

Specify local filesystem paths to the mesh calibration files for ‘left’ and ‘right’ inputs.

When a mesh calibration is set, it overrides the camera intrinsics/extrinsics matrices. Mesh format: a sequence of (y,x) points as ‘float’ with coordinates from the input image to be mapped in the output. The mesh can be subsampled, configured by setMeshStep.

With a 1280x800 resolution and the default (16,16) step, the required mesh size is:

width: 1280 / 16 + 1 = 81

height: 800 / 16 + 1 = 51

property outConfig

Outputs StereoDepthConfig message that contains current stereo configuration.

property rectifiedLeft

Outputs ImgFrame message that carries RAW8 encoded (grayscale) rectified frame data.

property rectifiedRight

Outputs ImgFrame message that carries RAW8 encoded (grayscale) rectified frame data.

property right

Input for right ImgFrame of left-right pair

Default queue is non-blocking with size 8

setConfidenceThreshold(self: depthai.node.StereoDepth, arg0: int) → None

Confidence threshold for disparity calculation

Parameter confThr:

Confidence threshold value 0..255

setDepthAlign(*args, **kwargs)

Overloaded function.

  1. setDepthAlign(self: depthai.node.StereoDepth, align: dai::StereoDepthProperties::DepthAlign) -> None

Parameter align:

Set the disparity/depth alignment: centered (between the ‘left’ and ‘right’ inputs), or from the perspective of a rectified output stream

  2. setDepthAlign(self: depthai.node.StereoDepth, camera: depthai.CameraBoardSocket) -> None

Parameter camera:

Set the camera from whose perspective the disparity/depth will be aligned

setEmptyCalibration(self: depthai.node.StereoDepth)None

Specify that a passthrough/dummy calibration should be used, when input frames are already rectified (e.g. sourced from recordings on the host)

setExtendedDisparity(self: depthai.node.StereoDepth, enable: bool) → None

Disparity range increased from 0-95 to 0-190, combined from full resolution and downscaled images.

Suitable for short range objects. Currently incompatible with sub-pixel disparity

setInputResolution(*args, **kwargs)

Overloaded function.

  1. setInputResolution(self: depthai.node.StereoDepth, width: int, height: int) -> None

Specify input resolution size

Optional if MonoCamera exists, otherwise necessary

  2. setInputResolution(self: depthai.node.StereoDepth, resolution: Tuple[int, int]) -> None

Specify input resolution size

Optional if MonoCamera exists, otherwise necessary

setLeftRightCheck(self: depthai.node.StereoDepth, enable: bool) → None

Computes disparities in both L-R and R-L directions, and combines them.

For better occlusion handling, discarding invalid disparity values

setMedianFilter(self: depthai.node.StereoDepth, arg0: depthai.MedianFilter) → None

Parameter median:

Set kernel size for disparity/depth median filtering, or disable

setMeshStep(self: depthai.node.StereoDepth, width: int, height: int) → None

Set the distance between mesh points. Default: (16, 16)

setNumFramesPool(self: depthai.node.StereoDepth, arg0: int) → None

Specify number of frames in pool.

Parameter numFramesPool:

How many frames should the pool have

setOutputDepth(self: depthai.node.StereoDepth, arg0: bool) → None
setOutputKeepAspectRatio(self: depthai.node.StereoDepth, keep: bool) → None

Specifies whether the frames resized by setOutputSize should preserve aspect ratio, with potential cropping when enabled. Default true

setOutputRectified(self: depthai.node.StereoDepth, arg0: bool) → None
setOutputSize(self: depthai.node.StereoDepth, width: int, height: int) → None

Specify disparity/depth output resolution size, implemented by scaling.

Currently only applicable when aligning to RGB camera

setRectification(self: depthai.node.StereoDepth, enable: bool) → None

Rectify input images or not.

setRectifyEdgeFillColor(self: depthai.node.StereoDepth, color: int) → None

Fill color for missing data at frame edges

Parameter color:

Grayscale 0..255, or -1 to replicate pixels

setRectifyMirrorFrame(self: depthai.node.StereoDepth, arg0: bool) → None

DEPRECATED function. It was removed, since rectified images are not flipped anymore. Mirror rectified frames, only when LR-check mode is disabled. Default true. The mirroring is required to have a normal non-mirrored disparity/depth output.

A side effect of this option is disparity alignment to the perspective of left or right input: false: mapped to left and mirrored, true: mapped to right. With LR-check enabled, this option is ignored, none of the outputs are mirrored, and disparity is mapped to right.

Parameter enable:

True for normal disparity/depth, otherwise mirrored

setRuntimeModeSwitch(self: depthai.node.StereoDepth, arg0: bool) → None

Enable runtime stereo mode switch, e.g. from standard to LR-check. Note: when enabled, resources are allocated for the worst case, to allow switching to any mode.

setSubpixel(self: depthai.node.StereoDepth, enable: bool) → None

Computes disparity with sub-pixel interpolation (5 fractional bits).

Suitable for long range. Currently incompatible with extended disparity

property syncedLeft

Passthrough ImgFrame message from ‘left’ Input.

property syncedRight

Passthrough ImgFrame message from ‘right’ Input.

class dai::node::StereoDepth : public dai::Node

StereoDepth node. Compute stereo disparity and depth from left-right image pair.

Public Types

using Properties = dai::StereoDepthProperties

Public Functions

std::string getName() const override

Retrieves nodes name.

StereoDepth(const std::shared_ptr<PipelineImpl> &par, int64_t nodeId)
void loadCalibrationFile(const std::string &path)

Specify local filesystem path to the calibration file

Parameters
  • path: Path to calibration file. If empty use EEPROM

void loadCalibrationData(const std::vector<std::uint8_t> &data)

Specify calibration data as a vector of bytes

Parameters
  • data: Calibration data. If empty use EEPROM

void setEmptyCalibration()

Specify that a passthrough/dummy calibration should be used, when input frames are already rectified (e.g. sourced from recordings on the host)

void loadMeshFiles(const std::string &pathLeft, const std::string &pathRight)

Specify local filesystem paths to the mesh calibration files for ‘left’ and ‘right’ inputs.

When a mesh calibration is set, it overrides the camera intrinsics/extrinsics matrices. Mesh format: a sequence of (y,x) points as ‘float’ with coordinates from the input image to be mapped in the output. The mesh can be subsampled, configured by setMeshStep.

With a 1280x800 resolution and the default (16,16) step, the required mesh size is:

width: 1280 / 16 + 1 = 81

height: 800 / 16 + 1 = 51

void loadMeshData(const std::vector<std::uint8_t> &dataLeft, const std::vector<std::uint8_t> &dataRight)

Specify mesh calibration data for ‘left’ and ‘right’ inputs, as vectors of bytes. See loadMeshFiles for the expected data format

void setMeshStep(int width, int height)

Set the distance between mesh points. Default: (16, 16)

void setInputResolution(int width, int height)

Specify input resolution size

Optional if MonoCamera exists, otherwise necessary

void setInputResolution(std::tuple<int, int> resolution)

Specify input resolution size

Optional if MonoCamera exists, otherwise necessary

void setOutputSize(int width, int height)

Specify disparity/depth output resolution size, implemented by scaling.

Currently only applicable when aligning to RGB camera

void setOutputKeepAspectRatio(bool keep)

Specifies whether the frames resized by setOutputSize should preserve aspect ratio, with potential cropping when enabled. Default true

void setMedianFilter(dai::MedianFilter median)

Parameters
  • median: Set kernel size for disparity/depth median filtering, or disable

void setDepthAlign(Properties::DepthAlign align)

Parameters
  • align: Set the disparity/depth alignment: centered (between the ‘left’ and ‘right’ inputs), or from the perspective of a rectified output stream

void setDepthAlign(CameraBoardSocket camera)

Parameters
  • camera: Set the camera from whose perspective the disparity/depth will be aligned

void setConfidenceThreshold(int confThr)

Confidence threshold for disparity calculation

Parameters
  • confThr: Confidence threshold value 0..255

void setRectification(bool enable)

Rectify input images or not.

void setLeftRightCheck(bool enable)

Computes disparities in both L-R and R-L directions, and combines them.

For better occlusion handling, discarding invalid disparity values

void setSubpixel(bool enable)

Computes disparity with sub-pixel interpolation (5 fractional bits).

Suitable for long range. Currently incompatible with extended disparity

void setExtendedDisparity(bool enable)

Disparity range increased from 0-95 to 0-190, combined from full resolution and downscaled images.

Suitable for short range objects. Currently incompatible with sub-pixel disparity

void setRectifyEdgeFillColor(int color)

Fill color for missing data at frame edges

Parameters
  • color: Grayscale 0..255, or -1 to replicate pixels

void setRectifyMirrorFrame(bool enable)

DEPRECATED function. It was removed, since rectified images are not flipped anymore. Mirror rectified frames, only when LR-check mode is disabled. Default true. The mirroring is required to have a normal non-mirrored disparity/depth output.

A side effect of this option is disparity alignment to the perspective of left or right input: false: mapped to left and mirrored, true: mapped to right. With LR-check enabled, this option is ignored, none of the outputs are mirrored, and disparity is mapped to right.

Parameters
  • enable: True for normal disparity/depth, otherwise mirrored

void setOutputRectified(bool enable)

Enable outputting rectified frames. Optimizes computation on device side when disabled. DEPRECATED. The outputs are auto-enabled if used

void setOutputDepth(bool enable)

Enable outputting ‘depth’ stream (converted from disparity). In certain configurations, this will disable ‘disparity’ stream. DEPRECATED. The output is auto-enabled if used

void setRuntimeModeSwitch(bool enable)

Enable runtime stereo mode switch, e.g. from standard to LR-check. Note: when enabled, resources are allocated for the worst case, to allow switching to any mode.

void setNumFramesPool(int numFramesPool)

Specify number of frames in pool.

Parameters
  • numFramesPool: How many frames should the pool have

float getMaxDisparity() const

Useful for normalization of the disparity map.

Return

Maximum disparity value that the node can return

Public Members

StereoDepthConfig initialConfig

Initial config to use for StereoDepth.

Input inputConfig = {*this, "inputConfig", Input::Type::SReceiver, false, 4, {{DatatypeEnum::StereoDepthConfig, false}}}

Input StereoDepthConfig message with ability to modify parameters in runtime. Default queue is non-blocking with size 4.

Input left = {*this, "left", Input::Type::SReceiver, false, 8, {{DatatypeEnum::ImgFrame, true}}}

Input for left ImgFrame of left-right pair

Default queue is non-blocking with size 8

Input right = {*this, "right", Input::Type::SReceiver, false, 8, {{DatatypeEnum::ImgFrame, true}}}

Input for right ImgFrame of left-right pair

Default queue is non-blocking with size 8

Output depth = {*this, "depth", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries RAW16 encoded (0..65535) depth data in millimeters.

Non-determined / invalid depth values are set to 0

Output disparity = {*this, "disparity", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries RAW8 / RAW16 encoded disparity data: RAW8 encoded (0..95) for standard mode; RAW8 encoded (0..190) for extended disparity mode; RAW16 encoded (0..3040) for subpixel disparity mode (32 subpixel levels on top of standard mode).

Output syncedLeft = {*this, "syncedLeft", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Passthrough ImgFrame message from ‘left’ Input.

Output syncedRight = {*this, "syncedRight", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Passthrough ImgFrame message from ‘right’ Input.

Output rectifiedLeft = {*this, "rectifiedLeft", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries RAW8 encoded (grayscale) rectified frame data.

Output rectifiedRight = {*this, "rectifiedRight", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries RAW8 encoded (grayscale) rectified frame data.

Output outConfig = {*this, "outConfig", Output::Type::MSender, {{DatatypeEnum::StereoDepthConfig, false}}}

Outputs StereoDepthConfig message that contains current stereo configuration.

Output debugDispLrCheckIt1 = {*this, "debugDispLrCheckIt1", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries left-right check first iteration (before combining with second iteration) disparity map. Useful for debugging/fine tuning.

Output debugDispLrCheckIt2 = {*this, "debugDispLrCheckIt2", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries left-right check second iteration (before combining with first iteration) disparity map. Useful for debugging/fine tuning.

Output debugExtDispLrCheckIt1 = {*this, "debugExtDispLrCheckIt1", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries extended left-right check first iteration (downscaled frame, before combining with second iteration) disparity map. Useful for debugging/fine tuning.

Output debugExtDispLrCheckIt2 = {*this, "debugExtDispLrCheckIt2", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries extended left-right check second iteration (downscaled frame, before combining with first iteration) disparity map. Useful for debugging/fine tuning.

Output debugDispCostDump = {*this, "debugDispCostDump", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries cost dump of disparity map. Useful for debugging/fine tuning.

Output confidenceMap = {*this, "confidenceMap", Output::Type::MSender, {{DatatypeEnum::ImgFrame, false}}}

Outputs ImgFrame message that carries a RAW8 confidence map. Lower values mean higher confidence in the calculated disparity value. RGB alignment, left-right check or any post-processing (e.g. median filter) is not performed on the confidence map.

Private Functions

nlohmann::json getProperties() override
std::shared_ptr<Node> clone() override

Private Members

Properties properties
std::shared_ptr<RawStereoDepthConfig> rawConfig

Disparity

Disparity refers to the distance between two corresponding points in the left and right image of a stereo pair. By looking at the image below, it can be seen that point X gets projected to XL = (u, v) in the Left view and XR = (p, q) in the Right view.

../../../_images/disparity_explanation.jpeg

Since we know points XL and XR refer to the same point: X, the disparity for this point is equal to the magnitude of the vector between (u, v) and (p, q).

For a more detailed explanation see this answer on Stack Overflow.

When calculating the disparity, each pixel in the disparity map gets assigned a confidence value 0..255 by the stereo matching algorithm, as:

  • 0 - maximum confidence that it holds a valid value

  • 255 - minimum confidence, so there is more chance that the value is incorrect

(note that this confidence score is inverted compared to, say, NN confidence scores)

For the final disparity map, filtering is applied based on the confidence threshold value: pixels whose confidence score is larger than the threshold get invalidated, i.e. their disparity value is set to zero. You can set the confidence threshold with stereo.initialConfig.setConfidenceThreshold().
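The thresholding rule can be sketched in pure Python (a hypothetical helper for illustration; the actual filtering happens in firmware on the device):

```python
def apply_confidence_threshold(disparity, confidence, threshold):
    """Zero out disparities whose confidence score exceeds the threshold.

    Scores are 0..255 with 0 = most confident (inverted vs. NN confidences).
    """
    return [
        [d if c <= threshold else 0 for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(disparity, confidence)
    ]

disp = [[40, 12, 7]]
conf = [[10, 200, 90]]
print(apply_confidence_threshold(disp, conf, 100))  # [[40, 0, 7]]
```

The middle pixel is invalidated because its score (200) exceeds the threshold (100), i.e. the match was low-confidence.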

Calculate depth using disparity map

Disparity and depth are inversely related. As disparity decreases, depth increases hyperbolically (as 1/disparity), with the scale depending on baseline and focal length. This means that if the disparity value is close to zero, a small change in disparity produces a large change in depth; if the disparity value is large, even large changes in disparity produce only a small change in depth.

By considering this fact, depth can be calculated using this formula:

depth = focal_length_in_pixels * baseline / disparity_in_pixels

where baseline is the distance between two mono cameras. Note the unit used for baseline and depth is the same.

To get focal length in pixels, use this formula:

focal_length_in_pixels = image_width_in_pixels * 0.5 / tan(HFOV * 0.5 * PI/180)

# With 400P mono camera resolution where HFOV=71.9 degrees
focal_length_in_pixels = 640 * 0.5 / tan(71.9 * 0.5 * PI / 180) = 441.25

# With 800P mono camera resolution where HFOV=71.9 degrees
focal_length_in_pixels = 1280 * 0.5 / tan(71.9 * 0.5 * PI / 180) = 882.5

Examples for calculating the depth value, using the OAK-D (7.5cm baseline):

# For OAK-D @ 400P mono cameras and disparity of eg. 50 pixels
depth = 441.25 * 7.5 / 50 = 66.19 # cm

# For OAK-D @ 800P mono cameras and disparity of eg. 10 pixels
depth = 882.5 * 7.5 / 10 = 661.88 # cm
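The two formulas can be combined into small helper functions to reproduce the examples above (the names focal_length_px and depth_cm are hypothetical, for illustration):

```python
import math

def focal_length_px(image_width_px, hfov_deg):
    # focal_length_in_pixels = image_width * 0.5 / tan(HFOV * 0.5)
    return image_width_px * 0.5 / math.tan(math.radians(hfov_deg * 0.5))

def depth_cm(focal_px, baseline_cm, disparity_px):
    # depth = focal_length_in_pixels * baseline / disparity_in_pixels
    return focal_px * baseline_cm / disparity_px

f_400p = focal_length_px(640, 71.9)         # ~441.25 px
print(round(depth_cm(f_400p, 7.5, 50), 2))  # ~66.19 cm (OAK-D @ 400P, disparity 50 px)
```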

Note that the disparity/depth data is stored as uint16, where 0 is a special value meaning that the distance is unknown.

Min stereo depth distance

If the depth results for close-in objects look weird, this is likely because they are below the minimum depth-perception distance of the device.

To calculate this minimum distance, use the depth formula and choose the maximum value for the disparity_in_pixels parameter (keep in mind depth is inversely related to disparity, so the maximum disparity will yield the smallest result).

For example OAK-D has a baseline of 7.5cm, focal_length_in_pixels of 882.5 pixels and the default maximum value for disparity_in_pixels is 95. By using the depth formula we get:

min_distance = focal_length_in_pixels * baseline / disparity_in_pixels = 882.5 * 7.5cm / 95 = 69.67cm

or roughly 70cm.

However this distance can be cut in 1/2 (to around 35cm for the OAK-D) with the following options:

  1. Changing the resolution to 640x400, instead of the standard 1280x800.

  2. Enabling Extended Disparity.

Extended Disparity mode increases the number of disparity levels from the standard 96 to 191, thereby halving the minimum depth. It does so by computing the 96-level disparities on both the original 1280x720 image and the downscaled 640x360 image, which are then merged into a 191-level disparity. For more information see the Extended Disparity tab in this table.

Using the previous OAK-D example, disparity_in_pixels now becomes 190 and the minimum distance is:

min_distance = focal_length_in_pixels * baseline / disparity_in_pixels = 882.5 * 7.5cm / 190 = 34.84cm

or roughly 35cm.
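Both minimum-distance figures can be reproduced with a small helper (hypothetical name, using the OAK-D values quoted above):

```python
def min_distance_cm(focal_px, baseline_cm, max_disparity_px):
    # The largest disparity corresponds to the closest measurable object
    return focal_px * baseline_cm / max_disparity_px

print(round(min_distance_cm(882.5, 7.5, 95), 2))   # ~69.67 cm, standard 96-level search
print(round(min_distance_cm(882.5, 7.5, 190), 2))  # ~34.84 cm, with extended disparity
```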

Note

Applying both of those options is possible, which would set the minimum depth to 1/4 of the standard settings, but at such short distances the minimum depth is limited by focal length, which is 19.6cm, since OAK-D mono cameras have fixed focus distance: 19.6cm - infinity.

See these examples for how to enable Extended Disparity.

Max stereo depth distance

The maximum depth perception distance depends on the accuracy of the depth perception. The formula used to calculate this distance is an approximation, but is as follows:

Dm = (baseline/2) * tan((90 - HFOV / HPixels)*pi/180)

So using this formula for existing models the theoretical max distance is:

# For OAK-D (7.5cm baseline)
Dm = (7.5/2) * tan((90 - 71.9/1280)*pi/180) = 3825.03cm = 38.25 meters

# For OAK-D-CM4 (9cm baseline)
Dm = (9/2) * tan((90 - 71.9/1280)*pi/180) = 4590.04cm = 45.9 meters
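The same approximation as a small helper (hypothetical name), reproducing both figures:

```python
import math

def max_distance_m(baseline_cm, hfov_deg, h_pixels):
    # Dm = (baseline/2) * tan((90 - HFOV/HPixels) * pi/180), converted cm -> m
    dm_cm = (baseline_cm / 2) * math.tan(math.radians(90 - hfov_deg / h_pixels))
    return dm_cm / 100

print(round(max_distance_m(7.5, 71.9, 1280), 2))  # ~38.25 m for OAK-D
print(round(max_distance_m(9.0, 71.9, 1280), 2))  # ~45.9 m for OAK-D-CM4
```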

If greater precision for long range measurements is required, consider enabling Subpixel Disparity or using a larger baseline distance between the mono cameras. For a custom baseline, you could use an OAK-FFC device or design your own baseboard PCB with the required baseline. For more information see Subpixel Disparity under the Stereo Mode tab in this table.

Depth perception accuracy

Disparity depth works by matching features from one image to the other and its accuracy is based on multiple parameters:

  • Texture of objects / backgrounds

Backgrounds may interfere with object detection, since backgrounds are objects too, which makes depth perception less accurate. Disparity depth works very well outdoors, as perfectly clean/blank surfaces are rare there - but such surfaces are relatively common indoors (in clean buildings at least).

  • Lighting

If the illumination is low, the disparity map will be of low confidence, which will result in a noisy depth map.

  • Baseline / distance to objects

A lower baseline enables us to detect depth at a closer distance, as long as the object is visible in both frames. However, it reduces accuracy at large distances, due to fewer pixels representing the object and the disparity decreasing towards 0 much faster. So the common norm is to adjust the baseline according to how far/close we want to be able to detect objects.

Limitation

Since depth is calculated from disparity, which requires the pixels to overlap, there is inherently a vertical band on the left side of the left mono camera and on the right side of the right mono camera, where depth cannot be calculated, since it is seen by only 1 camera. That band is marked with B on the following picture.

https://user-images.githubusercontent.com/59799831/135310921-67726c28-07e7-4ffa-bc8d-74861049517e.png

Meaning of variables on the picture:

  • BL [cm] - Baseline of stereo cameras.

  • Dv [cm] - Minimum distance where both cameras see an object (thus where depth can be calculated).

  • B [pixels] - Width of the band where depth cannot be calculated.

  • W [pixels] - Width of the mono camera frame in pixels (the number of horizontal pixels), also noted as HPixels in other formulas.

  • D [cm] - Distance from the cameras to an object.

  • F [cm] - Width of the image at distance D.

https://user-images.githubusercontent.com/59799831/135310972-c37ba40b-20ad-4967-92a7-c71078bcef99.png

With the use of the tan function, the following formulas can be obtained:

  • F = 2 * D * tan(HFOV/2)

  • Dv = (BL/2) * tan(90 - HFOV/2)

In order to obtain B, we can use the tan function again (same as for F), but this time we must also multiply by the ratio between W and F in order to convert the units to pixels. That gives the following formula:

B = 2 * Dv * tan(HFOV/2) * W / F
B = 2 * Dv * tan(HFOV/2) * W / (2 * D * tan(HFOV/2))
B = W * Dv / D  # pixels

Example: If we are using OAK-D, which has a HFOV of 72°, a baseline (BL) of 7.5 cm and 640x400 (400P) resolution is used, therefore W = 640 and an object is D = 100 cm away, we can calculate B in the following way:

Dv = 7.5 / 2 * tan(90 - 72/2) = 3.75 * tan(54°) = 5.16 cm
B = 640 * 5.16 / 100 = 33 # pixels
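This worked example can be reproduced with a small helper (hypothetical name) that combines the Dv and B formulas above:

```python
import math

def blind_band_px(baseline_cm, hfov_deg, width_px, distance_cm):
    # Dv: closest distance at which both cameras see an object
    dv = (baseline_cm / 2) * math.tan(math.radians(90 - hfov_deg / 2))
    # B = W * Dv / D, from the simplified formula above
    return width_px * dv / distance_cm

print(round(blind_band_px(7.5, 72, 640, 100)))  # ~33 pixels for OAK-D @ 400P
```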

Credit for the calculations and images goes to our community member gregflurry, who made them in this forum post.

Note

OAK-D-PRO will include both an IR dot projector and an IR LED, which will enable operation with no ambient light. The IR LED is used to illuminate the whole area (for mono/color frames), while the IR dot projector is mostly for accurate disparity matching - to have good-quality depth maps on blank surfaces as well. Outdoors, the IR laser dot projector is only relevant at night. For more information see the development progress here.

Got questions?

We’re always happy to help with code or other questions you might have.