Coordinate Systems

Supported on: RVC2, RVC4
All coordinate systems defined in the device are right-handed. Each device has two kinds:
  • CameraSocket coordinate systems — defined by the camera pinhole model during calibration.
  • Housing coordinate systems — defined from the mechanical shape of the device.

Notation

To transform a point from coordinate system A to coordinate system B, we use:
$$p_B = T^{A}_{B} \, p_A$$
where $p_A$ is a column vector in homogeneous coordinates of point p in coordinate system A, and $T^{A}_{B}$ is the 4×4 transformation matrix from coordinate system A to B.
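As a minimal sketch of this notation, the NumPy snippet below builds a 4×4 homogeneous transform from a rotation and a translation and applies it to a point. The helper `make_transform` and all numeric values are illustrative assumptions, not part of DepthAI.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


# Hypothetical transform from A to B: rotate 90 degrees about Z, then shift by 10 along X.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
T_a_to_b = make_transform(rot_z_90, [10.0, 0.0, 0.0])

# Point p expressed in A as a homogeneous column vector (x, y, z, 1).
p_a = np.array([1.0, 0.0, 0.0, 1.0])

# Same point expressed in B.
p_b = T_a_to_b @ p_a
print(p_b[:3])
```

The trailing 1 in the homogeneous vector is what lets a single matrix product apply both the rotation and the translation.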

CameraSocket coordinate systems

Each device has n camera sensors (typically 3). Each sensor has its own coordinate system defined by the pinhole camera model. During factory calibration all n camera sensors are calibrated with respect to each other, forming a chain of transformations. If calibration degrades over time due to thermal or mechanical stress, the device can self-heal using Dynamic Calibration.

Socket to camera mapping

On a typical OAK device, the camera sockets map as follows:
Socket Number   Socket Name   Camera Name
0               CAM_A         RGB
1               CAM_B         LEFT
2               CAM_C         RIGHT
Camera transformations typically form a linear chain:

Origin (reference) camera

The camera with the lowest socket index in the calibration chain is called the origin (also referred to as ref). All extrinsics in DepthAI are expressed relative to this origin camera. For example, on a typical device with CAM_A, CAM_B, and CAM_C, the origin is CAM_A.
Each image produced by DepthAI also carries its own ImgTransformation, which includes the camera extrinsics: the transformation from the virtual camera to the origin (typically CAM_A, as defined above).
A virtual camera is the camera model that describes the image as it currently exists, after any processing (warp, crop, undistort, etc.) has been applied. When DepthAI transforms an image, the result is equivalent to a photo taken by a different, virtual camera that may be positioned and oriented slightly differently in space. Because of this, the virtual camera's extrinsics can differ from the physical sensor's extrinsics that were established during calibration. ImgTransformation always reflects the virtual camera so that downstream nodes work with up-to-date geometry.
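To make the virtual-camera idea concrete, here is a small sketch of the textbook pinhole math for a crop followed by a resize. It is not DepthAI's implementation: the function name, the intrinsic values, and the crop parameters are assumptions chosen for illustration, and half-pixel centre conventions are ignored.

```python
import numpy as np


def crop_then_resize_intrinsics(K, crop_x, crop_y, scale_x, scale_y):
    """Return the 3x3 intrinsics of the virtual camera after a crop followed by a resize."""
    # Cropping moves the principal point by the crop's top-left corner.
    shift = np.array([[1.0, 0.0, -crop_x],
                      [0.0, 1.0, -crop_y],
                      [0.0, 0.0, 1.0]])
    # Resizing scales the focal lengths and the principal point.
    scale = np.diag([scale_x, scale_y, 1.0])
    return scale @ shift @ K


# Hypothetical intrinsics of the physical sensor (pixels).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 400.0],
              [  0.0,   0.0,   1.0]])

# Crop starting at pixel (320, 200), then downscale by half in both directions.
K_virtual = crop_then_resize_intrinsics(K, crop_x=320, crop_y=200, scale_x=0.5, scale_y=0.5)
print(K_virtual)
```

The processed image is described by `K_virtual`, not by the original `K`, which is why downstream geometry should be read from the frame's own transformation data instead of the static calibration.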

Camera-to-camera transforms

You can retrieve any camera-to-camera transform using:
```python
import depthai as dai

# `calibration` is assumed to be a dai.CalibrationHandler, e.g. from device.readCalibration().
# srcCamera / dstCamera: any dai.CameraBoardSocket value:
#   CAM_A, CAM_B, CAM_C, CAM_D, CAM_E, CAM_F, CAM_G, CAM_H, CAM_I, CAM_J

# Get the 4x4 extrinsic transformation matrix from CAM_B to CAM_A
extrinsics = calibration.getCameraExtrinsics(dai.CameraBoardSocket.CAM_B, dai.CameraBoardSocket.CAM_A)
```
This returns the transformation matrix from srcCamera to dstCamera:
$$T^{src}_{dst}$$
Internally, DepthAI computes this by chaining two origin transforms:
```python
calibration.getExtrinsicsToOrigin(srcCamera)  # returns T^src_origin
calibration.getExtrinsicsToOrigin(dstCamera)  # returns T^dst_origin
```
The final camera-to-camera transform is then composed as:
$$T^{src}_{dst} = \left(T^{dst}_{origin}\right)^{-1} \, T^{src}_{origin}$$
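The composition of two to-origin transforms can be sketched in NumPy: map from the source camera into the origin frame, then apply the inverse of the destination camera's to-origin transform. The matrices below are made-up stand-ins for what the calibration would return, and `make_transform` is a hypothetical helper.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])

# Hypothetical stand-ins for the two to-origin transforms.
T_src_to_origin = make_transform(np.eye(3), [-7.5, 0.0, 0.0])
T_dst_to_origin = make_transform(rot_z_90, [3.75, 0.0, 0.0])

# src -> origin, then origin -> dst (the inverse of dst -> origin).
T_src_to_dst = np.linalg.inv(T_dst_to_origin) @ T_src_to_origin

# Sanity check: going src -> dst -> origin must match going src -> origin directly.
assert np.allclose(T_dst_to_origin @ T_src_to_dst, T_src_to_origin)
print(T_src_to_dst)
```

Matrix products read right to left, so the rightmost factor is applied to the point first.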

Housing coordinate systems

Housing coordinate systems are coordinate frames tied to physical features of the device enclosure. There are three kinds:
  • VESA_A … VESA_J: Device mounting points.
  • FRONT_CAM_A … FRONT_CAM_J: Positioned at the front glass, aligned with the front glass plane.
  • CAM_A … CAM_J: Positioned at camera sensor spec locations.
All housing coordinate systems are oriented RDF (Right-Down-Forward), with the X-Y plane mechanically defined by the front glass. This means housing coordinate systems compensate for imperfections in the physical position of the sensors inside the device.
Nodes can integrate housing coordinate systems directly: you select a target coordinate system and the output data is automatically transformed into it. For example, the PointCloud node lets you receive point cloud data directly in any housing or camera socket coordinate system.
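To illustrate the RDF convention mentioned above, the sketch below writes the Right, Down, and Forward axes in a hypothetical right-up-back viewer frame, checks that they form a right-handed set, and converts one point. The viewer frame and the point are assumptions for illustration only; DepthAI does not define them.

```python
import numpy as np

# Columns are the RDF axes (right, down, forward) written in a hypothetical
# right-up-back viewer frame: right = +X, down = -Y, forward = -Z.
rdf_axes_in_viewer = np.array([[1.0,  0.0,  0.0],
                               [0.0, -1.0,  0.0],
                               [0.0,  0.0, -1.0]])

right, down, forward = rdf_axes_in_viewer.T

# A right-handed frame satisfies X cross Y = Z.
is_right_handed = np.allclose(np.cross(right, down), forward)

# A point 0.5 to the right of, 0.1 below, and 2.0 in front of the frame origin (RDF).
p_rdf = np.array([0.5, 0.1, 2.0])

# The same point expressed in the viewer frame.
p_viewer = rdf_axes_in_viewer @ p_rdf
print(is_right_handed, p_viewer)
```

In RDF, a positive Z value means the point lies in front of the device and a positive Y value means it lies below the frame origin.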

Transforming to a housing coordinate system

You can retrieve the transform from a camera to any housing coordinate system using:
```python
# housingCS: any dai.HousingCoordinateSystem value:
#   CAM_A … CAM_J             : camera housing origin
#   FRONT_CAM_A … FRONT_CAM_J : front-cover coordinate system per camera
#   VESA_A … VESA_J           : VESA mount coordinate system per camera
#   IMU                       : IMU housing origin

# Get 4x4 transform from CAM_A into the VESA_A housing coordinate system
calibration.getHousingCalibration(dai.CameraBoardSocket.CAM_A, dai.HousingCoordinateSystem.VESA_A)
```
This produces the transformation matrix from srcCamera to the selected housing coordinate system:
$$T^{srcCamera}_{housingCS}$$
Internally, this involves three steps:

Compute the housing-to-origin chain

```python
calibration.getHousingToHousingOrigin()           # returns T^housing_housingOrigin
calibration.getExtrinsicsToOrigin(housingOrigin)  # returns T^housingOrigin_origin
calibration.getExtrinsicsToOrigin(srcCamera)      # returns T^srcCamera_origin
```

Retrieve the housing-to-specific-housing transform

The housing-to-specific-housing transform is retrieved from the DepthAI Boards JSON file:

Compose the final transform
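One way to write the composition, as a sketch only: it assumes the notation from the code comments in the first step (where $T^{X}_{Y}$ maps X to Y) and labels the housing-to-specific-housing transform from the second step $T^{housing}_{housingCS}$, a name chosen here for illustration. DepthAI's internal order of operations may differ.

```latex
T^{srcCamera}_{housingCS}
  = T^{housing}_{housingCS}
    \left(T^{housing}_{housingOrigin}\right)^{-1}
    \left(T^{housingOrigin}_{origin}\right)^{-1}
    T^{srcCamera}_{origin}
```

Read right to left, the chain goes from the source camera to the origin camera, from the origin camera to the housing-origin camera, from there to the housing frame, and finally to the selected housing coordinate system.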

ImageTransformation — working with data

When working with data, always use ImgTransformation as the source of transformations. Every data frame carries its own extrinsics — the transformation from the data frame to the origin (ref) camera. This extrinsic is kept up to date automatically when:
  • An operation is performed on the data (crop, resize, undistort, etc.)
  • The calibration changes at runtime (e.g. due to the AutoCalibration node)
Nodes like PointCloud and ImageAlign read transformations from ImgTransformation automatically, so they stay calibration-aware without manual intervention.
