Robotics Vision Core 2 (RVC2)

Robotics Vision Core 2 (RVC2 for short) is the second generation of our RVC. Series 2 OAK devices and our initial devices are built on top of the RVC2. RVC2 encapsulates two main components:

DepthAI features that are fine-tuned for the particular SoC
A performant SoC and all its support circuitry (HS PCB layout, power delivery network, efficient heat dissipation, etc.)

RVC2 Performance

4 TOPS of processing power (1.4 TOPS for AI) - RVC2 NN Performance

Run any AI model, including custom-built architectures. Models need to be converted first.

Encoding: H.264, H.265, MJPEG - 4K/30FPS, 1080P/60FPS

Computer vision: warp/dewarp, resize, crop via ImageManip node, edge detection, feature tracking. You can also run custom CV functions.

Object tracking: 2D and 3D tracking with ObjectTracker node

On-device programming: Run custom logic/tasks on-device (guide)

RVC2 NN Performance

Click here for full table with 81 test results.

Model name	Size	FPS	Latency [ms]
ResNet-50	224x224	26.5	56.5
MobileOne S0	224x224	165.5	11.1
ResNet-18	224x224	94.8	19.7
DeepLab V3	256x256	36.5	48.1
DeepLab V3	513x513	6.3	253.1
YOLOv6n R2	416x416	65.5	29.3
YOLOv6n R2	640x640	29.3	66.4
YOLOv6t R2	416x416	35.8	54.1
YOLOv6t R2	640x640	14.2	133.6
YOLOv6m R2	416x416	8.6	190.2
YOLOv7t	416x416	46.7	37.6
YOLOv7t	640x640	17.8	97.0
YOLOv8n	416x416	31.3	56.9
YOLOv8n	640x640	14.3	123.6
YOLOv8s	416x416	15.2	111.9
YOLOv8m	416x416	6.0	273.8
YOLOv9t	416x416	21.70	46.09
YOLOv9t	640x640	10.69	93.60
YOLOv9s	416x416	12.74	78.49
YOLOv9m	416x416	4.71	212.31
YOLOv10n	416x416	27.07	36.95
YOLOv10n	640x640	12.62	79.21
YOLOv10s	416x416	14.03	71.29
YOLOv10m	416x416	6.05	165.26
YOLO11n	416x416	28.08	35.61
YOLO11n	640x640	12.80	78.11
YOLO11s	416x416	12.17	82.14
YOLO11m	416x416	3.90	256.20

Models were compiled for 8 shaves and used 2 NN inference threads. Latency includes getting results from the device over USB3. 5 iterations were run for each model, and FPS was calculated as their average.
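Because the benchmarks use 2 NN inference threads, FPS in the table is not simply 1000 / latency. A rough way to relate the two columns is Little's law: throughput multiplied by latency gives the average number of frames in flight. The sketch below applies it to three rows copied from the table; the function name `frames_in_flight` is only illustrative, and since the reported latency includes the USB transfer, the result should be read as an approximation rather than an exact thread count.

```python
def frames_in_flight(fps, latency_ms):
    """Little's law: average number of frames being processed at once."""
    return fps * latency_ms / 1000.0


# (model, FPS, latency in ms) copied from the table above
rows = [
    ("ResNet-50 224x224", 26.5, 56.5),
    ("MobileOne S0 224x224", 165.5, 11.1),
    ("YOLOv6n R2 416x416", 65.5, 29.3),
]

for name, fps, latency_ms in rows:
    print(f"{name}: {frames_in_flight(fps, latency_ms):.2f} frames in flight")
```

For these rows the value falls between 1 and 2, which is what one would expect when two inference threads overlap part of their work.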

NN Performance estimation

You can estimate the performance of a model with the help of the chart below. It contains FPS estimations of models on RVC2 based on FLOPs and parameters.

Click on the image to view a more detailed evaluation of FPS for common models.

Power consumption

The RVC2 itself has a maximum power consumption of about 4.5W, most of which is drawn by the SoC (Movidius Myriad X) integrated inside the RVC2.

Hardware blocks and accelerators

The SoC integrates a number of hardware accelerators, and the DepthAI API has been designed to utilize them optimally:

2x Leon CPU cores:
- Leon CSS handles the USB/Ethernet stack (managed by the XLink framework), IMU, and 3A algorithms. One way to reduce CSS CPU consumption is to lower the 3A rate, which currently means reducing camera FPS. We are also working on skipping 3A for some frames (e.g. running 3A only every 3rd frame). CSS CPU consumption is higher on PoE models because they run the Ethernet stack.
- Leon MSS handles everything else: scheduling HW-accelerated features, using SHAVES, etc.
ISP - Image Signal Processor, used for image processing such as denoising, sharpening, etc. The whole ISP configuration is exposed through the API via the ColorCamera and MonoCamera nodes.
2x NCEs (Neural Compute Engines) - architected for a broad set of NN operations/layers. Operations/layers that the NCEs do not implement are executed on the SHAVE cores instead.
16x SHAVE cores - vector processors used for executing some NN operations/layers. They are versatile and can be used for other tasks as well, such as CV algorithms (reformatting images, doing some ISP, etc.).
- For higher resolutions more SHAVES are consumed; for 1080P, 3 SHAVES are used, and for 4K, 6 SHAVES are used.
- The internal resource manager inside DepthAI coordinates the use of SHAVES and warns if a given pipeline configuration requests too many resources.
20x CMX slices - fast SRAM memory blocks (128kB each) used for temporary storage of calculations. They are used by NN models, the camera ISP (3 CMX slices for 1080P or below), image manipulation processes, etc. Note that 4 CMX slices are pre-allocated, so only 16 are free.
Stereo pipeline - Stereo matching (census transform, cost matching and cost aggregation) used by StereoDepth node.
Video encoder - supports MJPEG, H.264 and H.265 codecs. It's used by VideoEncoder node.
Vision blocks:
- Edge detection - used by EdgeDetector node.
- 3x Warp engine - used by ImageManip node / Warp node for warping, stereo rectification, undistortion, etc.
- Corner detection (Harris/Shi-Tomasi) - used by FeatureTracker node.
- Motion estimator - used by FeatureTracker node.
- Min/Max calculator - used by FeatureTracker node for NMS (for Harris corner detection).
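The SHAVE and CMX figures in the list above can be turned into a rough budget check. The following is a minimal sketch, not DepthAI's actual resource manager: the function names are hypothetical, and it assumes that each NN inference thread occupies its own set of SHAVES (so a model compiled for N shaves and run on 2 threads uses 2 x N). Only the constants come from the text.

```python
TOTAL_SHAVES = 16        # 16x SHAVE cores
TOTAL_CMX = 20           # 20x CMX slices
PREALLOCATED_CMX = 4     # slices reserved before any pipeline runs

# SHAVES consumed per resolution, as quoted in the list above
RESOLUTION_SHAVES = {"1080P": 3, "4K": 6}


def remaining_shaves(resolution, nn_shaves_per_thread=0, nn_threads=0):
    """SHAVES left after the resolution cost and the NN allocation.

    Assumption: every inference thread gets its own set of SHAVES.
    A negative result means the configuration is over budget.
    """
    used = RESOLUTION_SHAVES[resolution] + nn_shaves_per_thread * nn_threads
    return TOTAL_SHAVES - used


def free_cmx(isp_slices=0):
    """CMX slices left after the pre-allocated ones and the ISP's share."""
    return TOTAL_CMX - PREALLOCATED_CMX - isp_slices


print("1080P, no NN:", remaining_shaves("1080P"), "SHAVES left")
print("4K, no NN:", remaining_shaves("4K"), "SHAVES left")
print("1080P + NN (6 shaves x 2 threads):", remaining_shaves("1080P", 6, 2), "SHAVES left")
print("CMX free with 1080P ISP:", free_cmx(isp_slices=3))
```

Under these assumptions, a 1080P stream leaves 13 SHAVES, so a model compiled for 6 shaves and run on 2 threads still fits, while the same setup at 4K would not.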

You can check SHAVE and CMX usage by enabling debug information.
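One way to enable debug output is through the `DEPTHAI_LEVEL` environment variable, assuming that is the mechanism meant here. The sketch below only prepares the command and environment for launching a pipeline script with that variable set; `my_pipeline.py` and the helper `debug_launch` are hypothetical names used for illustration.

```python
import os
import sys


def debug_launch(script_path):
    """Return (command, environment) for running a script with debug logging."""
    env = dict(os.environ)            # copy, so the current process is untouched
    env["DEPTHAI_LEVEL"] = "debug"    # assumed to control DepthAI log verbosity
    return [sys.executable, script_path], env


cmd, env = debug_launch("my_pipeline.py")  # hypothetical pipeline script
print(" ".join(cmd), "with DEPTHAI_LEVEL =", env["DEPTHAI_LEVEL"])

# To actually start the script with this environment:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

With debug logging active, the resource allocation lines in the log are the place to look for the SHAVES and CMX slices assigned to each node.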