AI / ML / NN
OAK cameras can run any AI model, including models with custom-built architectures. You can also run multiple AI models at the same time, either in parallel or in series (a demo here).
To run a custom AI model on the device, you need to convert it to the .blob format - documentation here.
You can also use one of 250+ pretrained AI models from either the OpenVINO Model Zoo or the DepthAI Model Zoo; read more at Use a Pre-trained OpenVINO model.
Model Performance
You can estimate the performance of your model with the help of the chart below. It shows the estimated FPS of models on OAK devices as a function of their FLOPs and parameter count.

You can find a more detailed evaluation of FPS for common models in this sheet.
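If you have read a few (GFLOPs, FPS) pairs off the chart or the sheet, you can interpolate between them to get a rough figure for your own model. The sketch below is only an illustration: the function name `estimate_fps` is hypothetical, and it assumes FPS follows an approximate power law in FLOPs between neighbouring reference points, which the chart does not guarantee. It ignores parameter count and layer types entirely.

```python
import math
from typing import List, Tuple


def estimate_fps(gflops: float, reference: List[Tuple[float, float]]) -> float:
    """Rough FPS estimate by log-log interpolation.

    `reference` holds (GFLOPs, FPS) pairs that you supply yourself,
    e.g. values read from the chart or the sheet. Nothing here is a
    measured number; the power-law assumption is ours.
    """
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    if len(reference) < 2:
        raise ValueError("need at least two reference points")

    pts = sorted(reference)

    # Pick the segment that brackets `gflops`; outside the covered
    # range, fall back to the first or last segment (extrapolation).
    hi = 1
    while hi < len(pts) - 1 and pts[hi][0] < gflops:
        hi += 1
    (x0, y0), (x1, y1) = pts[hi - 1], pts[hi]

    # Straight line in log-log space between the two points.
    slope = (math.log(y1) - math.log(y0)) / (math.log(x1) - math.log(x0))
    return math.exp(math.log(y0) + slope * (math.log(gflops) - math.log(x0)))
```

Treat the result as a starting point only. Measuring the model on the device is the reliable way to get its actual FPS.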
AI vision tasks
We have open-source examples and demos for many different AI vision tasks, such as:
Object detection models provide the bounding box, confidence, and label of every detected object. Demos: MobileNet, Yolo, EfficientDet, Palm detection.
Landmark detection models provide the landmarks/keypoints of an object. Demos: Human pose, hand landmarks, and facial landmarks.
Semantic segmentation models provide a label/class for each pixel. Demos: Person segmentation, multiclass segmentation, road segmentation.
Classification models provide a classification label and the confidence in that label. Demos: EfficientNet, TensorFlow classification, fire classification, emotions classification.
Recognition models provide a byte array that can be used for recognition, or the recognized feature itself. Demos: Face recognition, person identification, OCR, license plate recognition.
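To make the object detection output concrete, here is a minimal host-side sketch of turning a raw result into boxes, confidences, and labels. It assumes the model emits the SSD-style layout produced by OpenVINO's DetectionOutput layer: seven values per detection (`image_id, label, confidence, x_min, y_min, x_max, y_max`) with normalized coordinates and `image_id = -1` marking the end of the list. Other detectors, such as Yolo, use different layouts. The names `Detection` and `decode_ssd` are hypothetical and not part of any library.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    label: int
    confidence: float
    # Bounding box in pixels: (left, top, right, bottom)
    box: Tuple[int, int, int, int]


def _clamp01(value: float) -> float:
    """Clamp a normalized coordinate to the [0, 1] range."""
    return min(max(value, 0.0), 1.0)


def decode_ssd(raw: Sequence[float], frame_w: int, frame_h: int,
               threshold: float = 0.5) -> List[Detection]:
    """Decode a flat SSD-style output, keeping detections >= threshold."""
    detections = []
    # Each detection occupies 7 consecutive values.
    for i in range(0, len(raw) - 6, 7):
        image_id, label, conf, x0, y0, x1, y1 = raw[i:i + 7]
        if image_id < 0:
            break  # -1 marks the end of valid detections
        if conf < threshold:
            continue
        # Scale normalized coordinates to pixel coordinates.
        box = (int(_clamp01(x0) * frame_w), int(_clamp01(y0) * frame_h),
               int(_clamp01(x1) * frame_w), int(_clamp01(y1) * frame_h))
        detections.append(Detection(int(label), float(conf), box))
    return detections
```

The confidence threshold is a design choice: a higher value suppresses false positives at the cost of missing weaker detections.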
There are also many other AI vision tasks that don’t fall into any of the categories above, such as crowd counting, monocular depth estimation, gaze estimation, or age/gender estimation.
All of the demos above run on color/grayscale frames. Many of these vision tasks can be fused with depth perception (on the OAK camera itself), which unlocks the power of Spatial AI.
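The OAK camera performs this fusion on the device. The sketch below shows only the underlying geometry on the host, to illustrate what fusing a detection with depth means. It assumes a pinhole camera model, a depth map in millimetres already aligned to the frame the detection came from, and a depth value of 0 meaning "no measurement". The function name `spatial_coordinates` and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are hypothetical inputs you would need to supply.

```python
from statistics import median
from typing import List, Optional, Tuple


def spatial_coordinates(depth_mm: List[List[int]],
                        box: Tuple[int, int, int, int],
                        fx: float, fy: float,
                        cx: float, cy: float
                        ) -> Optional[Tuple[float, float, float]]:
    """Return (X, Y, Z) in millimetres for the centre of `box`.

    `box` is (left, top, right, bottom) in pixels; right and bottom
    are exclusive. Returns None if the box has no valid depth.
    """
    left, top, right, bottom = box

    # Collect valid depth readings inside the box (0 = invalid, assumed).
    values = [depth_mm[v][u]
              for v in range(top, bottom)
              for u in range(left, right)
              if depth_mm[v][u] > 0]
    if not values:
        return None

    # The median is robust to background pixels caught inside the box.
    z = float(median(values))

    # Back-project the box centre through the pinhole model.
    u_c = (left + right) / 2
    v_c = (top + bottom) / 2
    x = (u_c - cx) * z / fx
    y = (v_c - cy) * z / fy  # image convention: positive Y points down
    return x, y, z
```

Using the median rather than the mean is a deliberate choice here, because a bounding box usually includes some pixels that belong to the background rather than the object.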