# YoloSpatialDetectionNetwork

Spatial detection for the Yolo NN. It is similar to a combination of the
[YoloDetectionNetwork](https://docs.luxonis.com/software/depthai-components/nodes/yolo_detection_network.md) and
[SpatialLocationCalculator](https://docs.luxonis.com/software/depthai-components/nodes/spatial_location_calculator.md).

## How to place it

#### Python

```python
pipeline = dai.Pipeline()
yoloSpatial = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
```

#### C++

```cpp
dai::Pipeline pipeline;
auto yoloSpatial = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
```

## Inputs and Outputs

## Configuring Spatial Detection

The pipeline of the SpatialDetectionNetwork node is described in the schema below:

The Spatial Detection node is essentially just an abstraction over the Detection Network
([YoloDetectionNetwork](https://docs.luxonis.com/software/depthai-components/nodes/yolo_detection_network.md) and
[MobileNetDetectionNetwork](https://docs.luxonis.com/software/depthai-components/nodes/mobilenet_detection_network.md)) and the
[SpatialLocationCalculator](https://docs.luxonis.com/software/depthai-components/nodes/spatial_location_calculator.md).

It works by linking the bounding boxes of each detected object to the spatial location calculator. The process goes as follows:

### Detection

The Detection Network is responsible for detecting objects in the input frame. It outputs a list of detected objects, each
represented by a bounding box, label and a confidence score.

### Alignment

The depth map is aligned with the input frame. This is necessary because the DetectionNetwork operates on the input frame, while
the SpatialLocationCalculator operates on the depth map.

### Scaling of BBOX

The bounding box from the network is sent to SpatialLocationCalculator and is scaled according to BoundingBoxScaleFactor. This is
done to ensure it includes the entire object. The bounding box is then used along with depth to calculate the spatial coordinates
of the object.
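As a rough illustration of the scaling step, the sketch below resizes a normalized bounding box around its center. The `scale_bbox` helper is hypothetical and not part of the DepthAI API. Scaling around the center and clamping to the frame are assumptions about how the ROI is derived.

```python
def scale_bbox(xmin, ymin, xmax, ymax, scale_factor):
    """Resize a normalized (0..1) bounding box around its center.

    Hypothetical helper for illustration only; the on-device
    implementation may differ in detail.
    """
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    half_w = (xmax - xmin) * scale_factor / 2.0
    half_h = (ymax - ymin) * scale_factor / 2.0
    # Clamp the result so the ROI stays inside the frame.
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(1.0, cx + half_w),
        min(1.0, cy + half_h),
    )


# A scale factor of 0.5 halves the width and height of the ROI.
roi = scale_bbox(0.2, 0.2, 0.6, 0.8, 0.5)
```

With a factor below 1 the ROI covers only the central part of the detection, which reduces the share of background pixels that enter the depth calculation.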

### Calculation of spatials

 * X and Y coordinates are taken from the bounding box center. They are calculated based on the offset from the center of the
   frame and the depth at that point.
 * For depth (Z), each pixel inside the scaled bounding box (ROI) is taken into account. This gives us a set of depth values,
   which are then combined using the selected averaging method to get the final depth value.
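The relationship between the pixel offset, the depth and the X/Y coordinates can be sketched with a simple pinhole camera model. This is an assumption for illustration, not the exact on-device computation. The `spatial_from_depth` helper and the use of a horizontal field of view (`hfov_deg`) to derive the focal length are hypothetical.

```python
import math


def spatial_from_depth(px, py, depth_mm, width, height, hfov_deg):
    """Estimate (X, Y, Z) in millimeters for a pixel with a known depth.

    Assumes a pinhole model with square pixels; illustration only.
    """
    # Offset of the point from the frame center, in pixels.
    dx = px - width / 2.0
    # Image rows grow downward, while spatial Y grows upward.
    dy = height / 2.0 - py
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    x = depth_mm * dx / focal_px
    y = depth_mm * dy / focal_px
    return x, y, depth_mm


# A point right of and above the frame center yields positive X and Y.
x, y, z = spatial_from_depth(480, 120, 2000, 640, 480, 90.0)
```

A point at the exact frame center gives X = 0 and Y = 0 regardless of depth.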

### Averaging methods

 * Average/mean: the average of ROI is used for calculation.
 * Min: the minimum value inside ROI is used for calculation.
 * Max: the maximum value inside ROI is used for calculation.
 * Mode: the most frequent value inside ROI is used for calculation.
 * Median: the median value inside ROI is used for calculation.

The default method is Median.
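The methods above can be sketched in plain Python using the standard `statistics` module. The `reduce_depth` helper is hypothetical. Treating zero as an invalid depth value and applying the thresholds inclusively are assumptions made for this illustration.

```python
import statistics


def reduce_depth(roi_depths_mm, method="median", lower_mm=0, upper_mm=65535):
    """Reduce the depth values of an ROI to a single value.

    Hypothetical helper; zero depth is assumed to mean "no measurement".
    """
    valid = [d for d in roi_depths_mm if d > 0 and lower_mm <= d <= upper_mm]
    if not valid:
        return None  # nothing usable inside the ROI
    reducers = {
        "mean": statistics.mean,
        "min": min,
        "max": max,
        "mode": statistics.mode,
        "median": statistics.median,
    }
    return reducers[method](valid)


# ROI containing an object near 1.2 m, background at 3 m and one invalid pixel.
roi = [1200, 1210, 1210, 1190, 3000, 3000, 3000, 0]
median_depth = reduce_depth(roi, "median")
nearest_depth = reduce_depth(roi, "min")
```

The choice of method matters most when the ROI mixes object and background pixels, as in this sample.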

## Common mistakes

Most mistakes stem from incorrect bounding box overlap. The scaled bounding box may include parts of the background, which can
skew the depth calculation.

 * Thin objects (like a pole) may have inaccurate spatials, since only a small portion of the bounding box actually lies on
   the detected object. In such cases, it is best to use a smaller BoundingBoxScaleFactor if possible.
 * Objects with holes (hoops, rings, etc.). To get the correct depth, the bounding box should include the entire object. Instead of
   the median depth, use the MIN depth method to exclude the background from the calculation. Alternatively, a depth threshold can
   be set to ignore the background in a static environment.
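The effect of background pixels on an object with a hole can be illustrated with made-up numbers. The values below are purely hypothetical: a ring at 1 m covering 30% of the ROI, with a wall at 3 m visible through and around it.

```python
import statistics

# Hypothetical ROI: 30 pixels on the ring, 70 pixels on the background wall.
ring_mm = [1000] * 30
background_mm = [3000] * 70
roi = ring_mm + background_mm

# The median lands on the background because it covers most of the ROI.
median_depth = statistics.median(roi)

# The minimum picks the nearest surface, which here is the ring.
min_depth = min(roi)

# An upper depth threshold (2 m here) removes the background before the median.
thresholded = [d for d in roi if d <= 2000]
median_thresholded = statistics.median(thresholded)
```

In this sample the plain median reports the wall, while both the minimum and the thresholded median report the ring.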

## Usage

#### Python

```python
pipeline = dai.Pipeline()
yoloSpatial = pipeline.create(dai.node.YoloSpatialDetectionNetwork)
yoloSpatial.setBlobPath(nnBlobPath)

# Spatial detection specific parameters
yoloSpatial.setConfidenceThreshold(0.5)
yoloSpatial.input.setBlocking(False)
yoloSpatial.setBoundingBoxScaleFactor(0.5)
yoloSpatial.setDepthLowerThreshold(100) # Min 10 centimeters
yoloSpatial.setDepthUpperThreshold(5000) # Max 5 meters

# Yolo specific parameters
yoloSpatial.setNumClasses(80)
yoloSpatial.setCoordinateSize(4)
yoloSpatial.setAnchors([10,14, 23,27, 37,58, 81,82, 135,169, 344,319])
yoloSpatial.setAnchorMasks({ "side26": [1,2,3], "side13": [3,4,5] })
yoloSpatial.setIouThreshold(0.5)
```

#### C++

```cpp
dai::Pipeline pipeline;
auto yoloSpatial = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
yoloSpatial->setBlobPath(nnBlobPath);

// Spatial detection specific parameters
yoloSpatial->setConfidenceThreshold(0.5f);
yoloSpatial->input.setBlocking(false);
yoloSpatial->setBoundingBoxScaleFactor(0.5);
yoloSpatial->setDepthLowerThreshold(100); // Min 10 centimeters
yoloSpatial->setDepthUpperThreshold(5000); // Max 5 meters

// Yolo specific parameters
yoloSpatial->setNumClasses(80);
yoloSpatial->setCoordinateSize(4);
yoloSpatial->setAnchors({10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319});
yoloSpatial->setAnchorMasks({{"side13", {3, 4, 5}}, {"side26", {1, 2, 3}}});
yoloSpatial->setIouThreshold(0.5f);
```

## Examples of functionality

 * [RGB & TinyYolo with spatial
   data](https://docs.luxonis.com/software/depthai-components/nodes/yolo_spatial_detection_network.md)

## Spatial coordinate system

OAK cameras use a left-handed (Cartesian) coordinate system for all spatial coordinates.

The middle of the frame is 0,0 in terms of X,Y coordinates. If you go up, Y will increase, and if you go right, X will increase.

## Reference

### depthai.node.YoloSpatialDetectionNetwork(depthai.node.SpatialDetectionNetwork)

Kind: Class

YoloSpatialDetectionNetwork node. Yolo-based network with spatial location data.

#### getAnchorMasks(self) -> dict[str, list[int]]

Kind: Method

Get anchor masks

#### getAnchors(self) -> list[float]

Kind: Method

Get anchors

#### getCoordinateSize(self) -> int

Kind: Method

Get coordinate size

#### getIouThreshold(self) -> float

Kind: Method

Get Iou threshold

#### getNumClasses(self) -> int

Kind: Method

Get num classes

#### setAnchorMasks(self, anchorMasks: collections.abc.Mapping[str, collections.abc.Sequence[typing.SupportsInt]])

Kind: Method

Set anchor masks

#### setAnchors(self, anchors: collections.abc.Sequence[typing.SupportsFloat])

Kind: Method

Set anchors

#### setCoordinateSize(self, coordinates: typing.SupportsInt)

Kind: Method

Set coordinate size

#### setIouThreshold(self, thresh: typing.SupportsFloat)

Kind: Method

Set Iou threshold

#### setNumClasses(self, numClasses: typing.SupportsInt)

Kind: Method

Set num classes

### Need assistance?

Head over to [Discussion Forum](https://discuss.luxonis.com/) for technical support or any other questions you might have.