# SpatialDetectionNetwork

The SpatialDetectionNetwork node combines the functionality of the
[DetectionNetwork](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes/detection_network.md) and the
[SpatialLocationCalculator](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes/spatial_location_calculator.md) nodes.

## How to place it

#### Python

```python
modelDescription = dai.NNModelDescription("yolov6-nano")
pipeline = dai.Pipeline()
# camRgb and stereo are existing Camera and StereoDepth nodes in the pipeline
spatialDetectionNetwork = pipeline.create(dai.node.SpatialDetectionNetwork).build(camRgb, stereo, modelDescription)
```

#### C++

```cpp
dai::NNModelDescription modelDescription("yolov6-nano");
dai::Pipeline pipeline;
// camRgb and stereo are existing Camera and StereoDepth nodes in the pipeline
auto spatialDetectionNetwork = pipeline.create<dai::node::SpatialDetectionNetwork>()->build(camRgb, stereo, modelDescription);
```

## Inputs and Outputs

## Configuring Spatial Detection

The SpatialDetectionNetwork node is essentially an abstraction over the
[DetectionNetwork](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes/detection_network.md) and the
[SpatialLocationCalculator](https://docs.luxonis.com/software-v3/depthai/depthai-components/nodes/spatial_location_calculator.md).

It works by linking the bounding boxes of each detected object to the spatial location calculator. The process goes as follows:

### Detection

The DetectionNetwork is responsible for detecting objects in the input frame. It outputs a list of detected objects, each
represented by a bounding box, a label, and a confidence score.

### Alignment

The depth map is aligned with the input frame. This is necessary because the DetectionNetwork operates on the input frame, while
the SpatialLocationCalculator operates on the depth map.

### Bounding box scaling

The bounding box from the detection network is sent to the SpatialLocationCalculator and is scaled according to the bounding box
scale factor (see `setBoundingBoxScaleFactor()`). Shrinking the ROI this way helps it cover mostly the detected object rather than
the surrounding background. The scaled bounding box is then used together with the depth map to calculate the spatial coordinates
of the object.
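As an illustration, scaling a bounding box about its center can be sketched in plain Python (`scale_bbox` is a hypothetical helper, not part of the DepthAI API):

```python
def scale_bbox(xmin, ymin, xmax, ymax, scale_factor):
    # Shrink (factor < 1) or keep (factor == 1) the box around its center,
    # mirroring the idea behind setBoundingBoxScaleFactor().
    cx = (xmin + xmax) / 2
    cy = (ymin + ymax) / 2
    half_w = (xmax - xmin) / 2 * scale_factor
    half_h = (ymax - ymin) / 2 * scale_factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# A factor of 0.5 halves the ROI's width and height (pixel coordinates):
print(scale_bbox(100, 100, 300, 300, 0.5))  # (150.0, 150.0, 250.0, 250.0)
```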

### Calculation of spatials

 * X and Y coordinates are taken from the bounding box center. They are calculated based on the offset from the center of the
   frame and the depth at that point.
 * For depth (Z), every pixel inside the scaled bounding box (ROI) is taken into account. This gives a set of depth values,
   which are then combined using the configured averaging method to get the final depth value.
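Under a simple pinhole-camera model, the steps above can be sketched as follows (function and parameter names here are illustrative, not the DepthAI API):

```python
import statistics

def spatials_from_roi(depth_roi, bbox_center, frame_center, focal_px):
    # depth_roi: depth values (mm) inside the scaled bounding box
    # bbox_center / frame_center: pixel coordinates (u, v)
    # focal_px: focal length in pixels
    z = statistics.median(depth_roi)  # default averaging method
    x = z * (bbox_center[0] - frame_center[0]) / focal_px  # right is positive
    y = z * (bbox_center[1] - frame_center[1]) / focal_px  # down is positive
    return x, y, z

# Object 80 px right of and 60 px below the frame center, roughly 1 m away:
print(spatials_from_roi([1000, 1000, 1200], (400, 300), (320, 240), 400))
# (200.0, 150.0, 1000)
```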

### Averaging methods

 * Average/mean: the average of the depth values inside the ROI.
 * Min: the minimum value inside the ROI.
 * Max: the maximum value inside the ROI.
 * Mode: the most frequent value inside the ROI.
 * Median: the median value inside the ROI.

The default method is Median.
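The difference between the methods is easy to see on a small ROI that contains one stray background pixel (plain Python, using the standard `statistics` module):

```python
import statistics

roi = [980, 1000, 1000, 1020, 4000]  # depth values in mm; 4000 is background

print(statistics.mean(roi))    # mean = 1600, heavily skewed by the background
print(min(roi))                # min = 980
print(max(roi))                # max = 4000
print(statistics.mode(roi))    # mode = 1000 (most frequent value)
print(statistics.median(roi))  # median = 1000, robust to the outlier (default)
```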

## Common mistakes

Most mistakes stem from incorrect bounding box overlap. The scaled bounding box may include parts of the background, which can
skew the depth calculation.

 * Thin objects (like a pole) will have inaccurate spatials, since only a small portion of the bounding box actually lies on
   the detected object. In such cases, it is best to use a smaller bounding box scale factor if possible.
 * Objects with holes (hoops, rings, etc.): to get the correct depth, the bounding box must include the entire object, so most
   of the ROI sees the background through the hole. Instead of median depth, use the MIN depth method to exclude the background
   from the calculation. Alternatively, in a static environment, a depth threshold can be set to ignore the background.
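To see why MIN helps for objects with holes, consider an ROI over a ring where most pixels look through the hole at the background (a plain-Python sketch with made-up depth values):

```python
import statistics

# Depth values (mm) in the ROI: only two pixels land on the ring itself,
# the rest see the background through the hole.
roi = [900, 910, 3000, 3000, 3000, 3000, 3000]

print(statistics.median(roi))  # 3000 - median reports the background depth
print(min(roi))                # 900  - MIN recovers the ring's depth
```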

## Usage

#### Python

```python
import depthai as dai

FPS = 30  # desired inference frame rate
modelDescription = dai.NNModelDescription("yolov6-nano")

with dai.Pipeline() as p:
    camRgb = p.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_A)
    monoLeft = p.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    monoRight = p.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)
    stereo = p.create(dai.node.StereoDepth)
    # (the mono camera outputs still need to be linked to stereo.left / stereo.right)
    spatialDetectionNetwork = p.create(dai.node.SpatialDetectionNetwork).build(camRgb, stereo, modelDescription, fps=FPS)

    spatialDetectionNetwork.input.setBlocking(False)
    spatialDetectionNetwork.setBoundingBoxScaleFactor(0.5)
    spatialDetectionNetwork.setDepthLowerThreshold(100)   # mm
    spatialDetectionNetwork.setDepthUpperThreshold(5000)  # mm

    labelMap = spatialDetectionNetwork.getClasses()
```

#### C++

```cpp
#include <depthai/depthai.hpp>

constexpr float FPS = 30.0f;  // desired inference frame rate
dai::NNModelDescription modelDescription("yolov6-nano");

dai::Pipeline pipeline;
auto camRgb = pipeline.create<dai::node::Camera>()->build(dai::CameraBoardSocket::CAM_A);
auto monoLeft = pipeline.create<dai::node::Camera>()->build(dai::CameraBoardSocket::CAM_B);
auto monoRight = pipeline.create<dai::node::Camera>()->build(dai::CameraBoardSocket::CAM_C);
auto stereo = pipeline.create<dai::node::StereoDepth>();
// (the mono camera outputs still need to be linked to stereo->left / stereo->right)
auto spatialDetectionNetwork = pipeline.create<dai::node::SpatialDetectionNetwork>()->build(camRgb, stereo, modelDescription, FPS);
spatialDetectionNetwork->input.setBlocking(false);
spatialDetectionNetwork->setBoundingBoxScaleFactor(0.5);
spatialDetectionNetwork->setDepthLowerThreshold(100);   // mm
spatialDetectionNetwork->setDepthUpperThreshold(5000);  // mm
auto labelMap = spatialDetectionNetwork->getClasses();
```

## Examples of functionality

 * [Spatial Detection
   Network](https://docs.luxonis.com/software-v3/depthai/examples/spatial_detection_network/spatial_detection.md)

## Spatial coordinate system

OAK cameras use the RDF (Right-Down-Forward) coordinate system for all spatial coordinates.

The middle of the frame is (0, 0) in terms of X and Y coordinates. Y increases downward, and X increases to the right.
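The sign convention can be sketched as follows (`pixel_offset` is a hypothetical helper, not the DepthAI API):

```python
def pixel_offset(u, v, frame_w, frame_h):
    # RDF convention: the frame center maps to (0, 0),
    # X grows to the right and Y grows downward.
    return u - frame_w / 2, v - frame_h / 2

print(pixel_offset(320, 240, 640, 480))  # (0.0, 0.0): frame center
print(pixel_offset(640, 480, 640, 480))  # (320.0, 240.0): bottom-right corner
```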

## Reference

### dai::node::SpatialDetectionNetwork

Kind: class

SpatialDetectionNetwork node. Runs a neural inference on input image and calculates spatial location data.

#### SpatialDetectionNetworkProperties Properties

Kind: enum

#### NeuralNetwork::Model Model

Kind: enum

#### Properties & properties

Kind: variable

#### Subnode < NeuralNetwork > neuralNetwork

Kind: variable

#### Subnode < DetectionParser > detectionParser

Kind: variable

#### Subnode < SpatialLocationCalculator > spatialLocationCalculator

Kind: variable

#### std::unique_ptr< Subnode < ImageAlign > > depthAlign

Kind: variable

#### Input & input

Kind: variable

Input message with data to be inferred upon. Default queue is blocking with size 5.

#### Output & outNetwork

Kind: variable

Outputs unparsed inference results.

#### Output & passthrough

Kind: variable

Passthrough message on which the inference was performed. Suitable for when input queue is set to non-blocking behavior.

#### Input & inputDepth

Kind: variable

Input message with depth data used to retrieve spatial information about detected objects. Default queue is non-blocking with size 4.

#### Output & out

Kind: variable

Outputs ImgDetections message that carries parsed detection results.

#### Output & passthroughDepth

Kind: variable

Passthrough message for depth frame on which the spatial location calculation was performed. Suitable for when input queue is set
to non-blocking behavior.

#### SpatialDetectionNetwork(const std::shared_ptr< Device > & device)

Kind: function

#### SpatialDetectionNetwork(std::unique_ptr< Properties > props)

Kind: function

#### SpatialDetectionNetwork(std::unique_ptr< Properties > props, bool confMode)

Kind: function

#### SpatialDetectionNetwork(const std::shared_ptr< Device > & device, std::unique_ptr< Properties > props, bool confMode)

Kind: function

#### std::shared_ptr< SpatialDetectionNetwork > build(const std::shared_ptr< Camera > & inputRgb, const DepthSource & depthSource,
const Model & model, std::optional< float > fps, std::optional< dai::ImgResizeMode > resizeMode)

Kind: function

Build SpatialDetectionNetwork node with the specified depth source. Connects the Camera and depth source outputs to this node's
inputs and configures the inference model.

parameters: inputRgb: Camera node; depthSource: Depth source node (StereoDepth, NeuralDepth, or ToF); model: Neural network
model description, NNArchive, or HubAI model id string; fps: Desired frames per second; resizeMode: Resize mode for input color
frames

return: Shared pointer to SpatialDetectionNetwork node

#### std::shared_ptr< SpatialDetectionNetwork > build(const std::shared_ptr< Camera > & inputRgb, const DepthSource & depthSource,
const Model & model, const ImgFrameCapability & capability)

Kind: function

Build SpatialDetectionNetwork node with the specified depth source. Connects the Camera and depth source outputs to this node's
inputs and configures the inference model.

parameters: inputRgb: Camera node; depthSource: Depth source node (StereoDepth, NeuralDepth, or ToF); model: Neural network
model description, NNArchive, or HubAI model id string; capability: Camera capabilities

return: Shared pointer to SpatialDetectionNetwork node

#### void setNNArchive(const NNArchive & nnArchive)

Kind: function

Set NNArchive for this Node. If the archive's type is SUPERBLOB, the default number of shaves is used.

parameters: nnArchive: NNArchive to set

#### void setFromModelZoo(NNModelDescription description, bool useCached)

Kind: function

Download model from zoo and set it for this Node .

parameters: description: Model description to download; useCached: Use cached model if available

#### void setNNArchive(const NNArchive & nnArchive, int numShaves)

Kind: function

Set NNArchive for this Node; throws if the archive's type is not SUPERBLOB.

parameters: nnArchive: NNArchive to set; numShaves: Number of shaves to use

#### void setBlobPath(const std::filesystem::path & path)

Kind: function

Backwards-compatibility interface. Load network blob into assets and use once pipeline is started.

throws: Error if file doesn't exist or isn't a valid network blob

parameters: path: Path to network blob

#### void setBlob(OpenVINO::Blob blob)

Kind: function

Load network blob into assets and use once pipeline is started.

parameters: blob: Network blob

#### void setBlob(const std::filesystem::path & path)

Kind: function

Same functionality as setBlobPath(). Load network blob into assets and use once pipeline is started.

throws: Error if file doesn't exist or isn't a valid network blob

parameters: path: Path to network blob

#### void setModelPath(const std::filesystem::path & modelPath)

Kind: function

Load network file into assets.

parameters: modelPath: Path to the model file

#### void setNumPoolFrames(int numFrames)

Kind: function

Specifies how many frames will be available in the pool.

parameters: numFrames: How many frames the pool will have

#### void setNumInferenceThreads(int numThreads)

Kind: function

How many threads the node should use to run the network.

parameters: numThreads: Number of threads to dedicate to this node

#### void setNumNCEPerInferenceThread(int numNCEPerThread)

Kind: function

How many Neural Compute Engines a single thread should use for inference.

parameters: numNCEPerThread: Number of NCE per thread

#### void setNumShavesPerInferenceThread(int numShavesPerThread)

Kind: function

How many shaves a single thread should use for inference.

parameters: numShavesPerThread: Number of shaves per thread

#### void setBackend(std::string backend)

Kind: function

Specifies backend to use.

parameters: backend: String specifying backend to use

#### void setBackendProperties(std::map< std::string, std::string > properties)

Kind: function

Set backend properties.

parameters: properties: Backend properties map

#### int getNumInferenceThreads()

Kind: function

How many inference threads will be used to run the network.

return: Number of threads: 0, 1 or 2. Zero means AUTO

#### void setConfidenceThreshold(float thresh)

Kind: function

Specifies the confidence threshold used to filter detections.

parameters: thresh: Detection confidence must be greater than the specified threshold for the detection to be added to the list

#### float getConfidenceThreshold()

Kind: function

Retrieves the confidence threshold used to filter detections.

return: Detection confidence threshold

#### void setBoundingBoxScaleFactor(float scaleFactor)

Kind: function

Custom interface. Specifies scale factor for detected bounding boxes.

parameters: scaleFactor: Scale factor must be in the interval (0,1]

#### void setDepthLowerThreshold(uint32_t lowerThreshold)

Kind: function

Specifies the lower threshold in depth units (millimeters by default) for depth values which will be used to calculate spatial data.

parameters: lowerThreshold: LowerThreshold must be in the interval [0,upperThreshold] and less than upperThreshold

#### void setDepthUpperThreshold(uint32_t upperThreshold)

Kind: function

Specifies the upper threshold in depth units (millimeters by default) for depth values which will be used to calculate spatial data.

parameters: upperThreshold: UpperThreshold must be in the interval (lowerThreshold,65535]

#### void setSpatialCalculationAlgorithm(dai::SpatialLocationCalculatorAlgorithm calculationAlgorithm)

Kind: function

Specifies the spatial location calculator algorithm (Average/Min/Max/Mode/Median).

parameters: calculationAlgorithm: Calculation algorithm

#### void setSpatialCalculationStepSize(int stepSize)

Kind: function

Specifies the spatial location calculator step size for depth calculation. Step size 1 means that every pixel is taken into the
calculation, step size 2 means every second pixel, etc.

parameters: stepSize: Step size
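The effect of the step size can be sketched in plain Python (an illustrative helper, not the firmware implementation):

```python
def sample_roi(roi, step):
    # With step N, only every N-th row and every N-th column of the ROI
    # contributes to the depth calculation.
    return [row[::step] for row in roi[::step]]

roi = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(sample_roi(roi, 1))  # all 16 values
print(sample_roi(roi, 2))  # [[1, 3], [9, 11]] - a quarter of the values
```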

#### std::optional< std::vector< std::string > > getClasses()

Kind: function

Get class labels.

#### void buildInternal()

Kind: function

Function called from within the

### Need assistance?

Head over to [Discussion Forum](https://discuss.luxonis.com/) for technical support or any other questions you might have.
