# LuxonisParser

## Overview

The LuxonisParser offers a simple API for creating datasets from several common dataset formats, including popular
Roboflow-exported layouts, Ultralytics-style datasets, native Luxonis LDF datasets, and a few specialized formats such as SOLO.
The supported formats and their expected directory layouts are listed below.

> Note: When parsing ZIP files, do not include a top-level `dataset_dir` folder in the archive. The train, validation, and test
> directories (according to the selected format) should be placed directly **at the root** of the ZIP archive.
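
As an illustration of the note above, here is a minimal sketch that packs a dataset so the split folders land at the archive root. It uses only the Python standard library; `zip_dataset` is a hypothetical helper and not part of `luxonis_ml`. It assumes the archive is written outside of `dataset_dir`, since otherwise the archive would try to include itself.

```python
import zipfile
from pathlib import Path


def zip_dataset(dataset_dir: Path, archive_path: Path) -> Path:
    """Zip the contents of dataset_dir so split folders sit at the archive root."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dataset_dir.rglob("*")):
            if path.is_file():
                # Store each file relative to dataset_dir itself (not its parent),
                # so the archive contains train/... rather than dataset_dir/train/...
                zf.write(path, path.relative_to(dataset_dir).as_posix())
    return archive_path
```

Calling `zip_dataset(Path("dataset_dir"), Path("dataset.zip"))` on one of the layouts below would produce entries such as `train/...` with no `dataset_dir/` prefix.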

#### - COCO

We support COCO JSON format in two variants:

 * [FiftyOne layout](https://docs.voxel51.com/user_guide/export_datasets.html#cocodetectiondataset-export)

```plaintext
dataset_dir/
    ├── train/
    │   ├── data/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   └── labels.json
    ├── validation/
    │   ├── data/
    │   └── labels.json
    └── test/
        ├── data/
        └── labels.json
```

 * [Roboflow](https://roboflow.com/formats/coco-json)

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img2.jpg
    │   ├── ...
    │   └── _annotations.coco.json
    ├── valid/
    └── test/
```

#### - [YOLOv8-v12](https://roboflow.com/formats/yolov8-pytorch-txt) and [Ultralytics](https://docs.ultralytics.com/datasets/)

 * Roboflow format (supports YOLOv8-v12)

```plaintext
dataset_dir/
    ├── train/
    │   ├── images/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   └── labels/
    │       ├── img1.txt
    │       ├── img2.txt
    │       └── ...
    ├── valid/
    ├── test/
    └── *.yaml
```

 * Ultralytics format

```plaintext
dataset_dir/
    ├── images/
    │   ├── train/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   ├── val/
    │   └── test/
    ├── labels/
    │   ├── train/
    │   │   ├── img1.txt
    │   │   ├── img2.txt
    │   │   └── ...
    │   ├── val/
    │   └── test/
    └── *.yaml
```

#### - [Pascal VOC XML](https://roboflow.com/formats/pascal-voc-xml)

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img1.xml
    │   └── ...
    ├── valid/
    └── test/
```

#### - [YOLO Darknet TXT](https://roboflow.com/formats/yolo-darknet-txt)

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img1.txt
    │   ├── ...
    │   └── _darknet.labels
    ├── valid/
    └── test/
```

#### - [YOLOv4 PyTorch TXT](https://roboflow.com/formats/yolov4-pytorch-txt)

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img2.jpg
    │   ├── ...
    │   ├── _annotations.txt
    │   └── _classes.txt
    ├── valid/
    └── test/
```

#### - [MT YOLOv6](https://roboflow.com/formats/mt-yolov6)

```plaintext
dataset_dir/
    ├── images/
    │   ├── train/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   ├── valid/
    │   └── test/
    ├── labels/
    │   ├── train/
    │   │   ├── img1.txt
    │   │   ├── img2.txt
    │   │   └── ...
    │   ├── valid/
    │   └── test/
    └── data.yaml
```

#### - [CreateML JSON](https://roboflow.com/formats/createml-json)

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img2.jpg
    │   ├── ...
    │   └── _annotations.createml.json
    ├── valid/
    └── test/
```

#### - [TensorFlow Object Detection CSV](https://roboflow.com/formats/tensorflow-object-detection-csv)

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img2.jpg
    │   ├── ...
    │   └── _annotations.csv
    ├── valid/
    └── test/
```

#### - [SOLO](https://docs.unity3d.com/Packages/com.unity.perception@1.0/manual/Schema/SoloSchema.html)

```plaintext
dataset_dir/
    ├── train/
    │   ├── metadata.json
    │   ├── sensor_definitions.json
    │   ├── annotation_definitions.json
    │   ├── metric_definitions.json
    │   └── sequence.<SequenceNUM>/
    │       ├── step<StepNUM>.camera.jpg
    │       ├── step<StepNUM>.frame_data.json
    │       └── (OPTIONAL: step<StepNUM>.camera.semantic segmentation.jpg)
    ├── valid/
    └── test/
```

#### - Classification Directory

A directory with subdirectories for each class. Two structures are supported:

 * Split structure with train/valid/test subdirectories:

```plaintext
dataset_dir/
    ├── train/
    │   ├── class1/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   ├── class2/
    │   └── ...
    ├── valid/
    └── test/
```

 * Flat structure (class subdirectories directly in root, random splits applied at parse time):

```plaintext
dataset_dir/
    ├── class1/
    │   ├── img1.jpg
    │   └── ...
    ├── class2/
    │   └── ...
    └── info.json  (optional metadata file)
```

#### - [FiftyOne Classification](https://docs.voxel51.com/user_guide/export_datasets.html#fiftyone-image-classification-dataset)

The `FiftyOneImageClassificationDataset` format stores images in a `data/` folder and labels in `labels.json`. Two structures are supported:

 * Split structure with train/validation/test subdirectories:

```plaintext
dataset_dir/
    ├── train/
    │   ├── data/
    │   │   ├── img1.jpg
    │   │   └── ...
    │   └── labels.json
    ├── validation/
    │   ├── data/
    │   └── labels.json
    └── test/
        ├── data/
        └── labels.json
```

 * Flat structure (random splits applied at parse time):

```plaintext
dataset_dir/
    ├── data/
    │   ├── img1.jpg
    │   └── ...
    └── labels.json
```

The `labels.json` file has the following format, where each label is an index into the `classes` list:

```json
{
    "classes": ["class1", "class2", ...],
    "labels": {
        "image_stem": class_index,
        ...
    }
}
```
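
To make the index-based mapping concrete, the following sketch resolves each image stem to its class name. It is only an illustration of the file format, not the parser's own implementation; `resolve_labels` is a hypothetical helper, and the sketch assumes every label is a valid integer index.

```python
import json


def resolve_labels(labels_json: str) -> dict[str, str]:
    """Map each image stem to its class name using the index stored in "labels"."""
    data = json.loads(labels_json)
    classes = data["classes"]
    # Each value in "labels" is an integer index into the "classes" list.
    return {stem: classes[index] for stem, index in data["labels"].items()}


# Example content of a small labels.json file (made-up values).
sample = '{"classes": ["cat", "dog"], "labels": {"img1": 0, "img2": 1}}'
print(resolve_labels(sample))
```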

#### - Native LDF

A native Luxonis dataset export with one `annotations.json` file per split.

```plaintext
dataset_dir/
    ├── train/
    │   └── annotations.json
    ├── valid/
    └── test/
```

#### - Segmentation Mask Directory

A directory with images and corresponding masks.

```plaintext
dataset_dir/
    ├── train/
    │   ├── img1.jpg
    │   ├── img1_mask.png
    │   ├── ...
    │   └── _classes.csv
    ├── valid/
    └── test/
```

The masks are stored as grayscale PNG images in which each pixel value corresponds to a class. The mapping from pixel values to
classes is defined in the `_classes.csv` file.

```csv
Pixel Value, Class
0, background
1, class1
2, class2
3, class3
```
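
A minimal sketch of reading this mapping with the standard `csv` module is shown below. `read_class_map` is a hypothetical helper, not part of `luxonis_ml`; it assumes the file has a single header row and uses `skipinitialspace=True` to handle the space after each comma.

```python
import csv
import io


def read_class_map(csv_text: str) -> dict[int, str]:
    """Parse the contents of _classes.csv into a {pixel_value: class_name} dict."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the "Pixel Value, Class" header row
    # Ignore blank lines; strip stray whitespace around class names.
    return {int(row[0]): row[1].strip() for row in reader if row}


sample = "Pixel Value, Class\n0, background\n1, class1\n2, class2\n"
print(read_class_map(sample))
```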

## Dataset Parsing

Parsing starts by initializing the LuxonisParser object with the path to the dataset directory. Optionally, you can specify the
name, task name, and type (i.e. the format) of the dataset. By default, the name is set to the name of the provided dataset
directory, and the type is inferred from the dataset directory structure. The dataset directory can be either a local path or a
remote dataset identifier. The parser currently accepts local directories, `.zip` archives, `gcs://...`, `s3://...`, and
`roboflow://workspace/project/version/format` dataset identifiers.

```python
from luxonis_ml.data.parsers import LuxonisParser
from luxonis_ml.enums import DatasetType

dataset_dir = "roboflow://workspace/project/version/coco"

parser = LuxonisParser(
    dataset_dir=dataset_dir,
    dataset_name="my_dataset",
    dataset_type=DatasetType.COCO,
    task_name="detection",
)
```

After initializing the LuxonisParser object, run the parsing by calling its `.parse()` method:

```python
dataset = parser.parse()
```

This creates a LuxonisDataset instance containing the data from the provided dataset, keeping the original splits whenever the
source format defines them. If the dataset already exists in Luxonis format, parsing is skipped and the existing dataset is
returned.
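
The `roboflow://workspace/project/version/format` identifier mentioned above packs four components into a single string. The sketch below splits such an identifier purely for illustration; `split_roboflow_identifier` is a hypothetical helper, not part of `luxonis_ml`, and the parser's own handling of these identifiers may differ.

```python
def split_roboflow_identifier(identifier: str) -> dict[str, str]:
    """Split a roboflow://workspace/project/version/format string into its parts."""
    prefix = "roboflow://"
    if not identifier.startswith(prefix):
        raise ValueError(f"Not a roboflow identifier: {identifier!r}")
    parts = identifier[len(prefix):].split("/")
    # Exactly four non-empty components are expected after the scheme.
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            "Expected roboflow://workspace/project/version/format, "
            f"got {identifier!r}"
        )
    workspace, project, version, fmt = parts
    return {
        "workspace": workspace,
        "project": project,
        "version": version,
        "format": fmt,
    }


print(split_roboflow_identifier("roboflow://workspace/project/version/coco"))
```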

## CLI Reference

Parsing can also be invoked from the command line with the `luxonis_ml data parse` command:

```bash
luxonis_ml data parse path/to/dataset --name my_dataset --type coco
```

For more detailed information, run `luxonis_ml data parse --help`.
