LuxonisParser

Overview

The LuxonisParser offers a simple API for creating datasets from several common dataset formats. This includes popular Roboflow-exported layouts, Ultralytics-style datasets, Luxonis native LDF datasets, and a few specialized formats such as SOLO:

- COCO

We support COCO JSON format in two variants:
  • FiftyOne format
Plain Text
dataset_dir/
├── train/
│   ├── data/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── labels.json
├── validation/
│   ├── data/
│   └── labels.json
└── test/
    ├── data/
    └── labels.json
  • Roboflow format
Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img2.jpg
│   ├── ...
│   └── _annotations.coco.json
├── valid/
└── test/

- YOLO

  • Roboflow format (supports YOLOv8-v12)
Plain Text
dataset_dir/
├── train/
│   ├── images/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── labels/
│       ├── img1.txt
│       ├── img2.txt
│       └── ...
├── valid/
├── test/
└── *.yaml
  • Ultralytics format
Plain Text
dataset_dir/
├── images/
│   ├── train/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── val/
│   └── test/
├── labels/
│   ├── train/
│   │   ├── img1.txt
│   │   ├── img2.txt
│   │   └── ...
│   ├── val/
│   └── test/
└── *.yaml

- Pascal VOC

Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img1.xml
│   └── ...
├── valid/
└── test/

- Darknet

Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img1.txt
│   ├── ...
│   └── _darknet.labels
├── valid/
└── test/

- YOLOv4

Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img2.jpg
│   ├── ...
│   ├── _annotations.txt
│   └── _classes.txt
├── valid/
└── test/

- YOLOv6

Plain Text
dataset_dir/
├── images/
│   ├── train/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── valid/
│   └── test/
├── labels/
│   ├── train/
│   │   ├── img1.txt
│   │   ├── img2.txt
│   │   └── ...
│   ├── valid/
│   └── test/
└── data.yaml

- CreateML

Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img2.jpg
│   ├── ...
│   └── _annotations.createml.json
├── valid/
└── test/

- TensorFlow Object Detection CSV

Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img2.jpg
│   ├── ...
│   └── _annotations.csv
├── valid/
└── test/

- SOLO

Plain Text
dataset_dir/
├── train/
│   ├── metadata.json
│   ├── sensor_definitions.json
│   ├── annotation_definitions.json
│   ├── metric_definitions.json
│   └── sequence.<SequenceNUM>/
│       ├── step<StepNUM>.camera.jpg
│       ├── step<StepNUM>.frame_data.json
│       └── (OPTIONAL: step<StepNUM>.camera.semantic segmentation.jpg)
├── valid/
└── test/

- Classification Directory

A directory with subdirectories for each class. Two structures are supported:
  • Split structure with train/valid/test subdirectories:
Plain Text
dataset_dir/
├── train/
│   ├── class1/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── class2/
│   └── ...
├── valid/
└── test/
  • Flat structure (class subdirectories directly in root, random splits applied at parse time):
Plain Text
dataset_dir/
├── class1/
│   ├── img1.jpg
│   └── ...
├── class2/
│   └── ...
└── info.json  (optional metadata file)
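As an illustration of the flat structure, the sketch below walks a class directory and pairs each image with the name of its parent folder. It is a minimal sketch and not the parser's own implementation; the helper name collect_class_images and the set of accepted image extensions are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; the parser may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_class_images(dataset_dir: Path) -> list[tuple[Path, str]]:
    """Return (image_path, class_name) pairs for a flat class-directory layout."""
    pairs = []
    # Every subdirectory of the root is treated as one class.
    for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image, class_dir.name))
    return pairs


# Build a tiny throwaway dataset that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"class1": ["img1.jpg"], "class2": ["img2.jpg", "img3.jpg"]}
    for class_name, image_names in layout.items():
        (root / class_name).mkdir()
        for image_name in image_names:
            (root / class_name / image_name).touch()
    # Files in the root (such as the optional info.json) are not class folders.
    (root / "info.json").write_text("{}")

    summary = [(path.name, cls) for path, cls in collect_class_images(root)]

print(summary)
```

Because the class name comes from the folder name alone, no separate annotation file is needed for this layout.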

- FiftyOne Classification

FiftyOneImageClassificationDataset format with images in a data/ folder and labels in labels.json. Two structures are supported:
  • Split structure with train/validation/test subdirectories:
Plain Text
dataset_dir/
├── train/
│   ├── data/
│   │   ├── img1.jpg
│   │   └── ...
│   └── labels.json
├── validation/
│   ├── data/
│   └── labels.json
└── test/
    ├── data/
    └── labels.json
  • Flat structure (random splits applied at parse time):
Plain Text
dataset_dir/
├── data/
│   ├── img1.jpg
│   └── ...
└── labels.json
The labels.json format:
JSON
{
    "classes": ["class1", "class2", ...],
    "labels": {
        "image_stem": class_index,
        ...
    }
}
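To make the schema concrete, the following sketch writes a small labels.json and resolves each image stem to its class name through the class index. The class names, image stems, and the helper load_labels are invented for illustration; only the "classes" and "labels" keys come from the format described above.

```python
import json
import tempfile
from pathlib import Path

# Example content: "labels" maps an image stem (file name without extension)
# to an index into "classes". The values here are made up for the example.
labels = {
    "classes": ["cat", "dog"],
    "labels": {"img1": 0, "img2": 1, "img3": 0},
}


def load_labels(path: Path) -> dict[str, str]:
    """Read labels.json and return a mapping of image stem to class name."""
    data = json.loads(path.read_text())
    classes = data["classes"]
    return {stem: classes[index] for stem, index in data["labels"].items()}


with tempfile.TemporaryDirectory() as tmp:
    labels_path = Path(tmp) / "labels.json"
    labels_path.write_text(json.dumps(labels, indent=4))
    resolved = load_labels(labels_path)

print(resolved)
```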

- Native LDF

Native Luxonis dataset export with split-level annotations.json files.
Plain Text
dataset_dir/
├── train/
│   └── annotations.json
├── valid/
└── test/

- Segmentation Mask Directory

A directory with images and corresponding masks.
Plain Text
dataset_dir/
├── train/
│   ├── img1.jpg
│   ├── img1_mask.png
│   ├── ...
│   └── _classes.csv
├── valid/
└── test/
The masks are stored as grayscale PNG images in which each pixel value corresponds to a class. The mapping from pixel values to classes is defined in the _classes.csv file.
Csv
Pixel Value, Class
0, background
1, class1
2, class2
3, class3
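The pixel-to-class lookup can be sketched with the standard library alone. The example below parses the CSV shown above and reports which classes appear in a small mask; the mask is a hand-written nested list standing in for a decoded grayscale PNG, and read_class_map is a hypothetical helper, not part of luxonis_ml.

```python
import csv
import io

CLASSES_CSV = """Pixel Value, Class
0, background
1, class1
2, class2
3, class3
"""


def read_class_map(text: str) -> dict[int, str]:
    """Parse the _classes.csv content into a pixel value to class name mapping."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    return {int(row[0]): row[1].strip() for row in reader if row}


class_map = read_class_map(CLASSES_CSV)

# Stand-in for a decoded mask: each entry is the pixel value at that position.
mask = [
    [0, 0, 1],
    [2, 2, 1],
]
present = sorted({class_map[value] for row in mask for value in row})

print(present)
```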

Dataset Parsing

Parsing starts by initializing the LuxonisParser object with the path to the dataset directory. Optionally, you can specify the dataset name, the task name, and the dataset type (i.e. the format). By default, the name is set to the name of the provided dataset directory, and the type is inferred from the dataset directory structure. The dataset directory can be either a local path or a remote dataset identifier: the parser currently accepts local directories, .zip archives, gcs://..., s3://..., and roboflow://workspace/project/version/format identifiers.
Python
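from luxonis_ml.data.parsers import LuxonisParser
from luxonis_ml.enums import DatasetType

# Remote Roboflow identifier; a local directory or a .zip path is also accepted
dataset_dir = "roboflow://workspace/project/version/coco"

parser = LuxonisParser(
    dataset_dir=dataset_dir,
    dataset_name="my_dataset",  # optional; defaults to the directory name
    dataset_type=DatasetType.COCO,  # optional; inferred from the structure if omitted
    task_name="detection",
)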
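How the type is inferred is not described here. Purely as an illustration of inferring a format from the directory structure, the sketch below checks the train split for a few of the marker files that appear in the layouts above. The helper guess_format, the marker table, and the returned labels are all hypothetical; they are not the library's logic and not DatasetType values.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Marker files taken from the directory layouts above. The labels are
# illustrative strings, not values of the library's DatasetType enum.
MARKERS = {
    "_annotations.coco.json": "COCO",
    "_darknet.labels": "Darknet",
    "_annotations.createml.json": "CreateML",
    "_classes.csv": "Segmentation Mask Directory",
}


def guess_format(dataset_dir: Path, split: str = "train") -> Optional[str]:
    """Return a format label if a known marker file exists in the split folder."""
    split_dir = dataset_dir / split
    if not split_dir.is_dir():
        return None
    for marker, label in MARKERS.items():
        if (split_dir / marker).is_file():
            return label
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train").mkdir()
    (root / "train" / "_annotations.coco.json").write_text("{}")

    guessed = guess_format(root)
    missing = guess_format(root / "does_not_exist")

print(guessed, missing)
```

Passing dataset_type explicitly, as in the constructor call above, avoids relying on any inference at all.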
After initializing the LuxonisParser object, parsing can be run by calling the .parse() method on it:
Python
dataset = parser.parse()
This creates a LuxonisDataset instance containing the data from the provided dataset, keeping the original splits whenever the source format defines them. If the dataset already exists in Luxonis format, parsing is skipped and the existing dataset is returned.
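Since the parser also accepts .zip archives, a local dataset directory can be packed before it is handed over. The sketch below uses only the standard library; zip_dataset is a hypothetical helper, and keeping the directory itself as the top-level archive entry is an assumption, because the expected archive layout is not stated here.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_dataset(dataset_dir: Path, out_dir: Path) -> Path:
    """Pack dataset_dir into out_dir/<name>.zip with the directory as the archive root."""
    archive = shutil.make_archive(
        base_name=str(out_dir / dataset_dir.name),  # ".zip" is appended automatically
        format="zip",
        root_dir=dataset_dir.parent,
        base_dir=dataset_dir.name,
    )
    return Path(archive)


with tempfile.TemporaryDirectory() as tmp:
    # Minimal stand-in dataset with a single (empty) image file.
    root = Path(tmp) / "dataset_dir"
    (root / "train").mkdir(parents=True)
    (root / "train" / "img1.jpg").touch()

    # Write the archive outside the dataset so it does not include itself.
    out_dir = Path(tmp) / "out"
    out_dir.mkdir()

    archive_path = zip_dataset(root, out_dir)
    archive_name = archive_path.name
    with zipfile.ZipFile(archive_path) as zf:
        names = sorted(zf.namelist())

print(archive_name, names)
```

The resulting path could then be passed as dataset_dir when constructing the parser.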

CLI Reference

The parsing functionality can be invoked by using the luxonis_ml data parse command.
Command Line
luxonis_ml data parse path/to/dataset --name my_dataset --type coco
For more detailed information, run luxonis_ml data parse --help.