LuxonisDataset
Overview
The LuxonisDataset class offers a simple API for creating and managing data in the Luxonis Data Format (LDF). It acts as an abstraction layer and provides methods for dataset:
- ingestion,
- splitting,
- merging, and
- deletion.
Dataset Initialization
The dataset creation process starts by initializing the LuxonisDataset object:
Python
from luxonis_ml.data import LuxonisDataset

dataset_name: str = ...  # e.g. "parking_lot"
dataset = LuxonisDataset(dataset_name)
Datasets can be stored either locally or in one of the supported cloud storage providers (e.g. GCS or S3). By default, the initialized dataset is stored locally.
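To store a dataset in the cloud instead, the constructor can be pointed at a remote bucket. Below is a minimal sketch, assuming luxonis_ml.data exposes a BucketStorage enum and the constructor accepts a bucket_storage parameter:
Python
from luxonis_ml.data import BucketStorage, LuxonisDataset

# assumption: BucketStorage enum and bucket_storage parameter are available;
# BucketStorage.S3 can be used analogously for S3-compatible storage
dataset = LuxonisDataset("parking_lot", bucket_storage=BucketStorage.GCS)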
If a dataset with the provided dataset_name already exists, it will be loaded automatically instead of a new one being initialized. Therefore, make sure to use a unique name for each new dataset, or pass delete_local=True to the LuxonisDataset constructor to overwrite the existing one.
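A minimal sketch of overwriting an existing local dataset, reusing the dataset_name from above together with the delete_local flag described here:
Python
# discard any locally stored dataset with the same name and start fresh
dataset = LuxonisDataset(dataset_name, delete_local=True)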
Adding Data
After dataset initialization, we can start with data ingestion. We must first define a generator function that yields individual data instances. Each data instance stores a path to an image and a single annotation (e.g. a bounding box). So in the case of multiple annotations per image, multiple data instances must be created and yielded separately.
We define data instances as Python dictionaries with the following structure:
Python
{
    "file": str,  # path to the image file
    "annotation": Optional[dict]  # single image annotation
}
The structure of the annotation field depends on the task type (e.g. classification, bounding boxes, keypoints, or segmentation).
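For instance, for a plain classification task the annotation only needs a class name. A minimal sketch of such a data instance (the file path and class name below are illustrative, not part of the parking lot dataset):
Python
{
    "file": "data/parking_lot/images/0001.jpg",  # illustrative path
    "annotation": {
        "class": "car",  # illustrative class name
    },
}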
Below is an example generator function for the parking lot dataset, yielding data instances with bounding box annotations.
Python
import json
from pathlib import Path

# path to the dataset, replace it with the actual path on your system
dataset_root = Path("data/parking_lot")

def generator():
    for annotation_dir in dataset_root.iterdir():
        with open(annotation_dir / "annotations.json") as f:
            data = json.load(f)

        # get the width and height of the image
        W = data["dimensions"]["width"]
        H = data["dimensions"]["height"]

        image_path = annotation_dir / data["filename"]

        for instance_id, bbox in data["BoundingBoxAnnotation"].items():
            # get unnormalized bounding box coordinates
            x, y = bbox["origin"]
            w, h = bbox["dimension"]

            # get the class name of the bounding box
            class_ = bbox["labelName"]

            yield {
                "file": image_path,
                "annotation": {
                    "class": class_,
                    # normalized bounding box coordinates
                    "boundingbox": {
                        "x": x / W,
                        "y": y / H,
                        "w": w / W,
                        "h": h / H,
                    },
                },
            }
The generator is then passed to the add method of the dataset:
Python
dataset.add(generator())
The add method accepts any iterable, not only generators.
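For example, a small list of pre-built data instances can be added directly (the file paths and class names below are illustrative):
Python
# a list works just as well as a generator
instances = [
    {"file": "data/parking_lot/images/0001.jpg", "annotation": {"class": "car"}},
    {"file": "data/parking_lot/images/0002.jpg", "annotation": {"class": "truck"}},
]
dataset.add(instances)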
Defining Splits
After adding data to the dataset, we can define its splits. There are no restrictions on the split names, but in most cases one should stick to the train, val, and test sets. The splits are defined by calling the make_splits method on the LuxonisDataset object and passing the desired split ratios as its argument (by default, the data are split with an 80:10:10 ratio between the train, val, and test sets).
Python
dataset.make_splits({
    "train": 0.7,
    "val": 0.2,
    "test": 0.1,
})
Alternatively, the splits can be defined by passing explicit lists of files instead of ratios:
Python
dataset.make_splits({
    "train": ["file1.jpg", "file2.jpg", ...],
    "val": ["file3.jpg", "file4.jpg", ...],
    "test": ["file5.jpg", "file6.jpg", ...],
})
Once the splits are defined, calling the make_splits method again will raise an error. If you wish to redefine them, pass redefine_splits=True to the method call.
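A minimal sketch of redefining existing splits with the redefine_splits flag mentioned above (the new ratios are illustrative):
Python
# overwrite the previously defined splits with new ratios
dataset.make_splits(
    {"train": 0.8, "val": 0.1, "test": 0.1},
    redefine_splits=True,
)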
Dataset Cloning
You can clone an existing dataset to create a copy with a new name. This is useful for testing changes without affecting the original dataset. Cloning is done by calling the clone method on the LuxonisDataset object and passing the desired name of the new dataset:
Python
dataset_clone = dataset.clone(new_dataset_name="dataset_clone")
Dataset Merging
Datasets can also be merged together. This is beneficial for combining multiple datasets into a larger, unified dataset for comprehensive training or analysis. Merging is done by calling the merge_with method on the first LuxonisDataset object and passing the second one as an argument. You can choose between two different merging modes:
- inplace: the first dataset is modified to include the data from the second dataset
- out-of-place: a new dataset is created from the combination of the two existing datasets
Python
# inplace merging
dataset1.merge_with(dataset2, inplace=True)

# OR out-of-place merging
dataset_merge = dataset1.merge_with(dataset2, inplace=False, new_dataset_name="dataset_merge")
CLI Reference
The luxonis_ml CLI provides a set of useful commands for managing datasets. These commands are accessible via the luxonis_ml data command. The available commands are:
- luxonis_ml data ls - lists all datasets
- luxonis_ml data info <dataset_name> - prints information about the dataset
- luxonis_ml data inspect <dataset_name> - renders the data in the dataset on screen using cv2
- luxonis_ml data delete <dataset_name> - deletes the dataset
For more information, run luxonis_ml data --help or pass the --help flag to any of the above commands.