LuxonisDataset
Overview
The LuxonisDataset class offers a simple API for creating and managing data in the Luxonis Data Format (LDF). It acts as an abstraction layer and provides methods for dataset:
- ingestion,
- splitting,
- merging, and
- deletion.
Dataset Initialization
The dataset creation process starts by initializing the LuxonisDataset object:
Python
from luxonis_ml.data import LuxonisDataset

dataset_name: str = ...  # e.g. "parking_lot"
dataset = LuxonisDataset(dataset_name)
Datasets can be stored either locally or in one of the supported cloud storage providers (e.g. GCS or S3). By default, the initialized dataset is stored locally.
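To store a dataset in the cloud instead, the constructor can be pointed at a remote bucket. Below is a minimal sketch, assuming luxonis_ml.data exposes a BucketStorage enum and the constructor accepts a bucket_storage parameter:
Python
from luxonis_ml.data import BucketStorage, LuxonisDataset

# assumption: BucketStorage enum and bucket_storage parameter are available;
# BucketStorage.S3 can be used analogously for S3-compatible storage
dataset = LuxonisDataset("parking_lot", bucket_storage=BucketStorage.GCS)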
If a dataset with the provided dataset_name already exists, it will be loaded automatically instead of a new one being initialized. Therefore, make sure to use a unique name for each new dataset, or pass delete_local=True to the LuxonisDataset constructor to overwrite the existing one.
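A minimal sketch of overwriting an existing local dataset, reusing the dataset_name from above together with the delete_local flag described here:
Python
# discard any locally stored dataset with the same name and start fresh
dataset = LuxonisDataset(dataset_name, delete_local=True)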
Adding Data
After dataset initialization, we can start with data ingestion. We must first define a generator function that yields individual data instances. Each data instance stores a path to an image and a single annotation (e.g. a bounding box). So in the case of multiple annotations per image, multiple data instances must be created and yielded separately.
We define data instances as Python dictionaries with the following structure:
Python
{
    "file": str,  # path to the image file
    "annotation": Optional[dict]  # single image annotation
}
The structure of the annotation field depends on the task type (e.g. classification, bounding boxes, keypoints, or segmentation).
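For instance, for a plain classification task the annotation only needs a class name. A minimal sketch of such a data instance (the file path and class name below are illustrative, not part of the parking lot dataset):
Python
{
    "file": "data/parking_lot/images/0001.jpg",  # illustrative path
    "annotation": {
        "class": "car",  # illustrative class name
    },
}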
Below is an example generator function for the parking lot dataset, yielding data instances with bounding box annotations.
Python
import json
from pathlib import Path

# path to the dataset, replace it with the actual path on your system
dataset_root = Path("data/parking_lot")

def generator():
    for annotation_dir in dataset_root.iterdir():
        with open(annotation_dir / "annotations.json") as f:
            data = json.load(f)

        # get the width and height of the image
        W = data["dimensions"]["width"]
        H = data["dimensions"]["height"]

        image_path = annotation_dir / data["filename"]

        for instance_id, bbox in data["BoundingBoxAnnotation"].items():
            # get unnormalized bounding box coordinates
            x, y = bbox["origin"]
            w, h = bbox["dimension"]

            # get the class name of the bounding box
            class_ = bbox["labelName"]

            yield {
                "file": image_path,
                "annotation": {
                    "class": class_,
                    # normalized bounding box coordinates
                    "boundingbox": {
                        "x": x / W,
                        "y": y / H,
                        "w": w / W,
                        "h": h / H,
                    },
                },
            }
The generator is then passed to the add method of the dataset:
Python
dataset.add(generator())
The add method accepts any iterable, not only generators.
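For example, a small list of pre-built data instances can be added directly (the file paths and class names below are illustrative):
Python
# a list works just as well as a generator
instances = [
    {"file": "data/parking_lot/images/0001.jpg", "annotation": {"class": "car"}},
    {"file": "data/parking_lot/images/0002.jpg", "annotation": {"class": "truck"}},
]
dataset.add(instances)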
Defining Splits
After adding data to the dataset, we can define its splits. There are no restrictions on the split names, but in most cases one should stick to the train, val, and test sets. The splits are defined by calling the make_splits method on the LuxonisDataset object and passing the desired split ratios as its argument (by default, the data are split with an 80:10:10 ratio between the train, val, and test sets).
Python
dataset.make_splits({
    "train": 0.7,
    "val": 0.2,
    "test": 0.1,
})
Alternatively, the splits can be defined by passing explicit lists of files instead of ratios:
Python
dataset.make_splits({
    "train": ["file1.jpg", "file2.jpg", ...],
    "val": ["file3.jpg", "file4.jpg", ...],
    "test": ["file5.jpg", "file6.jpg", ...],
})
Once the splits are defined, calling the make_splits method again will raise an error. If you wish to redefine them, pass redefine_splits=True to the method call.
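A minimal sketch of redefining existing splits with the redefine_splits flag mentioned above (the new ratios are illustrative):
Python
# overwrite the previously defined splits with new ratios
dataset.make_splits(
    {"train": 0.8, "val": 0.1, "test": 0.1},
    redefine_splits=True,
)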
Dataset Cloning
You can clone an existing dataset to create a copy with a new name. This is useful for testing changes without affecting the original dataset. Cloning is done by calling the clone method on the LuxonisDataset object and passing the desired name of the new dataset:
Python
dataset_clone = dataset.clone(new_dataset_name="dataset_clone")
Dataset Merging
Datasets can also be merged together. This is beneficial for combining multiple datasets into a larger, unified dataset for comprehensive training or analysis. Merging is done by calling the merge_with method on the first LuxonisDataset object and passing the second one as an argument. You can choose between two different merging modes:
- inplace: the first dataset is modified to include the data from the second dataset
- out-of-place: a new dataset is created from the combination of the two existing datasets
Python
# inplace merging
dataset1.merge_with(dataset2, inplace=True)

# OR out-of-place merging
dataset_merge = dataset1.merge_with(dataset2, inplace=False, new_dataset_name="dataset_merge")
CLI Reference
The luxonis_ml CLI provides a set of useful commands for managing datasets. These commands are accessible via the luxonis_ml data command. The available commands are:
- luxonis_ml data ls - lists all datasets
- luxonis_ml data info <dataset_name> - prints information about the dataset
- luxonis_ml data inspect <dataset_name> - renders the data in the dataset on screen using cv2
- luxonis_ml data delete <dataset_name> - deletes the dataset
For more information, run luxonis_ml data --help or pass the --help flag to any of the above commands.