LuxonisDataset
Overview
The LuxonisDataset class offers a simple API for creating and managing data in the Luxonis Data Format (LDF). It acts as an abstraction layer and provides methods for dataset:
- initialization,
- ingestion,
- splitting,
- merging, and
- export, cloud synchronization, and deletion.
Dataset initialization
To initialize a dataset, create a LuxonisDataset object:
Python
from luxonis_ml.data.datasets import LuxonisDataset

dataset_name: str = ...  # e.g. "parking_lot"
dataset = LuxonisDataset(dataset_name)
Datasets can be stored locally or using one of the supported cloud storage providers, including GCS, S3, and Azure Blob storage. By default, the initialized dataset is stored locally.
If a dataset with the provided dataset_name already exists, it is loaded automatically instead of a new one being initialized. Therefore, use a unique name for each new dataset, or pass delete_local=True to the LuxonisDataset constructor to overwrite an existing one.
The constructor also accepts the team_id, bucket_type, bucket_storage, and delete_remote parameters. These are useful when the dataset should live in shared or remote object storage instead of only on the local machine.
Adding Data
Each data instance is provided as a dictionary with the following structure:
Python
{
    "file": str,  # path to the image file
    "annotation": Optional[dict]  # single image annotation
}
The content of the annotation field depends on which of the supported task types it describes.
Below we provide an example generator function for the parking lot dataset, yielding the data instances for bounding box annotations.
Python
import json
from pathlib import Path

# path to the dataset, replace it with the actual path on your system
dataset_root = Path("data/parking_lot")

def generator():
    for annotation_dir in dataset_root.iterdir():
        with open(annotation_dir / "annotations.json") as f:
            data = json.load(f)

        # get the width and height of the image
        W = data["dimensions"]["width"]
        H = data["dimensions"]["height"]

        image_path = annotation_dir / data["filename"]

        for instance_id, bbox in data["BoundingBoxAnnotation"].items():

            # get unnormalized bounding box coordinates
            x, y = bbox["origin"]
            w, h = bbox["dimension"]

            # get the class name of the bounding box
            class_ = bbox["labelName"]
            yield {
                "file": image_path,
                "annotation": {
                    "class": class_,
                    # normalized bounding box
                    "boundingbox": {
                        "x": x / W,
                        "y": y / H,
                        "w": w / W,
                        "h": h / H,
                    },
                },
            }
The generator is then passed to the add method of the dataset.
Python
dataset.add(generator())
The add method accepts any iterable, not only generators.
Metadata and sources
The dataset also stores metadata such as:
- task definitions and class mappings
- keypoint skeletons
- categorical metadata encodings
- dataset source structure
Sources are described with LuxonisSource and LuxonisComponent, which let one dataset describe multi-component or multi-sensor inputs as well. For example, one source can contain multiple image components instead of only a single RGB image.
Useful metadata-related methods include:
- set_tasks(...) to define task groups explicitly
- set_classes(...) to register class mappings
- get_source_names() to inspect the available dataset sources
- update_source(...) to update source/component metadata
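The class mappings mentioned above can be derived from the same records that are passed to add. The following is a minimal sketch and not part of the LuxonisDataset API: build_class_mapping is a hypothetical helper that collects the "class" values from records shaped like those in the Adding Data section and assigns each a stable index. Whether set_classes(...) accepts a mapping in exactly this shape is an assumption and should be checked against the library documentation.

```python
from typing import Any, Dict, Iterable


def build_class_mapping(records: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Hypothetical helper: map each class name found in the records to an index.

    Records without an annotation (or without a "class" key) are skipped.
    Names are sorted so that the mapping does not depend on record order.
    """
    names = set()
    for record in records:
        annotation = record.get("annotation") or {}
        if "class" in annotation:
            names.add(annotation["class"])
    return {name: index for index, name in enumerate(sorted(names))}


# illustrative records, mirroring the structure used in the Adding Data section
example_records = [
    {"file": "a.jpg", "annotation": {"class": "car"}},
    {"file": "b.jpg", "annotation": {"class": "bus"}},
    {"file": "c.jpg", "annotation": None},
    {"file": "d.jpg", "annotation": {"class": "car"}},
]

class_mapping = build_class_mapping(example_records)
print(class_mapping)
```

Sorting the names keeps the indices reproducible between runs, which matters when the same mapping has to be reused for training and evaluation.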
Defining Splits
The data in a dataset can be divided into train, val, and test sets. The splits are defined by calling the make_splits method on the LuxonisDataset object and passing the desired split ratios as its argument (by default, the data are split in an 80:10:10 ratio between the train, val, and test sets).
Python
dataset.make_splits({
    "train": 0.7,
    "val": 0.2,
    "test": 0.1,
})
Splits can also be defined manually by listing the file names that belong to each split:
Python
dataset.make_splits({
    "train": ["file1.jpg", "file2.jpg", ...],
    "val": ["file3.jpg", "file4.jpg", ...],
    "test": ["file5.jpg", "file6.jpg", ...],
})
Once the splits are defined, calling the make_splits method again will raise an error. If you wish to redefine them, pass redefine_splits=True to the method call.
Cloud sync and dataset discovery
LuxonisDataset can also:
- list datasets with LuxonisDataset.list_datasets(...)
- pull missing or all media locally with pull_from_cloud(...)
- push local data to remote object storage with push_to_cloud(...)
Dataset Cloning
A dataset can be cloned by calling the clone method on the LuxonisDataset object and passing the desired name of the new dataset.
Python
dataset_clone = dataset.clone(new_dataset_name="dataset_clone")
Dataset Merging
Two datasets can be merged by calling the merge_with method on the first LuxonisDataset object and passing the second one as an argument. You can choose between two different merging modes:
- inplace: the first dataset is modified to include data from the second dataset
- out-of-place: a new dataset is created from the combination of two existing datasets
Python
# inplace merging
dataset1.merge_with(dataset2, inplace=True)
# OR out-of-place merging
dataset_merge = dataset1.merge_with(dataset2, inplace=False, new_dataset_name="dataset_merge")
Dataset export
LuxonisDataset can export data back out of LDF into common dataset formats. This is useful when you want to prepare data in LuxonisML, but train or inspect it in another toolchain.
Python
from luxonis_ml.enums import DatasetType

dataset.export("exports/coco", dataset_type=DatasetType.COCO)
CLI Reference
The luxonis_ml CLI provides a set of useful commands for managing datasets. These commands are accessible via the luxonis_ml data command.
The available commands are:
- luxonis_ml data ls - lists all datasets
- luxonis_ml data info <dataset_name> - prints information about the dataset
- luxonis_ml data inspect <dataset_name> - renders the data in the dataset on screen using cv2
- luxonis_ml data delete <dataset_name> - deletes the dataset
For more information, run luxonis_ml data --help or pass the --help flag to any of the above commands.
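When the commands above need to be called from a script, they can be wrapped with the standard subprocess module. The following is a minimal sketch under stated assumptions: build_data_command and run_data_command are hypothetical helpers, not part of luxonis_ml, and only the four subcommands listed in this section are covered. Running a command assumes that the luxonis_ml executable is available on the PATH.

```python
import subprocess
from typing import List, Optional

# subcommands of `luxonis_ml data` listed in this section
KNOWN_SUBCOMMANDS = ("ls", "info", "inspect", "delete")


def build_data_command(subcommand: str, dataset_name: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the argument list for a `luxonis_ml data` call."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    if subcommand == "ls":
        # `ls` is used without a dataset name, as listed above
        return ["luxonis_ml", "data", "ls"]
    if dataset_name is None:
        raise ValueError(f"`{subcommand}` requires a dataset name")
    return ["luxonis_ml", "data", subcommand, dataset_name]


def run_data_command(subcommand: str, dataset_name: Optional[str] = None) -> str:
    """Run the command and return its standard output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_data_command(subcommand, dataset_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# building the command does not execute anything
print(build_data_command("info", "parking_lot"))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems when a dataset name contains spaces or special characters.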