# Data Preparation

## Overview

LuxonisTrain can load data in several ways, so you can work with different dataset formats. The supported options are:

 1. Data Directory: Use a data directory formatted in one of the supported structures. For more details, refer to the
    [LuxonisParser](https://docs.luxonis.com/software-v3/ai-inference/model-source/training/luxonis-ml/luxonis-parser.md).
 2. LuxonisDataset Format: Utilize an existing dataset in the custom LuxonisDataset format. For further guidance, see the
    [LuxonisDataset](https://docs.luxonis.com/software-v3/ai-inference/model-source/training/luxonis-ml/luxonis-dataset.md)
    documentation.
 3. Custom Loader: Implement a custom loader to meet specific data handling needs. To learn how to create and use custom loaders,
    visit the [Customizations](https://docs.luxonis.com/software-v3/ai-inference/model-source/training/luxonis-train/concepts.md)
    section.

## Data Directory

### Preparing Your Data

To load data with the [LuxonisParser](https://docs.luxonis.com/software-v3/ai-inference/model-source/training/luxonis-ml/luxonis-parser.md)
tool, prepare your dataset as follows:

 1. Organize your dataset in one of the supported formats.
 2. Place your dataset in a directory accessible to the training script.
 3. Update the `dataset_dir` parameter in your configuration file to point to the dataset directory.

The `dataset_dir` value can be one of the following:

 * Local path to the dataset directory.
 * URL to a remote dataset: The dataset will be downloaded to a `data` directory in the current working directory.

Supported URL protocols:

 * `s3://bucket/path/to/directory` for AWS S3
 * `gs://bucket/path/to/directory` for Google Cloud Storage
 * `roboflow://workspace/project/version/format` for Roboflow
   * `workspace`: Name of the workspace the dataset belongs to.
   * `project`: Name of the project the dataset belongs to.
   * `version`: Version of the dataset.
   * `format`: One of `coco`, `darknet`, `voc`, `yolov4pytorch`, `mt-yolov6`, `createml`, `tensorflow`, `folder`, or `png-mask-semantic`.

Example:

```yaml
loader:
  params:
    dataset_name: "coco_test"
    dataset_dir: "roboflow://team-roboflow/coco-128/2/coco"
```
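
The same `loader` block accepts the other `dataset_dir` forms listed above. As a sketch, the variant below points at an S3 location; the bucket, path, and dataset name are placeholders chosen for illustration, not real resources.

```yaml
loader:
  params:
    # Hypothetical dataset name, chosen for illustration
    dataset_name: "my_dataset"
    # Placeholder URL following the s3://bucket/path/to/directory pattern
    dataset_dir: "s3://my-bucket/datasets/my_dataset"
```

For a dataset on local disk, `dataset_dir` would instead hold a filesystem path such as `"./datasets/my_dataset"` (again a placeholder).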

## LuxonisDataset

To use a [LuxonisDataset](https://docs.luxonis.com/software-v3/ai-inference/model-source/training/luxonis-ml/luxonis-dataset.md)
as the data source, specify the following in the config file:

```yaml
loader:
  params:
    # Name of the dataset
    dataset_name: "dataset_name"

    # Storage type: one of 'local' (default), 's3', or 'gcs'
    bucket_storage: "local"
```
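
For a dataset kept in remote storage, only the `bucket_storage` value should need to change. A minimal sketch, assuming a LuxonisDataset with the placeholder name `my_dataset` already exists in S3:

```yaml
loader:
  params:
    # Placeholder name of an existing LuxonisDataset
    dataset_name: "my_dataset"

    # Read the dataset from AWS S3 instead of local storage
    bucket_storage: "s3"
```

Use `"gcs"` in the same position for Google Cloud Storage.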

To inspect the loader output, use the `inspect` command. It shows the images in the dataset together with their
annotations.

```bash
luxonis_train inspect --config configs/detection_light_model.yaml
```

`inspect` is currently available only in the CLI.
