Overview

StandardE2E is a unified framework for preprocessing, indexing, and loading autonomous driving datasets. It enables training models on multiple heterogeneous datasets with a single, consistent API.

Why StandardE2E?

Training autonomous driving models on diverse datasets is challenging:

  • Different data formats: Each dataset has unique file structures and APIs

  • Inconsistent modalities: Camera formats, LiDAR representations, and annotation styles vary

  • Complex preprocessing: Converting raw data to a model-ready format requires dataset-specific code

  • Limited flexibility: Switching or combining datasets means rewriting data pipelines

StandardE2E solves these problems by providing:

🎯 One Format to Rule Them All

All datasets are converted to a unified TransformedFrameData representation with consistent modality keys and metadata.

⚑ Efficient Indexing

Parquet-based indexes enable fast filtering, sampling, and frame lookup without loading heavy scene data.
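To illustrate why a lightweight index makes filtering and lookup cheap, here is a minimal sketch. It assumes each index row holds only small metadata (dataset name, segment id, timestamp, path to the frame file) and that the table has already been read into memory, for example with `pandas.read_parquet`. The `IndexRow` columns and the helper functions are hypothetical and do not reflect the actual StandardE2E index schema.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexRow:
    # Hypothetical index columns: small metadata only, no sensor payloads.
    dataset: str
    segment_id: str
    timestamp_us: int
    frame_path: str  # location of the heavy frame data on disk


def filter_rows(rows: List[IndexRow], dataset: Optional[str] = None) -> List[IndexRow]:
    """Select frames by metadata without opening any frame file."""
    return [r for r in rows if dataset is None or r.dataset == dataset]


def sample_rows(rows: List[IndexRow], k: int, seed: int = 0) -> List[IndexRow]:
    """Draw a reproducible random subset of at most k frames."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def lookup(rows: List[IndexRow], segment_id: str, timestamp_us: int) -> Optional[IndexRow]:
    """Find one frame by its (segment, timestamp) key, or None if absent."""
    return next(
        (r for r in rows if r.segment_id == segment_id and r.timestamp_us == timestamp_us),
        None,
    )


index = [
    IndexRow("dataset_a", "seg_a", 0, "dataset_a/seg_a/0.bin"),
    IndexRow("dataset_a", "seg_a", 100_000, "dataset_a/seg_a/1.bin"),
    IndexRow("dataset_b", "seg_b", 0, "dataset_b/seg_b/0.bin"),
]

print(len(filter_rows(index, dataset="dataset_a")))  # 2
print(lookup(index, "seg_a", 100_000).frame_path)    # dataset_a/seg_a/1.bin
```

In this sketch, a frame's `frame_path` would be opened only after its row has been selected. That is why filtering and sampling stay cheap even when the underlying scene data is large.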

🔌 Extensible by Adding New Datasets

Once processed, a new dataset is accessed through the same API and yields the same data structures as every other supported dataset.

🎨 Flexible Augmentation

Chain frame-level augmentations that work across all datasets. Augmentations are regime-aware (train/val/test).
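A minimal sketch of regime-aware chaining follows. It assumes a frame is a plain dict of modalities and that each augmentation declares the regimes in which it runs. `Augmentation`, `Compose`, `make_hflip`, and `center_crop_width` are hypothetical names, not the StandardE2E API. The flip also assumes that `y` is the lateral axis of the trajectory, so mirroring the image corresponds to negating `y`.

```python
import random
from typing import Any, Callable, Dict, Iterable, List

Frame = Dict[str, Any]
AugFn = Callable[[Frame, random.Random], Frame]


class Augmentation:
    """Wraps a frame-level function and the regimes it is enabled for."""

    def __init__(self, fn: AugFn, regimes: Iterable[str] = ("train",)):
        self.fn = fn
        self.regimes = frozenset(regimes)

    def __call__(self, frame: Frame, regime: str, rng: random.Random) -> Frame:
        if regime not in self.regimes:
            return frame  # disabled for this regime: pass the frame through
        return self.fn(frame, rng)


class Compose:
    """Applies augmentations in order with one shared, seeded RNG."""

    def __init__(self, augmentations: List[Augmentation]):
        self.augmentations = augmentations

    def __call__(self, frame: Frame, regime: str, seed: int = 0) -> Frame:
        rng = random.Random(seed)
        for aug in self.augmentations:
            frame = aug(frame, regime, rng)
        return frame


def make_hflip(p: float = 0.5) -> AugFn:
    def hflip(frame: Frame, rng: random.Random) -> Frame:
        if rng.random() >= p:
            return frame
        out = dict(frame)
        # Mirror the image and negate lateral (y) coordinates together,
        # so the two modalities stay geometrically consistent.
        out["image"] = [row[::-1] for row in frame["image"]]
        out["trajectory"] = [(x, -y) for x, y in frame["trajectory"]]
        return out
    return hflip


def center_crop_width(frame: Frame, rng: random.Random) -> Frame:
    # Deterministic: drop the leftmost and rightmost pixel columns.
    out = dict(frame)
    out["image"] = [row[1:-1] for row in frame["image"]]
    return out


pipeline = Compose([
    Augmentation(make_hflip(p=1.0), regimes=("train",)),
    Augmentation(center_crop_width, regimes=("train", "val", "test")),
])

frame = {"image": [[0, 1, 2, 3]], "trajectory": [(1.0, 2.0)]}
print(pipeline(frame, "train")["image"])  # [[2, 1]]
print(pipeline(frame, "val")["image"])    # [[1, 2]]
```

Because each augmentation operates on the whole frame rather than on a single modality, a geometric change such as the flip can update every affected modality at once.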

🔧 Parametrizable Pipelines

Configure what to load programmatically or via YAML config files: choose which frames to fetch and which modalities to load for each of them.
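The idea of selecting frames and per-frame modalities can be sketched as a small load specification. This is a minimal sketch that assumes a spec maps frame offsets relative to the current frame (for example, -1 for the previous frame) to the modality keys to load. `LoadSpec`, `resolve_requests`, and the modality names are hypothetical and do not reflect the real StandardE2E config schema. A dict of the same shape could also be produced from a YAML file.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class LoadSpec:
    # Frame offset relative to the current frame -> modality keys to load.
    frames: Dict[int, List[str]] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, cfg: Dict[str, Any]) -> "LoadSpec":
        # cfg could come from yaml.safe_load(); offsets may arrive as strings.
        return cls({int(k): list(v) for k, v in cfg["frames"].items()})


def resolve_requests(spec: LoadSpec, num_frames: int, current_idx: int) -> List[Tuple[int, str]]:
    """List the (frame_index, modality) pairs to read for one sample."""
    requests: List[Tuple[int, str]] = []
    for offset in sorted(spec.frames):
        idx = current_idx + offset
        if not 0 <= idx < num_frames:
            continue  # this sketch simply skips offsets outside the segment
        for modality in spec.frames[offset]:
            requests.append((idx, modality))
    return requests


cfg = {"frames": {"0": ["camera_front", "lidar"], "-1": ["camera_front"]}}
spec = LoadSpec.from_dict(cfg)
print(resolve_requests(spec, num_frames=3, current_idx=1))
# [(0, 'camera_front'), (1, 'camera_front'), (1, 'lidar')]
```

Skipping out-of-range offsets is only one possible policy. A real pipeline might instead pad missing frames or drop the sample entirely.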

Architecture

[Figure: StandardE2E Data Flow]

StandardE2E consists of two parts:

Preprocessing

Preprocessing converts raw datasets into a unified, training-ready on-disk format. It is where dataset-specific processing happens, and where reusable user-defined adapters are applied.

Preprocessing uses a two-stage data transformation:

  1. Raw dataset → StandardFrameData (dataset-agnostic intermediate format)

    Handled by dataset-specific processors and kept free of user-defined transformations. In essence, StandardFrameData stores raw frame data in a consistent structure. See adding_new_dataset.md to learn how to add new datasets.

  2. StandardFrameData → AbstractAdapter → TransformedFrameData

    Adapters apply user-defined transformations (image resizing, panorama projection, normalization, etc.), making the format efficient and training-ready. See intro_tutorial.ipynb and creating_custom_adapter.ipynb to learn more about adapters.
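The two-stage flow can be sketched in plain Python. This is a minimal sketch that assumes a frame is a set of metadata fields plus a dict of modalities. The class names mirror StandardFrameData, AbstractAdapter, and TransformedFrameData from the steps above, but their fields and the `transform` method are hypothetical stand-ins, not the real StandardE2E classes. `toy_processor` and `NearestDownsampleAdapter` are also hypothetical illustrations.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StandardFrameData:
    # Stage 1 output: raw frame contents in a dataset-agnostic layout.
    dataset_name: str
    segment_id: str
    timestamp_us: int
    modalities: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TransformedFrameData:
    # Stage 2 output: frame after user-defined, training-oriented transforms.
    dataset_name: str
    segment_id: str
    timestamp_us: int
    modalities: Dict[str, Any] = field(default_factory=dict)


class AbstractAdapter(ABC):
    @abstractmethod
    def transform(self, frame: StandardFrameData) -> TransformedFrameData:
        ...


class NearestDownsampleAdapter(AbstractAdapter):
    """Hypothetical adapter: downsample the camera image by an integer factor."""

    def __init__(self, factor: int):
        self.factor = factor

    def transform(self, frame: StandardFrameData) -> TransformedFrameData:
        image = frame.modalities["camera"]
        small = [row[:: self.factor] for row in image[:: self.factor]]
        return TransformedFrameData(
            frame.dataset_name, frame.segment_id, frame.timestamp_us,
            {**frame.modalities, "camera": small},
        )


def toy_processor(raw_record: Dict[str, Any]) -> StandardFrameData:
    # Stage 1: dataset-specific parsing into the common structure.
    return StandardFrameData(
        dataset_name="toy",
        segment_id=raw_record["scene"],
        timestamp_us=raw_record["t"],
        modalities={"camera": raw_record["img"]},
    )


raw = {"scene": "s0", "t": 0,
       "img": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]}
out = NearestDownsampleAdapter(factor=2).transform(toy_processor(raw))
print(out.modalities["camera"])  # [[1, 3], [9, 11]]
```

Keeping the processor and the adapter separate means only `toy_processor` knows the raw format. The same adapter could then be reused unchanged on any dataset whose processor emits StandardFrameData.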

  • The processing stage also produces a Parquet index that enables fast frame lookup and filtering and stores extra metadata.

  • As a final stage, a SegmentContextAggregator may be applied to aggregate segment-level context into frame-level data (e.g., aggregating per-frame current positions into a trajectory).
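As an illustration of segment-level aggregation, here is a minimal sketch. It assumes that each frame carries its own ego position and that the goal is to attach to every frame the next few positions from the same segment. The `attach_future_trajectory` function and the `ego_xy` / `future_trajectory` keys are hypothetical and are not the actual SegmentContextAggregator interface.

```python
from collections import defaultdict
from typing import Dict, List


def attach_future_trajectory(frames: List[Dict], horizon: int) -> List[Dict]:
    """Add the next `horizon` ego positions of the same segment to each frame."""
    by_segment: Dict[str, List[Dict]] = defaultdict(list)
    for frame in frames:
        by_segment[frame["segment_id"]].append(frame)

    result = []
    for segment_frames in by_segment.values():
        # Order frames in time so that "future" is well defined.
        segment_frames.sort(key=lambda f: f["timestamp_us"])
        positions = [f["ego_xy"] for f in segment_frames]
        for i, frame in enumerate(segment_frames):
            enriched = dict(frame)  # leave the input frame untouched
            # Shorter than `horizon` near the end of the segment.
            enriched["future_trajectory"] = positions[i + 1 : i + 1 + horizon]
            result.append(enriched)
    return result


frames = [
    {"segment_id": "a", "timestamp_us": 200, "ego_xy": (2.0, 0.0)},
    {"segment_id": "a", "timestamp_us": 0, "ego_xy": (0.0, 0.0)},
    {"segment_id": "a", "timestamp_us": 100, "ego_xy": (1.0, 0.0)},
    {"segment_id": "b", "timestamp_us": 0, "ego_xy": (5.0, 5.0)},
]
out = attach_future_trajectory(frames, horizon=2)
print(out[0]["future_trajectory"])  # [(1.0, 0.0), (2.0, 0.0)]
```

This step needs the whole segment at once, which is why it runs after per-frame processing. One might also convert the positions into each frame's ego coordinates, but that step is omitted here.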

Training

Training is where you load the preprocessed data for model training or evaluation. The main entry point is UnifiedE2EDataset, which accepts an index table from index.parquet (or from multiple index files when combining datasets) and loads frames from disk. It also provides the following functionality: