Supported DatasetsΒΆ

Each row below lists the modalities the dataset processor currently emits into the unified format. βœ“ means the modality is wired end-to-end through preprocessing and the on-disk TransformedFrameData schema; β€” means the dataset doesn’t ship that modality (and it’s surfaced as the corresponding default value via ModalityDefaults).

Dataset

Cameras

LiDAR

HD map (BEV)

3D detections

Driving command

Preference trajectory

Waymo Open E2E

βœ“ (8 ring cameras)

β€”

β€”

β€”

βœ“

βœ“

Waymo Open Perception

βœ“ (5 cameras)

βœ“ (top + side, ego frame)

βœ“

βœ“

β€”

β€”

Argoverse 2 Sensor

βœ“ (7 ring cameras)

βœ“ (merged sweep, ego frame)

βœ“

βœ“

β€”

β€”

Argoverse 2 Lidar

β€”

βœ“ (merged sweep, ego frame)

βœ“

β€”

β€”

β€”

NAVSIM (OpenScene-v1.1)

βœ“ (8 cameras: front/leftΓ—3/rightΓ—3/rear)

βœ“ (merged sweep, ego frame)

βœ“ (via nuPlan map.gpkg β†’ unified taxonomy; lane boundaries carry no paint info, since nuPlan doesn’t store paint type)

βœ“

βœ“ (4-class one-hot β†’ Intent)

β€”

WayveScenes101

βœ“ (5 fisheye: forward + side arc)

βœ“ (COLMAP SfM, ego frame) [1]

β€”

β€”

β€”

β€”

comma2k19

βœ“ (1 forward: comma EON, 1164Γ—874 pinhole) [2]

β€”

β€”

β€”

β€”

β€”

All datasets also emit the ego past/future trajectory (from each dataset’s poses, via the segment-context aggregator) regardless of the columns above.

Note

comma2k19 is high-volume β€” 20 Hz Γ— ~2000 one-minute segments β‰ˆ 2.4 M frames (~2 TB at the native 1164Γ—874 resolution). Two converter knobs bound the output size and processing time: --frame_stride N keeps every N-th frame (1 = full 20 Hz; e.g. --frame_stride 4 β‰ˆ 5 Hz), and the cameras_identity_adapter’s max_size param downscales each frame so its longest side is at most that many px (intrinsics scaled to match).

How datasets are addedΒΆ

See Adding New Datasets Guide for the full processor β†’ adapter β†’ aggregator pipeline a new dataset has to plug into.