Supported Datasets

Each row below lists the modalities the dataset processor currently emits into the unified format. ✓ means the modality is wired end-to-end through preprocessing and the on-disk TransformedFrameData schema; — means the dataset doesn’t ship that modality, in which case the field is filled with the corresponding default value from ModalityDefaults.

Dataset	Cameras	LiDAR	HD map (BEV)	3D detections	Driving command	Preference trajectory
Waymo Open E2E	✓ (8 ring cameras)	—	—	—	✓	✓
Waymo Open Perception	✓ (5 cameras)	✓ (top + side, ego frame)	✓	✓	—	—
Argoverse 2 Sensor	✓ (7 ring cameras)	✓ (merged sweep, ego frame)	✓	✓	—	—
Argoverse 2 Lidar	—	✓ (merged sweep, ego frame)	✓	—	—	—
NAVSIM (OpenScene-v1.1)	✓ (8 cameras: front/left×3/right×3/rear)	✓ (merged sweep, ego frame)	✓ (via nuPlan `map.gpkg` → unified taxonomy; lane boundaries carry no paint info, since nuPlan doesn’t store paint type)	✓	✓ (4-class one-hot → `Intent`)	—
WayveScenes101	✓ (5 fisheye: forward + side arc)	✓ (COLMAP SfM, ego frame) [1]	—	—	—	—
comma2k19	✓ (1 forward: comma EON, 1164×874 pinhole) [2]	—	—	—	—	—

All datasets also emit the ego past/future trajectory (from each dataset’s poses, via the segment-context aggregator) regardless of the columns above.

Note

comma2k19 is high-volume: 20 Hz × 60 s × ~2000 one-minute segments ≈ 2.4 M frames (~2 TB at the native 1164×874 resolution). Two converter knobs bound the output size and processing time. `--frame_stride N` keeps every N-th frame (1 = full 20 Hz; e.g. `--frame_stride 4` ≈ 5 Hz). The `max_size` param of `cameras_identity_adapter` downscales each frame so that its longest side is at most that many px, with the intrinsics scaled to match.
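To get a feel for how the two knobs interact, the arithmetic above can be sketched in a few lines of Python. This is only an estimate under stated assumptions, not the converter's actual code: the helper names `estimate_frames` and `downscale_intrinsics` are hypothetical, the stride is assumed to keep frames 0, N, 2N, … within each segment, the resized dimensions are assumed to be rounded to the nearest pixel, and the intrinsics values in the usage example are illustrative placeholders.

```python
def estimate_frames(num_segments=2000, seconds_per_segment=60,
                    native_hz=20, frame_stride=1):
    """Rough frame count after striding (hypothetical helper)."""
    frames_per_segment = native_hz * seconds_per_segment
    # Assumption: frames 0, N, 2N, ... are kept, i.e. ceil(frames / stride).
    kept_per_segment = -(-frames_per_segment // frame_stride)
    return num_segments * kept_per_segment


def downscale_intrinsics(width, height, fx, fy, cx, cy, max_size):
    """Scale image size and pinhole intrinsics so the longest side <= max_size.

    Hypothetical helper; never upscales. Rounding to the nearest pixel is an
    assumption about how the resized dimensions are chosen.
    """
    scale = min(1.0, max_size / max(width, height))
    new_w, new_h = round(width * scale), round(height * scale)
    # Focal lengths and principal point scale by the same factor as the image.
    return new_w, new_h, fx * scale, fy * scale, cx * scale, cy * scale


if __name__ == "__main__":
    print(estimate_frames())                # full 20 Hz
    print(estimate_frames(frame_stride=4))  # roughly 5 Hz
    # Illustrative intrinsics only; real values come from the dataset's calibration.
    print(downscale_intrinsics(1164, 874, 900.0, 900.0, 582.0, 437.0, max_size=582))
```

Because the stride reduces the frame count linearly and `max_size` reduces the pixel count roughly quadratically, combining a modest stride with a moderate downscale shrinks the output far more than either knob alone.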

How datasets are added

See the Adding New Datasets Guide for the full processor → adapter → aggregator pipeline that a new dataset has to plug into.