Supported Datasets
Each row below lists the modalities the dataset processor currently emits
into the unified format. ✓ means the modality is wired end-to-end
through preprocessing and the on-disk TransformedFrameData schema;
✗ means the dataset doesn't ship that modality, in which case it is
surfaced as the corresponding default value via ModalityDefaults.
| Dataset | Cameras | LiDAR | HD map (BEV) | 3D detections | Driving command | Preference trajectory |
|---|---|---|---|---|---|---|
|  | ✓ (8 ring cameras) | β | β | β | β | β |
|  | ✓ (5 cameras) | ✓ (top + side, ego frame) | β | β | β | β |
|  | ✓ (7 ring cameras) | ✓ (merged sweep, ego frame) | β | β | β | β |
|  | β | ✓ (merged sweep, ego frame) | β | β | β | β |
| NAVSIM (OpenScene-v1.1) | ✓ (8 cameras: front/left×3/right×3/rear) | ✓ (merged sweep, ego frame) | ✓ (via nuPlan | β | ✓ (4-class one-hot β | β |
|  | ✓ (5 fisheye: forward + side arc) | ✓ (COLMAP SfM, ego frame) [1] | β | β | β | β |
|  | ✓ (1 forward: comma EON, 1164×874 pinhole) [2] | β | β | β | β | β |
All datasets also emit the ego past/future trajectory (from each dataset's poses, via the segment-context aggregator) regardless of the columns above.
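As a rough illustration of how a past/future trajectory can be derived from poses alone, the sketch below expresses neighbouring poses in the current frame's ego coordinates. It is not the segment-context aggregator's actual code: the names to_ego_frame and ego_trajectories, the planar (x, y, yaw) pose layout, and the x-forward/y-left axis convention are all assumptions made for the example.

```python
import math
from typing import List, Tuple

# Hypothetical pose layout: (x, y, yaw) in a global frame, yaw in radians.
Pose = Tuple[float, float, float]
Point = Tuple[float, float]


def to_ego_frame(reference: Pose, other: Pose) -> Point:
    """Express the position of `other` in the frame of `reference`.

    Assumes x points forward and y points left in the ego frame.
    """
    ref_x, ref_y, ref_yaw = reference
    dx, dy = other[0] - ref_x, other[1] - ref_y
    cos_yaw, sin_yaw = math.cos(ref_yaw), math.sin(ref_yaw)
    # Rotate the global offset by -yaw to land in the reference frame.
    return (cos_yaw * dx + sin_yaw * dy, -sin_yaw * dx + cos_yaw * dy)


def ego_trajectories(
    poses: List[Pose], index: int, n_past: int, n_future: int
) -> Tuple[List[Point], List[Point]]:
    """Return (past, future) waypoints relative to the pose at `index`.

    Both lists are in chronological order and are simply shorter when the
    segment does not contain enough frames on either side.
    """
    reference = poses[index]
    past = [to_ego_frame(reference, p) for p in poses[max(0, index - n_past):index]]
    future = [to_ego_frame(reference, p) for p in poses[index + 1:index + 1 + n_future]]
    return past, future
```

Because the computation needs nothing but poses, it applies to every row of the table regardless of which sensor modalities the dataset ships.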
Note
comma2k19 is high-volume: 20 Hz × ~2000 one-minute segments ≈ 2.4 M
frames (~2 TB at the native 1164×874 resolution). Two converter knobs bound
the output size and processing time: --frame_stride N keeps every
N-th frame (1 = full 20 Hz; e.g. --frame_stride 4 → 5 Hz), and
the cameras_identity_adapter's max_size param downscales each
frame so its longest side is at most that many px (intrinsics scaled to
match).
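The arithmetic behind the two knobs can be sketched as follows. This is an illustration only, not the converter's code: the function names are hypothetical, the stride is assumed to keep frame 0 and every N-th frame after it, max_size is assumed never to upscale, and the intrinsics in the usage lines are placeholder values.

```python
from typing import Tuple


def frames_after_stride(n_frames: int, stride: int) -> int:
    """Number of frames kept when every `stride`-th frame is retained.

    Assumes frame 0 is always kept, so the result is ceil(n_frames / stride).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return (n_frames + stride - 1) // stride


def downscale_to_max_size(
    width: int, height: int,
    fx: float, fy: float, cx: float, cy: float,
    max_size: int,
) -> Tuple[int, int, float, float, float, float]:
    """Shrink an image so its longest side is at most `max_size` px.

    Returns the new size plus pinhole intrinsics scaled to match.
    Assumes images that are already small enough are left unchanged.
    """
    scale = min(1.0, max_size / max(width, height))
    new_w, new_h = round(width * scale), round(height * scale)
    # Use the realised per-axis ratios so rounding of the output size
    # is reflected in the intrinsics.
    sx, sy = new_w / width, new_h / height
    return new_w, new_h, fx * sx, fy * sy, cx * sx, cy * sy


# ~2000 one-minute segments at 20 Hz, as stated in the note above.
total_frames = 2000 * 60 * 20                    # 2_400_000
kept = frames_after_stride(total_frames, 4)      # 600_000 frames, i.e. 5 Hz

# Placeholder intrinsics for a 1164x874 pinhole image.
resized = downscale_to_max_size(1164, 874, 910.0, 910.0, 582.0, 437.0, max_size=582)
# resized == (582, 437, 455.0, 455.0, 291.0, 218.5)
```

Stride cuts the frame count linearly, while halving the longest side cuts the pixel count per frame by roughly four, so the two knobs combine multiplicatively.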
How datasets are added
See Adding New Datasets Guide for the full processor → adapter → aggregator pipeline a new dataset has to plug into.