Av2SensorDatasetConverter
- class standard_e2e.caching.src_datasets.av2_sensor.Av2SensorDatasetConverter(source_processor, input_path, split, num_workers=0, do_parallel_processing=True, arguments=None)
Bases: SourceDatasetConverter
Iterates AV2 sensor logs frame-by-frame.
The AV2 sensor split is a flat directory of per-log folders under <input_path>/<split>/<log_id>/. One frame = one lidar sweep timestamp. Yielding (log_dir, sweep_ts_ns) ordered first by log and then by timestamp keeps consecutive frames inside the same log, which lets the processor’s per-log cache stay warm across each multiprocessing chunk.
With STANDARD_E2E_DEBUG=true only the first log is processed, matching the convention used by the Waymo converters.
- Parameters:
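The iteration order described in the class summary can be sketched as follows. This is a minimal illustration, not the converter's actual code: the function name iter_frames is hypothetical, the sensors/lidar/<timestamp_ns>.feather layout inside each log folder is an assumption about the AV2 sensor dataset, and the case-insensitive check of STANDARD_E2E_DEBUG is likewise assumed.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple


def iter_frames(input_path: str, split: str) -> Iterator[Tuple[Path, int]]:
    """Yield (log_dir, sweep_ts_ns), ordered by log and then by timestamp.

    Hypothetical helper; the real converter may discover frames differently.
    """
    split_dir = Path(input_path) / split
    # Flat directory of per-log folders: <input_path>/<split>/<log_id>/
    log_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())

    # Debug convention from the text: only the first log is processed.
    # Assumption: the flag is compared case-insensitively against "true".
    if os.environ.get("STANDARD_E2E_DEBUG", "").lower() == "true":
        log_dirs = log_dirs[:1]

    for log_dir in log_dirs:
        # Assumption: one <timestamp_ns>.feather file per lidar sweep.
        lidar_dir = log_dir / "sensors" / "lidar"
        timestamps = sorted(int(f.stem) for f in lidar_dir.glob("*.feather"))
        for sweep_ts_ns in timestamps:
            # Consecutive yields stay inside the same log, so a per-log
            # cache in the processor is reused until the log changes.
            yield log_dir, sweep_ts_ns
```

Because the outer loop is over logs, any contiguous chunk of this sequence handed to a worker touches as few distinct logs as possible.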
- classmethod get_arg_parser()
Return an argument parser for the converter.
- property max_workers: int | None
Optional cap on parallel-pool size; None means no cap.
Used by datasets where pool throughput plateaus or regresses past a certain worker count, typically because the processor carries large state (e.g. a prescanned HD-map cache) and Pool’s per-task dispatch overhead grows with worker count. Subclasses whose processors are small can leave this at None.
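One plausible way a caller could apply this cap is sketched below. The helper effective_pool_size is hypothetical and not part of the library; it only illustrates the documented semantics that None means no cap.

```python
from typing import Optional


def effective_pool_size(num_workers: int, max_workers: Optional[int]) -> int:
    """Clamp the requested worker count to an optional cap.

    Hypothetical helper: `num_workers` is what the user asked for,
    `max_workers` is the converter's cap (None means no cap).
    """
    if max_workers is None:
        return num_workers
    return min(num_workers, max_workers)
```

For example, a request for 32 workers against a cap of 8 would yield a pool of 8, while a request for 4 workers is left unchanged.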
- property multiprocessing_start_method: str
Start method for the worker pool.
The default, "spawn", is the conservative choice: TensorFlow and OpenCV both keep global thread / mutex state that fork() inherits in a deadlock-prone way (the deadlock typically occurs before the first frame completes). Spawn pays a per-worker import cost (~5 s per worker, dominated by TensorFlow) but is the safe pattern for any worker that may run TF or cv2 work post-fork.
Subclasses whose worker hot path is fully TF-free (no tf.io.decode_image, no frame_utils.* calls, etc.) may override to "fork" to avoid the spawn import tax. This is a very large speedup on small / DEBUG runs and a meaningful one on full splits.
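A subclass override might look like the sketch below. BaseConverter and TfFreeConverter are hypothetical stand-ins so the example runs without the library installed, and make_context is an assumed way of turning the property into a pool context using the standard multiprocessing.get_context call; how the real converter consumes the property is not shown in this page.

```python
import multiprocessing


class BaseConverter:
    """Hypothetical stand-in for the real base class."""

    @property
    def multiprocessing_start_method(self) -> str:
        # Conservative default: safe for workers that may run TF or cv2.
        return "spawn"


class TfFreeConverter(BaseConverter):
    """Hypothetical subclass whose worker hot path never touches TensorFlow."""

    @property
    def multiprocessing_start_method(self) -> str:
        # Skips the per-worker import cost; "fork" is POSIX-only.
        return "fork"


def make_context(converter: BaseConverter):
    """Build a multiprocessing context from the converter's start method."""
    return multiprocessing.get_context(converter.multiprocessing_start_method)
```

A pool would then be created with make_context(converter).Pool(...) rather than the module-level multiprocessing.Pool, so the choice stays local to the converter and does not change the process-wide default.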