Av2LidarDatasetConverter

class standard_e2e.caching.src_datasets.av2_lidar.Av2LidarDatasetConverter(source_processor, input_path, split, num_workers=0, do_parallel_processing=True, arguments=None)

Bases: SourceDatasetConverter

Iterates AV2 lidar logs frame-by-frame.

The AV2 lidar split has the same on-disk layout as the AV2 sensor dataset, minus the camera and annotation files: a flat directory of per-log folders under <input_path>/<split>/<log_id>/, each containing lidar sweeps at sensors/lidar/<timestamp_ns>.feather and a map/ subdirectory. One frame is one lidar sweep timestamp; one log is one segment. Yielding tuples ordered first by log and then by timestamp keeps the processor's per-log cache warm within each multiprocessing chunk, because consecutive frames in a chunk mostly belong to the same log.
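The log-then-timestamp ordering described above can be sketched with pathlib. This is a minimal sketch that assumes only the stated layout (per-log folders with sensors/lidar/<timestamp_ns>.feather sweeps); the helper name iter_lidar_frames and the (log_id, timestamp_ns) tuple shape are hypothetical and not the converter's actual API.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_lidar_frames(input_path: str, split: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical helper: yield (log_id, timestamp_ns), ordered by log, then timestamp.

    Assumes <input_path>/<split>/<log_id>/sensors/lidar/<timestamp_ns>.feather.
    """
    split_dir = Path(input_path) / split
    # Sort log folders by name so every run enumerates logs in the same order.
    for log_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        lidar_dir = log_dir / "sensors" / "lidar"
        if not lidar_dir.is_dir():
            continue  # skip folders that hold no lidar sweeps
        # File stems are integer nanosecond timestamps: sort numerically, not lexically.
        for ts in sorted(int(f.stem) for f in lidar_dir.glob("*.feather")):
            yield log_dir.name, ts
```

Sorting the stems as integers matters: a lexical sort would place "1000" before "20".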

With STANDARD_E2E_DEBUG=true set, only the first log is processed.
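A minimal sketch of how such a debug switch could truncate a log-ordered frame stream. It assumes the flag is read from the environment and compared case-insensitively against "true"; both helper names are hypothetical, and the real converter may parse the flag differently.

```python
import os
from itertools import groupby
from typing import Iterable, Iterator, Tuple

Frame = Tuple[str, int]  # hypothetical (log_id, timestamp_ns) frame key


def debug_enabled() -> bool:
    # Assumption: any casing of "true" enables debug mode.
    return os.environ.get("STANDARD_E2E_DEBUG", "").lower() == "true"


def limit_to_first_log(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Yield only the first log's frames; expects frames already grouped by log."""
    for _log_id, group in groupby(frames, key=lambda f: f[0]):
        yield from group
        break  # stop once the first log is exhausted
```

Usage: `frames = limit_to_first_log(frames) if debug_enabled() else frames`. Because the input is already ordered by log, the truncation is a single streaming pass.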

Parameters:
convert()

Convert all frames, then run any configured context aggregators.

Return type:

None
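The two-phase order of convert() can be sketched as follows. This is a serial illustration only. The frame converter and the aggregator callables are hypothetical stand-ins, and the real method may dispatch frames to a worker pool.

```python
from typing import Callable, Iterable, Sequence


def convert(
    frames: Iterable[str],
    convert_frame: Callable[[str], None],
    aggregators: Sequence[Callable[[], None]] = (),
) -> None:
    # Phase 1: convert every frame (serially in this sketch).
    for frame in frames:
        convert_frame(frame)
    # Phase 2: aggregators start only after every frame has been converted,
    # so each one sees the complete per-frame output.
    for aggregate in aggregators:
        aggregate()
```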

property dataset_name: str

Return the name of the dataset.

classmethod get_arg_parser()

Return an argument parser for the converter.
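A sketch of the classmethod pattern, assuming the parser is a plain argparse.ArgumentParser. The class ExampleConverter and its flags are hypothetical; they loosely mirror constructor parameters from the class signature and are not the converter's real command-line options.

```python
import argparse


class ExampleConverter:
    """Hypothetical stand-in for a SourceDatasetConverter subclass."""

    @classmethod
    def get_arg_parser(cls) -> argparse.ArgumentParser:
        # Assumption: a standard argparse parser; the real return type may differ.
        parser = argparse.ArgumentParser(description=f"{cls.__name__} options")
        # Hypothetical flags loosely mirroring constructor parameters.
        parser.add_argument("--input_path", required=True)
        parser.add_argument("--split", required=True)
        parser.add_argument("--num_workers", type=int, default=0)
        return parser
```

Because it is a classmethod, the parser can be built before any converter instance exists, for example to parse command-line arguments that are then passed to the constructor.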

property max_workers: int | None

Optional cap on parallel-pool size; None means no cap.

Used by datasets whose pool throughput plateaus or regresses beyond a certain worker count, typically because the processor carries large state (e.g. a prescanned HD-map cache) and Pool's per-task dispatch overhead grows with the worker count. Subclasses with small processors can leave this at None.
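A minimal sketch of how an optional cap could combine with a requested pool size, plus a subclass that sets one. The helper effective_pool_size, the class LargeStateConverterExample, and the value 8 are hypothetical and illustrative, not measured values.

```python
from typing import Optional


def effective_pool_size(requested: int, max_workers: Optional[int]) -> int:
    # None means "no cap", so the requested size is used unchanged.
    if max_workers is None:
        return requested
    # Otherwise clamp to the cap, never dropping below one worker.
    return max(1, min(requested, max_workers))


class LargeStateConverterExample:
    """Hypothetical subclass whose processor carries a large HD-map cache."""

    @property
    def max_workers(self) -> Optional[int]:
        return 8  # illustrative cap, not a tuned value
```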

property multiprocessing_start_method: str

Start method for the worker pool.

Default "spawn" is the conservative choice: TensorFlow and OpenCV both keep global thread and mutex state, and fork() copies that state into the child without the threads that own it, so forked workers are prone to deadlock (typically before the first frame completes). Spawn pays a per-worker import cost (~5 s per worker, dominated by TensorFlow) but is the safe pattern for any worker that may run TF or cv2 code.

Subclasses whose worker hot path is fully TF-free (no tf.io.decode_image, no frame_utils.* calls, etc.) may override this to "fork" to avoid the spawn import cost. This gives a very large speedup on small or DEBUG runs, where the fixed per-worker startup cost dominates, and a meaningful one on full splits.
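A sketch of how the property could select a start method through the standard multiprocessing.get_context(), along with a subclass override to "fork". The class names and the pool_context helper are hypothetical; only get_context() and the context's Pool() are standard-library API.

```python
import multiprocessing as mp


class SpawnDefaultExample:
    """Hypothetical stand-in for the base-class default."""

    @property
    def multiprocessing_start_method(self) -> str:
        return "spawn"  # safe when workers may run TensorFlow or OpenCV code


class TfFreeConverterExample(SpawnDefaultExample):
    """Hypothetical subclass whose worker hot path never touches TF."""

    @property
    def multiprocessing_start_method(self) -> str:
        return "fork"  # avoids the per-worker import cost; POSIX only


def pool_context(converter: SpawnDefaultExample) -> mp.context.BaseContext:
    # The returned context's Pool() starts workers with the requested method.
    return mp.get_context(converter.multiprocessing_start_method)
```

Usage: `with pool_context(conv).Pool(processes=n) as pool: ...`. A per-pool context leaves the interpreter-wide default start method untouched, so other code in the same process is unaffected.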