
Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing
ICCV 2023
Siva Manivasagam*, Ioan Andrei Bârsan*, Jingkang Wang, Ze Yang, Raquel Urtasun
We study the impact of pulse effects, scanning effects, and asset quality on LiDAR simulation realism. Below, we depict one example of a pulse-effect domain gap, caused by not modeling secondary LiDAR returns. The top row shows that this can cause the object detector to fail, resulting in an unsafe autonomy plan. The bottom row depicts the front-camera view, for reference only, followed by the relevant LiDAR: original simulation, simulation with added multi-echoes, and real. Multi-echoes re-added by the middle method are shown in orange. Subtle differences in the area highlighted with the arrow stem from weak returns on the truck's rear wheels, affecting the domain gap.
We propose a novel approach for evaluating the domain gap of a sensor simulator, leveraging a paired-scenario setting where we use simulation to re-create a digital twin of the real world, and use it to re-simulate the original log.
We conduct the first analysis of what aspects of LiDAR simulation are critical to simulate with high fidelity to ensure close matching performance of the autonomy system between the simulator and the real world.
We define a taxonomy of LiDAR simulation effects, which we then use to measure their importance on the performance of an autonomous system. We perform the analysis by leveraging paired simulation data and defining ways of transferring specific effects from the real to the simulated data, which we dub oracles.
In real LiDAR scans, some laser pulses do not return a reading. Accumulating points over many frames to build assets for a simulator results in dense reconstructions. This is desirable as it ensures assets look realistic from novel viewpoints. However, naive raycasting against these dense assets without modeling the probability that a pulse won't be returned produces unrealistic, overly dense point clouds.
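As a minimal sketch of the idea above, the snippet below thins a densely raycast point cloud by keeping each hit only with some per-pulse return probability. The function name `drop_unreturned_pulses` and the `return_prob` input are hypothetical; how such probabilities would be obtained (for example from real data or a learned model) is not specified here.

```python
import numpy as np

def drop_unreturned_pulses(points, return_prob, rng=None):
    """Keep each raycast hit with its own return probability.

    points:      (N, 3) array of simulated hit locations.
    return_prob: (N,) array in [0, 1]; hypothetical per-pulse probability
                 that the real sensor would register a return.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    return_prob = np.asarray(return_prob, dtype=float)
    # Draw one uniform sample per pulse; the pulse survives if the sample
    # falls below its return probability.
    keep = rng.random(len(points)) < return_prob
    return points[keep]
```

With `return_prob` set to one everywhere this reduces to naive raycasting, so the overly dense baseline is a special case of the same routine.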
Time-of-Flight LiDARs compute their point clouds by estimating the time it takes each laser pulse to return. The returning pulse is typically identified within the sensor itself, either in hardware or firmware, using signal processing. A pulse may hit multiple objects and thus produce multiple reflections or echoes. This happens very frequently with foliage, but it is also common in urban driving, for example around object borders. Failure to model these additional returns in simulation reduces the realism of point clouds.
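To make the time-of-flight relation concrete, here is a small sketch that converts the round-trip times of one pulse's echoes into ranges using r = c·t/2. The function name `echo_ranges` is illustrative only; real sensors perform this step internally on the detected waveform peaks.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def echo_ranges(round_trip_times_s):
    """Convert round-trip times (seconds) of one pulse's echoes to ranges (metres).

    The pulse travels to the surface and back, hence the factor 0.5.
    """
    return [0.5 * SPEED_OF_LIGHT_M_PER_S * t for t in round_trip_times_s]

# A pulse that partially hits a nearby object edge and then a farther surface
# yields two echoes, i.e. two ranges along the same ray.
ranges_m = echo_ranges([200e-9, 260e-9])
```

A simulator that keeps only the first entry of such a list models a single-return sensor; keeping several models multi-echo.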
Spurious returns in LiDAR can occur due to multi-path, blooming, beam divergence, and volume scattering effects. Multi-path is similar to multi-echo, but the returned pulse arrives from a different angle than the one at which it was transmitted, producing ghosting artifacts. Blooming can be caused by objects which are much more reflective than their surroundings, such as traffic signs. Beam divergence refers to the gradual widening of the laser beam as it gets farther away from the emitter.
Inaccurate small-actor geometry. Certain small actors may yield imprecise LiDAR readings in simulation; correcting them can improve realism.
Noisy points can occur where the peak in the waveform is ambiguous, resulting in inaccurate calculation of return time t. This can occur with thin structures and retro-reflectors, or because of inherent aleatoric noise in the real world.
The LiDAR intrinsics specify the calibrated azimuth and elevation angles for each laser. This affects the pulse pattern and alters the point cloud density and the sensor field of view. Many classic simulators such as CARLA rely on generic equidistant beams, but most real LiDARs use more sophisticated scanning patterns.
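The sketch below illustrates how per-laser azimuth and elevation angles turn into ray directions, and contrasts equidistant elevations with an uneven pattern. The axis convention (x forward, y left, z up), the function name `beam_directions`, and all angle values are assumptions made for illustration; they are not the calibration of any particular sensor.

```python
import numpy as np

def beam_directions(azimuth_deg, elevation_deg):
    """Unit ray directions for every (elevation, azimuth) pair; shape (E, A, 3)."""
    az = np.deg2rad(np.asarray(azimuth_deg, dtype=float))[None, :]
    el = np.deg2rad(np.asarray(elevation_deg, dtype=float))[:, None]
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el) * np.ones_like(az)  # broadcast elevation over all azimuths
    return np.stack([x, y, z], axis=-1)

azimuths = np.linspace(0.0, 360.0, 1800, endpoint=False)

# Generic equidistant elevations.
naive_elev = np.linspace(-25.0, 15.0, 32)

# Made-up uneven pattern with more lasers near the horizon.
uneven_elev = np.concatenate([
    np.linspace(-25.0, -5.0, 8),
    np.linspace(-4.0, 4.0, 20),
    np.linspace(6.0, 15.0, 4),
])

naive_rays = beam_directions(azimuths, naive_elev)
uneven_rays = beam_directions(azimuths, uneven_elev)
```

Both patterns use the same number of lasers, but the uneven one concentrates rays where distant objects appear, which changes point density per region.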
Spinning LiDARs gather measurements over time, typically taking 100ms to complete a 360° scan of a scene. If the SDV is moving during this time, the scan will be distorted as the pose of the sensor changes during the scan. Similarly, the motion of dynamic actors can blur the generated LiDAR point cloud and can change where the actor is observed in the sweep. Depending on the relative direction of movement, actors can appear compressed or elongated in the scene.
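The distortion described above can be sketched with a deliberately simplified model: the sensor starts at the world origin, translates at constant velocity without rotating, and sweeps azimuth linearly from 0° to 360° over the sweep duration. These assumptions, the function name `simulate_ego_rolling_shutter`, and the use of each point's azimuth at the start of the sweep to approximate its scan time are illustrative choices, not the paper's implementation.

```python
import numpy as np

def simulate_ego_rolling_shutter(points_world, velocity_mps, sweep_duration_s=0.1):
    """Express static world points in the sensor frame at the time each is scanned.

    Returns the distorted points and the per-point timestamps (seconds).
    """
    points_world = np.asarray(points_world, dtype=float)
    velocity = np.asarray(velocity_mps, dtype=float)
    # Approximate scan time from the azimuth seen at the start of the sweep.
    azimuth = np.arctan2(points_world[:, 1], points_world[:, 0]) % (2 * np.pi)
    timestamps = azimuth / (2 * np.pi) * sweep_duration_s
    # Sensor position when each point is measured (pure translation).
    sensor_positions = timestamps[:, None] * velocity[None, :]
    return points_world - sensor_positions, timestamps

# At 10 m/s, a point directly behind the sensor is scanned mid-sweep (0.05 s),
# by which time the sensor has moved 0.5 m forward.
pts = np.array([[10.0, 0.0, 0.0], [-10.0, 0.0, 0.0]])
distorted, stamps = simulate_ego_rolling_shutter(pts, [10.0, 0.0, 0.0])
```

Motion compensation is the inverse step: using the known sensor pose at each timestamp to bring all points back into a single reference frame.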
To measure the domain gap between a simulator and the real world for the autonomy system under test, we propose a paired-scenario setting. Given an initial real scenario, we construct its digital twin (e.g., same map location and actor placement) in the simulator. This enables comparing the simulated LiDAR directly with the real LiDAR in a pair-wise fashion. The scenario parameters, such as actor positions, can be extracted through human annotation or automatic offline labelling. We can then compose virtual geometry assets to match the scene and simulate the LiDAR for the autonomy system under test. "Similarity" is defined as whether the autonomy system performs the same on the original and re-simulated LiDAR.
Digital Twin Virtual World. Virtual worlds for multi-LiDAR simulation, with examples of resulting rendered LiDAR on the right.
Here are the key insights uncovered by our analysis in this novel paired simulation setting. Please refer to the paper itself for additional details on the metrics and the system under test.
Table 1. LiDAR Pulse Phenomena. Enhancing Base-LiDAR with ray propagation effects such as unreturned pulses (DropPoints), multi-path (AddEchoes), spurious points (AddPoints), and noisy points (ModPoints).
We find that secondary returns from the long-range LiDAR, which only account for ∼5% of the total input points, substantially improve domain gap metrics. Qualitatively, AddEchoes enables the simulated LiDAR to model multiple echoes and alters object detection, improving similarity to the real LiDAR, including false positives, and ensuring agreement with downstream planning. AddPoints alone improves detection agreement and prediction but does not reduce planning discrepancy. It especially helps detection agreement at long range, suggesting that modelling spurious points may matter in these regions. We also find that, while performing DropPoints alone harms the domain gap on average (row 2), in certain situations it better matches motion planning outputs w.r.t. real LiDAR (Fig. 7, left). Furthermore, pairing it with ModifyPoints(δlo,δhi) (row 8) results in the best realism gain over all oracle policies. This indicates that better geometry reconstruction of the actors and scene, in conjunction with better material modelling, is key to better realism.
Table 2. Scanning LiDAR Effects. Analyzing the domain gap for calibrated intrinsics, rolling shutter (RS) and motion blur (MB).
We observe an increased domain gap with naive intrinsics, indicating that autonomy is not invariant to the LiDAR scanning pattern, which causes certain spatial regions to have different point densities in the simulated and real LiDAR. Here, "naive" intrinsics correspond to linearly spaced laser elevations, whereas real LiDARs employ uneven patterns to maximize long-range coverage. More significantly, we find that modelling actor motion during the LiDAR sweep (rolling shutter) is critical to ensure matching autonomy outputs. Surprisingly, we find that toggling ego rolling shutter has a mixed effect on the domain gap: it reduces the gap on its own, but slightly harms it when combined with motion blur. We conjecture this is because the autonomy under test consumes motion-compensated LiDAR, a standard practice in most benchmarks.
Table 3. Virtual World Creation. Effect of different virtual world creation approaches.
For our baseline, we follow prior approaches, and perform LiDAR aggregation on labelled collected logs to build surfel asset meshes for the actors and static background. While faithfully matching the observations, surfel meshes may suffer from topological problems, and their construction is unable to account for sensor noise.
Heuristic Road-only Meshing. Using RANSAC plane fitting to identify ground points, we then create a road-only mesh.
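For readers unfamiliar with RANSAC plane fitting, the following is a generic NumPy sketch of the technique. It is not the authors' implementation: the function name `ransac_ground_plane`, the iteration count, and the 0.2 m inlier threshold are assumptions chosen for illustration, and the subsequent meshing step is omitted.

```python
import numpy as np

def ransac_ground_plane(points, n_iters=200, inlier_thresh=0.2, rng=None):
    """Fit a plane n.p + d = 0 with RANSAC; return (normal, d, inlier_mask)."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    best_mask = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        # Hypothesize a plane from three random points.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        # Score the hypothesis by counting points close to the plane.
        mask = np.abs(points @ normal + d) < inlier_thresh
        if mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (normal, d)
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    return best_plane[0], best_plane[1], best_mask
```

The points flagged by `inlier_mask` would be the candidate ground points from which a road-only mesh could then be built.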
Neural Reconstruction. We adopt a state-of-the-art neural reconstruction approach for large-scale scenes, and extract geometry using marching cubes.
Only CAD assets. Off-the-shelf car and motorcycle meshes purchased from TurboSquid.
CAD assets mixed with curated surfel assets. We mix our CAD assets with a manually curated set of 942 reconstructed actor surfel meshes with clean geometry.
For additional details, experiments, and results, please refer to our paper or the supplementary material.
@inproceedings{manivasagam2023towards,
  title     = {Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing},
  author    = {Manivasagam, Sivabalan and Bârsan, Ioan Andrei and Wang, Jingkang and Yang, Ze and Urtasun, Raquel},
  booktitle = {{ICCV}},
  year      = {2023},
}