Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing
Abstract
Testing the full autonomy system in simulation is the safest and most scalable way to evaluate autonomous vehicle performance before deployment. This requires simulating sensor inputs such as LiDAR. To be effective, it is essential that the simulation has low domain gap with the real world. That is, the autonomy system in simulation should perform exactly the same way it would in the real world for the same scenario. To date, there has been limited analysis into what aspects of LiDAR phenomena affect autonomy performance. It is also difficult to evaluate the domain gap of existing LiDAR simulators, as they operate on fully synthetic scenes. In this paper, we propose a novel “paired-scenario” approach to evaluating the domain gap of a LiDAR simulator by reconstructing digital twins of real world scenarios. We can then simulate LiDAR in the scene and compare it to the real LiDAR. We leverage this setting to analyze what aspects of LiDAR simulation, such as pulse phenomena, scanning effects, and asset quality, affect the domain gap with respect to the autonomy system, including perception, prediction, and motion planning, and analyze how modifications to the simulated LiDAR influence each part. We identify key aspects that are important to model, such as motion blur, material reflectance, and the accurate geometric reconstruction of traffic participants. This helps provide research directions for improving LiDAR simulation and autonomy robustness to these effects.
Overview
We study the impact of pulse effects, scanning effects, and asset quality on LiDAR simulation realism. Below, we depict one example of a pulse effect domain gap, caused by not modeling secondary LiDAR returns. In the top row we show this can cause the object detector to fail, resulting in an unsafe autonomy plan.
The bottom row depicts the front-camera view (for reference only), followed by the relevant LiDAR: the original simulation, the simulation with multi-echoes added, and the real scan. Multi-echoes re-added by the middle method are shown in orange. The subtle differences in the area highlighted with the arrow stem from weak returns on the truck’s rear wheels, which impact the domain gap.
Video
Play with sound.
Motivation
Accurately testing the behavior of robots such as self-driving vehicles (SDVs) is of paramount importance to ensure their safe deployment in the real world. The safest, most scalable and sustainable way to test the autonomy system is through simulation. To assess the safety of the full system, it is critical to evaluate the complete autonomy stack in such a simulator. This is a must, as small changes in one sub-component (e.g., a missed detection) can cause a chain reaction of downstream effects that significantly alter the outcome, and might result in a safety hazard.
To evaluate the full autonomy, we must simulate all the inputs to the system. This requires high fidelity sensor simulation with low domain gap with respect to the real world. That is, the performance of the autonomy system in simulation on all scenarios should match the real world performance.
Despite the importance of LiDAR simulation, there has been little investigation into what aspects of LiDAR simulation matter for realism.
Therefore, in this paper:
- We propose a novel approach for evaluating the domain gap of a sensor simulator, leveraging a paired-scenario setting where we use simulation to re-create a digital twin of the real world, and use it to re-simulate the original log.
- We conduct the first analysis of what aspects of LiDAR simulation are critical to simulate with high fidelity to ensure close matching performance of the autonomy system between the simulator and the real world.
Studied Phenomena
We define a taxonomy of LiDAR simulation effects, which we then use to measure their impact on the performance of an autonomous system. We perform the analysis by leveraging paired simulation data, and defining ways of transferring specific effects from the real to the simulated data, which we dub oracles.
1. Unreturned Pulses
In real LiDAR scans, some laser pulses do not return a reading. Accumulating points over many frames to build assets for a simulator results in dense reconstructions. This is desirable as it ensures assets look realistic from novel viewpoints. However, naive raycasting against these dense assets without modeling the probability that a pulse won’t be returned produces unrealistic, overly dense point clouds.
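One simple way to counteract this over-density is to stochastically discard simulated returns. The sketch below, with an assumed range-dependent drop probability (the `base_rate` and `range_falloff` parameters are illustrative, not values from the paper), shows the general idea:

```python
import numpy as np

def apply_pulse_drop(points, base_rate=0.02, range_falloff=60.0, rng=None):
    """Randomly discard simulated returns to mimic unreturned pulses.

    points: (N, 3) simulated hit points in the sensor frame.
    The drop probability grows with range, a crude proxy for the lower
    signal-to-noise ratio of distant returns.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    ranges = np.linalg.norm(points, axis=1)
    # Probability of losing the pulse: baseline plus a range-dependent term.
    p_drop = np.clip(base_rate + ranges / range_falloff * 0.1, 0.0, 0.95)
    keep = rng.random(len(points)) >= p_drop
    return points[keep]
```

A learned drop model conditioned on material and incidence angle would be more faithful; this constant-plus-range model is only the minimal version of the idea.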
2. Multiple Echoes
Time-of-Flight LiDARs compute their point clouds by estimating the time it takes each laser pulse to return. The returning pulse is typically identified within the sensor itself, either in hardware or firmware, using signal processing.
A pulse may hit multiple objects and thus produce multiple reflections, or echoes. This happens very frequently with foliage, and is also common elsewhere in urban driving, for example around object borders. Failing to model these additional returns in simulation reduces the realism of point clouds.
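The peak-finding step can be sketched as follows. This is a simplified stand-in for the proprietary signal processing real sensors run in hardware or firmware: we treat each local maximum of the sampled waveform above a threshold as an echo and convert its time of flight to a range via r = c·t/2 (all parameter values here are illustrative):

```python
import numpy as np

C = 3e8  # speed of light, m/s

def extract_echoes(waveform, dt, threshold=0.3, max_echoes=3):
    """Extract up to `max_echoes` ranges from a sampled return waveform.

    waveform: samples of the received intensity; dt: sample period (s).
    A local maximum above `threshold` counts as an echo; its sample
    index i gives time-of-flight t = i * dt and range r = C * t / 2.
    """
    peaks = []
    for i in range(1, len(waveform) - 1):
        if (waveform[i] > threshold
                and waveform[i] >= waveform[i - 1]
                and waveform[i] > waveform[i + 1]):
            peaks.append(i)
    # Keep the strongest echoes, then report ranges nearest-first.
    peaks = sorted(peaks, key=lambda i: waveform[i], reverse=True)[:max_echoes]
    return sorted(C * (i * dt) / 2.0 for i in peaks)
```

For example, a waveform with two Gaussian pulses centred at samples 100 and 150 (dt = 1 ns) yields two echoes at 15.0 m and 22.5 m, as would happen when a pulse clips an object border and continues to the background.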
3. Spurious Returns
Spurious returns in LiDAR can occur due to multi-path, blooming, beam divergence, and volume scattering effects. Multi-path is similar to multi-echo, but the returned pulse arrives from a different angle than the one it was transmitted at, producing ghosting artifacts. Blooming can be caused by objects which are much more reflective than their surroundings, such as traffic signs. Beam divergence refers to the gradual widening of the laser beam as it gets farther away from the emitter.
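Beam divergence is easy to quantify: the spot diameter grows linearly with range as d(R) = d₀ + 2R·tan(θ/2), where θ is the full divergence angle. The helper below uses illustrative values (a 3 mrad divergence and a 1 cm exit aperture are typical orders of magnitude, not any specific sensor's spec):

```python
import math

def beam_footprint_diameter(range_m, exit_diameter=0.01, divergence_mrad=3.0):
    """Diameter (m) of the laser spot at a given range.

    The beam widens linearly with distance: d(R) = d0 + 2 R tan(theta / 2),
    where theta is the full divergence angle. At 100 m a 3 mrad beam spans
    roughly 30 cm, so a single pulse can straddle an object edge and return
    energy from both foreground and background.
    """
    theta = divergence_mrad * 1e-3
    return exit_diameter + 2.0 * range_m * math.tan(theta / 2.0)
```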
4. Noisy Points
Noisy points can occur where the peak in the waveform is ambiguous, resulting in inaccurate calculation of the return time t. This can be caused by thin structures, retro-reflectors, and inherent aleatoric noise in the real world.
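A common first-order model perturbs each point along its viewing ray, since the azimuth and elevation of a return are fixed by the laser and only the range estimate is noisy. The Gaussian model below is an assumption for illustration; real noise is heavier-tailed near ambiguous peaks:

```python
import numpy as np

def add_range_noise(points, sigma_m=0.02, rng=None):
    """Perturb each point along its ray to mimic range-estimation noise.

    points: (N, 3) points in the sensor frame. Noise is applied to the
    range only, so each point slides along its own ray direction.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    ranges = np.linalg.norm(points, axis=1, keepdims=True)
    dirs = points / np.clip(ranges, 1e-6, None)
    noisy_ranges = ranges + rng.normal(0.0, sigma_m, size=ranges.shape)
    return dirs * noisy_ranges
```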
5. Spinning Sensor Ray Generation
The LiDAR intrinsics specify the calibrated azimuth and elevation angles for each laser. This affects the pulse pattern and alters the point cloud density and the sensor field of view. Many classic simulators such as CARLA rely on generic equidistant beams, but most real LiDARs use more sophisticated scanning patterns.
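Given calibrated per-laser elevations, generating one revolution's worth of ray directions is straightforward. The sketch below builds an x-forward spherical ray grid; the equidistant-azimuth assumption and the sample elevation table are illustrative, a real sensor's calibration also carries per-laser azimuth offsets and timing:

```python
import numpy as np

def spin_ray_directions(elevations_deg, n_azimuth=1800):
    """Unit ray directions for one revolution of a spinning LiDAR.

    elevations_deg: per-laser elevation angles from the calibration
    (real sensors space these unevenly to concentrate beams near the
    horizon); n_azimuth: firings per revolution.
    Returns an (n_lasers * n_azimuth, 3) array of unit rays.
    """
    az = np.deg2rad(np.linspace(0.0, 360.0, n_azimuth, endpoint=False))
    el = np.deg2rad(np.asarray(elevations_deg))
    A, E = np.meshgrid(az, el)  # each (n_lasers, n_azimuth)
    dirs = np.stack([np.cos(E) * np.cos(A),
                     np.cos(E) * np.sin(A),
                     np.sin(E)], axis=-1)
    return dirs.reshape(-1, 3)
```

Swapping a uniform `np.linspace` elevation table for a calibrated, unevenly spaced one is exactly the "naive vs. real intrinsics" difference studied below.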
6. Rolling Shutter & Motion Blur
Spinning LiDARs gather measurements over time, typically taking 100ms to complete a 360° scan of a scene. If the SDV is moving during this time, the scan will be distorted as the pose of the sensor changes during the scan.
Similarly, the motion of dynamic actors can blur the generated LiDAR point cloud, and can produce changes in the location of where the actor is observed in the sweep. Depending on the relative direction of movement, actors can appear compressed or elongated in the scene.
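A minimal way to simulate this rolling-shutter distortion is to transform each return into the sensor pose at its individual capture time. The sketch below assumes a constant-velocity ego model over the ~0.1 s sweep; a full simulator would interpolate SE(3) poses and also move dynamic actors per firing:

```python
import numpy as np

def unroll_sweep(points, timestamps, ego_velocity, ego_yaw_rate=0.0):
    """Apply ego motion during the sweep to simulate rolling shutter.

    points: (N, 3) points as if captured instantaneously at t = 0;
    timestamps: (N,) capture time of each return within the sweep (s);
    ego_velocity: (3,) sensor velocity in its own frame (m/s);
    ego_yaw_rate: yaw rate (rad/s).
    Each static point is re-expressed in the sensor frame at its own
    capture time: translate by -v*t, then rotate by -yaw_rate*t.
    """
    ts = np.asarray(timestamps, dtype=float)[:, None]
    shift = ts * np.asarray(ego_velocity, dtype=float)[None, :]
    out = np.asarray(points, dtype=float) - shift
    yaw = ego_yaw_rate * ts[:, 0]
    cos_y, sin_y = np.cos(-yaw), np.sin(-yaw)
    x, y = out[:, 0].copy(), out[:, 1].copy()
    out[:, 0] = cos_y * x - sin_y * y
    out[:, 1] = sin_y * x + cos_y * y
    return out
```

For a point 10 m ahead captured 50 ms into the sweep while driving forward at 10 m/s, the return appears at 9.5 m, i.e. the scene compresses along the direction of travel.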
Measuring Domain Gap
Paired Scenario Setting
To measure the domain gap between a simulator and the real world for the autonomy system under test, we propose a paired-scenario setting. Given an initial real scenario, we construct its digital twin (e.g., same map location and actor placement) in the simulator. This enables comparing the simulated LiDAR directly with the real LiDAR in a pair-wise fashion.
The scenario parameters, such as actor positions, can be extracted through human annotation or automatic offline-labelling. We can then compose virtual geometry assets to match the scene and simulate the LiDAR for the autonomy system under test.
“Similarity” is defined as whether the autonomy performs the same on the original and re-simulated LiDAR.
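To make this concrete, one can score how well detections on the re-simulated LiDAR agree with detections on the real LiDAR. The sketch below matches bird's-eye-view box centres by distance and reports an F1-style agreement score; it is a simplified, hypothetical stand-in for the paper's actual metrics (which the paper details, e.g. IoU-based matching across perception, prediction, and planning):

```python
import numpy as np

def detection_agreement(real_dets, sim_dets, match_dist=1.0):
    """F1-style agreement between real and re-simulated detections.

    real_dets: (N, 2) BEV box centres from the real LiDAR;
    sim_dets:  (M, 2) BEV box centres from the simulated LiDAR.
    Two detections match if their centres lie within `match_dist`
    metres; each simulated detection is matched at most once.
    Returns a score in [0, 1], 1 meaning identical detection sets.
    """
    real_dets, sim_dets = np.atleast_2d(real_dets), np.atleast_2d(sim_dets)
    if len(real_dets) == 0 and len(sim_dets) == 0:
        return 1.0
    if len(real_dets) == 0 or len(sim_dets) == 0:
        return 0.0
    d = np.linalg.norm(real_dets[:, None, :] - sim_dets[None, :, :], axis=-1)
    matched, used = 0, set()
    # Greedy matching, closest real detections first.
    for i in np.argsort(d.min(axis=1)):
        j = int(np.argmin(d[i]))
        if d[i, j] <= match_dist and j not in used:
            used.add(j)
            matched += 1
    return 2.0 * matched / (len(real_dets) + len(sim_dets))
```

Note that this counts agreement, not accuracy: a false positive produced in both the real and simulated runs still counts as a match, which is exactly the behavior a domain-gap metric wants.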
What Matters for LiDAR Realism?
Here are the key insights uncovered by our analysis in this novel paired simulation setting. Please refer to the paper itself for additional details on the metrics and the system under test.
LiDAR Pulse Phenomena
We find that secondary returns from the long-range LiDAR, which account for only ∼5% of the total input points, substantially improve domain gap metrics. Qualitatively, AddEchoes enables the simulated LiDAR to model multiple echoes, altering object detection so that it more closely matches the real LiDAR (including its false positives) and ensuring agreement with downstream planning.
AddPoints alone improves detection and prediction agreement while not reducing planning discrepancy. It especially helps detection agreement at long range, suggesting that modelling spurious points may matter in these regions.
We also find that, while on average performing DropPoints alone harms the domain gap (row 2), in certain situations it better matches the motion planning outputs w.r.t. the real LiDAR (Fig. 7, left). Furthermore, pairing it with ModifyPoints(δlo,δhi) (row 8) results in the best realism gain over all oracle policies. This indicates that better geometry reconstruction of the actors and scene, in conjunction with better material modelling, is key to better realism.
Scanning LiDAR Effects
We observe an increased domain gap with naive intrinsics, indicating that autonomy is not invariant to the LiDAR scanning pattern, which causes certain spatial regions to have different point density between the simulated and real LiDAR. Here, “naive” intrinsics correspond to linearly spaced laser elevations, whereas real LiDARs employ uneven patterns to maximize long-range coverage.
More significantly, we find that modelling actor motion during the LiDAR sweep (rolling shutter) is critical to ensure matching autonomy outputs.
Surprisingly, we find that toggling the ego rolling shutter causes fluctuations in the domain gap: it reduces the gap on its own, but slightly hurts when combined with motion blur. We conjecture this is because the autonomy under test consumes motion-compensated LiDAR, a standard practice in most benchmarks.
Virtual World Creation
Baseline
For our baseline, we follow prior approaches, and perform LiDAR aggregation on labelled collected logs to build surfel asset meshes for the actors and static background. While faithfully matching the observations, surfel meshes may suffer from topological problems, and their construction is unable to account for sensor noise.
Background Modelling
Besides surfel aggregation, we explored two other approaches for background creation:
- Heuristic Road-only Meshing. Using RANSAC plane fitting to identify ground points, we then create a road-only mesh.
- Neural Reconstruction. We adopt a state-of-the-art neural reconstruction approach for large-scale scenes, and extract geometry using marching cubes.
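The road-only approach rests on a classic RANSAC plane fit. The sketch below shows the ground-point identification step under simplifying assumptions (a single global plane and illustrative iteration/threshold values; a production pipeline would fit per-tile planes or a spline to handle sloped roads):

```python
import numpy as np

def fit_ground_plane(points, n_iters=200, inlier_thresh=0.15, rng=None):
    """RANSAC plane fit to identify ground points in a LiDAR sweep.

    points: (N, 3) array. Repeatedly samples 3 points, fits the plane
    through them, and keeps the plane with the most inliers within
    `inlier_thresh` metres. Returns (plane, inlier_mask) where
    plane = (a, b, c, d) satisfies a*x + b*y + c*z + d = 0.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    best_plane, best_mask = None, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:
            continue  # degenerate (collinear) sample
        normal /= norm
        d = -normal.dot(p0)
        dist = np.abs(points @ normal + d)
        mask = dist < inlier_thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_plane, best_mask = (*normal, d), mask
    return best_plane, best_mask
```

The inlier mask gives the ground points, which can then be triangulated into the road-only mesh.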
Table 3 shows that both approaches reduce the domain gap when compared to the baseline, with the neural mesh performing overall better than the road-only mesh.
Foreground Modelling
To model the foreground actors, we compare two approaches:
- Only CAD assets. Off-the-shelf car and motorcycle meshes purchased from TurboSquid.
- CAD assets mixed with curated surfel assets. We mix our CAD assets with a manually curated set of 942 reconstructed actor surfel meshes with clean geometry.
As shown in Table 3, using only CAD models leads to a larger domain gap compared to real-world reconstruction. We find that using a combination of CAD models and surfel assets, despite having a higher perception and prediction domain gap, improves the planning discrepancy. We hypothesize this might be due to better modelling of the actors of interest that affect the motion planning, such as actors directly in front of or behind the self-driving vehicle.
For additional details, experiments, and results, please refer to our paper or the supplementary material.
BibTeX
@inproceedings{manivasagam2023towards,
title = {Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing},
author = {Manivasagam, Sivabalan and Bârsan, Ioan Andrei and Wang, Jingkang and Yang, Ze and Urtasun, Raquel},
booktitle = {{ICCV}},
year = {2023},
}