
CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation


Jingkang Wang, Siva Manivasagam, Yun Chen, Ze Yang, Ioan Andrei Bârsan, Anqi Joyce Yang, Wei-Chiu Ma, Raquel Urtasun

Abstract

Realistic simulation is key to enabling safe and scalable development of self-driving vehicles. A core component is simulating the sensors so that the entire autonomy system can be tested in simulation. Sensor simulation involves modeling traffic participants, such as vehicles, with high-quality appearance and articulated geometry, and rendering them in real time. The self-driving industry has employed artists to build these assets. However, this is expensive, slow, and may not reflect reality. Instead, reconstructing assets automatically from sensor data collected in the wild would provide a better path to generating a large and diverse set with good real-world coverage. However, current reconstruction approaches struggle on in-the-wild sensor data due to its sparsity and noise. To tackle these issues, we present CADSim, which combines part-aware object-class priors via a small set of CAD models with differentiable rendering to automatically reconstruct vehicle geometry, including articulated wheels, with high-quality appearance. Our experiments show that our approach recovers more accurate shape from sparse data than existing approaches. Importantly, it also trains and renders efficiently. We demonstrate our reconstructed vehicles in a wide range of applications, including accurate testing of autonomy perception systems.

Overview

CADSim recovers shape, appearance, and illumination in a robust and scalable way from sensor observations (LiDAR and cameras). The reconstructed assets are high-fidelity, animatable, and compatible with graphics engines, enabling efficient, realistic, and controllable simulation.

Video

Method

We build our model based on the observation that existing CAD models are equipped with detailed geometry and animatable parts, which can be used as a prior during the reconstruction process. Towards this goal, we propose an energy-based formulation that exploits CAD models, as well as visual and geometric cues from images and/or LiDAR for 3D reconstruction.

Our vehicle model representation is designed to describe the vehicle in a controllable manner with human priors. We separate the vehicle into a body and four wheels, with the wheels constrained to a fixed radius and the steering rotation constrained to the front-wheel axle. This representation allows for robust optimization against the sensor data while still enabling animatable simulation (e.g., wheels rotating while driving); see the sketch after the list below. The wheel parameterization consists of:

  • scale (tire radius and thickness)
  • front wheel rotation and translation
  • back wheel translation
  • symmetry

(physically plausible, part-aware)
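To make this concrete, here is a minimal sketch of such a part-aware parameterization: one body mesh plus a shared wheel template, with tire scale, wheel placement, left/right symmetry, and front-axle steering exposed as explicit variables. This is our illustration, not the released code; all names, axes, and array conventions are assumptions.

  # Minimal sketch of a part-aware vehicle parameterization (illustrative;
  # names, axes, and shapes are assumptions, not the authors' code).
  from dataclasses import dataclass
  import numpy as np

  @dataclass
  class WheelParams:
      radius: float              # shared tire radius (scale)
      thickness: float           # shared tire thickness (scale)
      front_offset: np.ndarray   # (3,) front-right wheel center
      back_offset: np.ndarray    # (3,) back-right wheel center
      steer_angle: float         # front-wheel rotation about the vertical axis

  def _rot_z(a: float) -> np.ndarray:
      c, s = np.cos(a), np.sin(a)
      return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

  def assemble(body_verts: np.ndarray, wheel_template: np.ndarray,
               p: WheelParams) -> np.ndarray:
      """Body + 4 wheels; wheel_template is a unit-radius, unit-thickness
      wheel centered at the origin with its axle along +y (lateral axis)."""
      wheel = wheel_template * np.array([p.radius, p.thickness, p.radius])
      mirror = np.array([1.0, -1.0, 1.0])  # left wheels mirror right ones
      steer = _rot_z(p.steer_angle).T      # both front wheels steer together
      front_r = wheel @ steer + p.front_offset
      front_l = (wheel * mirror) @ steer + p.front_offset * mirror
      back_r = wheel + p.back_offset
      back_l = wheel * mirror + p.back_offset * mirror
      return np.concatenate([body_verts, front_r, front_l, back_r, back_l])

Because only these few scalars and offsets are free, the optimization stays well-behaved even on sparse, noisy observations, while wheel spin and steering remain directly animatable.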

Our approach builds on recent successes in differentiable rendering: we leverage a differentiable renderer that takes as input variables such as the sensor pose and our textured mesh representation, and outputs a realistic rendering of the object. We design an energy function with complementary terms that measure the geometry and appearance agreement between the observations and the estimates, while regularizing the shape and appearance to obey known priors:
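The equation here was rendered as an image and did not survive extraction. A plausible high-level form consistent with the description above (the term names and weights λ are our placeholders; see the paper for the exact formulation) is:

  E(\text{shape}, \text{appearance}, \text{light})
    = \lambda_{\text{geo}}\, E_{\text{geometry}}
    + \lambda_{\text{app}}\, E_{\text{appearance}}
    + \lambda_{\text{reg}}\, E_{\text{regularization}}

where the geometry term measures agreement with LiDAR points and object silhouettes, the appearance term measures photometric agreement with the camera images, and the regularization term encodes the shape and appearance priors.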

While a CAD model provides a useful prior for mesh initialization, a single template mesh is unlikely to cover the wide range of vehicles encountered in the wild. We thus resort to a CAD model library that consists of various vehicle types. We apply principal component analysis (PCA) on the vertex coordinates to obtain a shared low-dimensional shape code.
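As a concrete sketch (our illustration under assumed data conventions, not the released code), the shared code can be obtained by flattening the vertex coordinates of a vertex-aligned library and running PCA:

  # PCA over vertex coordinates of a vertex-aligned CAD library (sketch).
  # `library_verts` is assumed to have shape (num_models, num_verts, 3)
  # with consistent vertex correspondence across models.
  import numpy as np

  def fit_shape_basis(library_verts: np.ndarray, dim: int = 8):
      n = library_verts.shape[0]
      X = library_verts.reshape(n, -1)      # flatten to (n, 3 * num_verts)
      mean = X.mean(axis=0)
      # SVD of the centered data yields the principal directions.
      U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
      basis = Vt[:dim]                      # (dim, 3 * num_verts)
      codes = (X - mean) @ basis.T          # per-model low-dimensional codes
      return mean, basis, codes

  def decode(mean, basis, z):
      """Reconstruct mesh vertices from a low-dimensional code z."""
      return (mean + z @ basis).reshape(-1, 3)

During optimization, the low-dimensional code can then be adjusted instead of the raw vertices, keeping the reconstructed shape within the span of plausible vehicles.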

Results

CADSim recovers more accurate shapes, materials, and lighting from sparse observations compared to state-of-the-art (SoTA) approaches.

We show more qualitative comparisons with SoTA approaches. CADSim's rendered mesh is shown in the bottom right, and the ground truth is shown on the left; other methods are shown in the remaining panels.

Below are the results for all test views (front-left camera):

CAD priors are crucial for achieving high-fidelity shapes in non-convex optimization. Notice that CADSim is the only method with distinct and separable wheels for animation.

CADSim reconstructs quickly, renders in real time, and achieves the best photorealism metrics compared to SoTA approaches.

3D Reconstruction

We can reconstruct 360° vehicle assets from partial observations (left camera). For each example, the vehicle to be reconstructed is annotated with red bounding boxes. The reconstructed mesh is shown on the right.

Robust and Scalable Reconstruction

We can reconstruct 360° vehicle assets from partial observations at scale. Although the sensor observations are sparse and noisy for most vehicles, CADSim still generates complete assets robustly.

Results on Non-vehicle Objects

CADSim can also be extended to reconstruct non-vehicle objects, as long as the objects can be separated into different rigged parts. Below, we provide reconstruction examples for motorcycles and traffic cones. Our strong initialization combined with differentiable rendering enables high-quality asset generation.

CADSim Applications

We now showcase CADSim for accurate camera simulation to evaluate perception models and for realistic vehicle insertion. CADSim also naturally supports texture transfer, allowing us to easily expand the asset library.

Actor Insertion

Here we show results of inserting CADSim assets into real scenes. The left two columns show the original camera and LiDAR data; the right two columns show the same data with CADSim-inserted actors. Our insertion results are realistic, temporally consistent, and multi-sensor consistent.
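For intuition, here is a minimal sketch of depth-aware compositing, one plausible way (our assumption, not necessarily the authors' exact pipeline) to insert a rendered asset so that occlusions by real scene elements are handled correctly:

  # Depth-aware compositing of a rendered asset into a real image (sketch).
  # All array conventions are assumptions: scene_rgb/asset_rgb are (H, W, 3)
  # floats in [0, 1], scene_depth/asset_depth/asset_alpha are (H, W).
  import numpy as np

  def composite(scene_rgb, scene_depth, asset_rgb, asset_alpha, asset_depth):
      # Visible wherever the asset was rendered AND is closer than the scene.
      visible = (asset_alpha > 0) & (asset_depth < scene_depth)
      alpha = np.where(visible, asset_alpha, 0.0)[..., None]
      return alpha * asset_rgb + (1.0 - alpha) * scene_rgb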

Compared to directly inserting CAD models with unrealistic textures, CADSim produces more realistic insertion results:

We can leverage CADSim to insert actors into the scenes at scale, creating a wide variety of new scenarios from the original sensor videos.

Scene Manipulation

We can manipulate the position and rotation of the inserted car in 3D. We generate the LiDAR point cloud (on the left) and the camera image (on the right) for the modified scene, and both simulated sensor modalities look realistic. Notice the wheels spinning as the car moves forward and backward, and rotating to model turning.
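The wheel animation follows directly from rolling without slipping: the spin angle about the axle equals the distance traveled divided by the tire radius. A small worked sketch (the function name is ours):

  # Wheel spin for rolling without slipping (illustrative sketch).
  def wheel_spin_angle(distance_traveled: float, wheel_radius: float) -> float:
      """Radians of rotation about the axle for a rolling wheel."""
      return distance_traveled / wheel_radius

  # Example: a car with 0.35 m tires moving 1.1 m forward spins its
  # wheels by ~3.14 rad, i.e., about half a turn.
  print(wheel_spin_angle(1.1, 0.35))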

Safety-Critical Scenario Generation

We generate a safety-critical scenario and show realistic simulation of an actor aggressively turning right into our lane. The simulated camera and LiDAR data blend seamlessly into the original scene, creating a more interesting long-tail scenario.

We insert a moving vehicle that aggressively crosses two lanes at once. The occlusions and actor motion are physically plausible.

Texture Transfer and Synthesis

Our approach can align textures across different vehicle shapes, enabling texture transfer to create new asset variations. We demonstrate the texture transfer across multiple actors in the real world.
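Because all reconstructed assets share one mesh topology and UV layout, transferring appearance reduces to pairing one actor's geometry with another actor's texture image. A minimal sketch, with illustrative names and fields assumed:

  # Texture transfer between vertex-aligned assets (illustrative sketch;
  # the Asset fields and names are our assumptions, not the authors' API).
  from dataclasses import dataclass
  import numpy as np

  @dataclass
  class Asset:
      verts: np.ndarray    # (V, 3) vertex positions, shared topology
      faces: np.ndarray    # (F, 3) shared face indices
      uvs: np.ndarray      # (V, 2) shared UV coordinates
      texture: np.ndarray  # (H, W, 3) per-asset texture image

  def transfer_texture(target: Asset, source: Asset) -> Asset:
      # Shared UVs guarantee source texels land on the right target parts.
      return Asset(target.verts, target.faces, target.uvs, source.texture)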

We can swap vehicle textures in a real-world video. We choose three nearby actors and transfer the texture from one actor to the others. Our assets are vertex-aligned with high-quality part correspondences across different shapes, allowing for realistic and seamless simulation.

Below are some snapshots of the above video for better illustration:

Limitations

While CADSim can efficiently reconstruct assets at scale, we observe the following limitations:

  • Fixed topology: CADSim's main assumption is that it requires CAD models for the object class of interest. We note that CAD models are readily available for most object classes, and that our approach only requires encoding semantic priors for a single CAD asset, as our energy-model optimization allows these priors to transfer to other assets of the same class.
  • Limited in-painting capacity: Although we apply local smoothness and symmetry priors in the appearance energy terms, CADSim still cannot hallucinate missing pixels if the coverage is too limited.
  • Relies on segmentation masks, LiDAR points, and rough camera parameters: CADSim leverages segmentation and LiDAR data to identify the object boundary and estimate accurate geometry. This data is typically available on mobile sensor platforms for self-driving.

Conclusion

In this paper, we proposed to leverage in-the-wild camera and LiDAR data to reconstruct objects such as vehicles. Towards this goal, we designed CADSim, which combines geometric and semantic cues from CAD models with differentiable rendering to generate meshes with high-quality geometry and appearance. These cues also enable our approach to generate articulated and editable meshes, enabling endless creation of new shapes, textures, and animations for simulation.

BibTeX

  @inproceedings{wang2022cadsim,
    title={CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation},
    author={Jingkang Wang and Sivabalan Manivasagam and Yun Chen and Ze Yang and Ioan Andrei B{\^a}rsan and Anqi Joyce Yang and Wei-Chiu Ma and Raquel Urtasun},
    booktitle={6th Annual Conference on Robot Learning},
    year={2022},
    url={https://openreview.net/forum?id=Mp3Y5jd7rnW}
  }