UniCal: Unified Neural Sensor Calibration
UniCal: Our method takes collected outdoor sensor data from multi-sensor robots and automatically calibrates the sensor extrinsics. Top: LiDAR-Camera and LiDAR-LiDAR alignment on collected data with uncalibrated extrinsics. Bottom: Sensor alignment with our optimized calibration.
Abstract
Self-driving vehicles (SDVs) require accurate calibration of LiDARs and cameras to fuse sensor data accurately for autonomy. Traditional calibration methods typically leverage fiducials captured in a controlled and structured scene and compute correspondences to optimize over. These approaches are costly and require substantial infrastructure and operations, making it challenging to scale for vehicle fleets. In this work, we propose UniCal, a unified framework for effortlessly calibrating SDVs equipped with multiple LiDARs and cameras. Our approach is built upon a differentiable scene representation capable of rendering multi-view geometrically and photometrically consistent sensor observations. We jointly learn the sensor calibration and the underlying scene representation through differentiable volume rendering, utilizing outdoor sensor data without the need for specific calibration fiducials. This “drive-and-calibrate” approach significantly reduces costs and operational overhead compared to existing calibration systems, enabling efficient calibration for large SDV fleets at scale. To ensure geometric consistency across observations from different sensors, we introduce a novel surface alignment loss that combines feature-based registration with neural rendering. Comprehensive evaluations on multiple datasets demonstrate that UniCal outperforms or matches the accuracy of existing calibration approaches while being more efficient, demonstrating the value of UniCal for scalable calibration.
Video
Motivation
Robots such as self-driving vehicles (SDVs) require accurate calibration of their LiDARs and cameras to accurately fuse sensor data for tasks such as autonomy and 3D reconstruction. Even a slight error in the estimated sensor poses may result in several meters of misalignment between observations of distant objects, which could cause catastrophic failure. The following figure shows the difference in LiDAR-Camera and LiDAR-LiDAR alignment between uncalibrated and well-calibrated extrinsics.
Traditionally, sensor calibration involves collecting sensor data in a controlled indoor environment, with fiducials such as checkerboards mounted to different locations or held by operators. The SDV must observe these fiducials from various viewpoints and distances, and then leverage extracted geometric features to compute correspondences and estimate the relative poses between different sensors. For multiple sensors, this process is followed by a global optimization stage. The following video demonstrates an operator moving checkerboards to generate diverse feature correspondences for calibrating a single LiDAR-camera pair using a classical method.
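For reference, the sketch below illustrates the camera-side building block of such a target-based pipeline: detecting a checkerboard and recovering its pose via PnP with OpenCV. This is an illustrative example with assumed board dimensions, image path, and intrinsics, not the specific procedure shown in the video; a full LiDAR-camera pipeline would additionally extract the board from the LiDAR sweep and jointly optimize over many such detections.

```python
import cv2
import numpy as np

# Example values: a 9x6 inner-corner checkerboard with 8 cm squares, and assumed
# (already calibrated) pinhole intrinsics with no distortion.
board_cols, board_rows, square_m = 9, 6, 0.08
objp = np.zeros((board_rows * board_cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:board_cols, 0:board_rows].T.reshape(-1, 2) * square_m

K = np.array([[1000., 0., 640.], [0., 1000., 360.], [0., 0., 1.]])  # example intrinsics
dist = np.zeros(5)                                                   # assume no distortion

gray = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)     # hypothetical image
found, corners = cv2.findChessboardCorners(gray, (board_cols, board_rows))
if found:
    # Refine corner locations to sub-pixel accuracy before pose estimation.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    # (rvec, tvec) is the checkerboard pose in this camera's frame. Repeating this
    # across many views and sensors yields the correspondences used to solve for
    # the relative extrinsics in a subsequent global optimization.
```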
This is an arduous process that requires substantial infrastructure, significant operational costs, and manual effort. To address these challenges, we propose UniCal, an automatic, targetless, and unified multi-sensor calibration method for SDVs. With UniCal, one can simply drive the SDV in an outdoor area, record the sensor data, and then run the algorithm to obtain the extrinsics between all the sensors. This “drive-and-calibrate” approach significantly reduces cost and operational overhead, enabling more scalable fleet calibration. UniCal leverages recent advances in neural rendering to reconstruct the sensor data and obtain a high-quality calibration. As we reconstruct a digital twin of the real world, we jointly optimize the sensor calibration so that renderings of the reconstruction match the real observations. We enhance neural rendering specifically for multi-sensor calibration by incorporating a novel surface-guided alignment loss and a coarse-to-fine sampling strategy driven by robust feature correspondences.
Method
We now describe UniCal’s method. First, we construct a sensor calibration graph linking all the LiDARs and cameras on the vehicle and initialize an implicit scene representation for the area we drove in. We then jointly optimize the multi-sensor extrinsics and the underlying scene representation within a differentiable framework to retrospectively minimize photometric and geometric consistency losses on the collected outdoor data. Through this joint optimization of sensor parameters and the scene representation, we effectively resolve the relative poses between the sensors.
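To make this concrete, here is a minimal sketch of one such joint update step. This is our own illustrative code, not UniCal’s implementation: `scene` and `render_rgb_depth` stand in for an arbitrary differentiable scene representation and volume renderer, the extrinsic correction is parameterized as a 6-DoF tangent vector with a first-order exponential map, and the exact losses and parameterization in UniCal may differ.

```python
import torch

def se3_exp(xi):
    """First-order exponential map: 6-vector (rotation, translation) -> 4x4 transform."""
    w, t = xi[:3], xi[3:]
    zero = torch.zeros((), dtype=xi.dtype)
    skew = torch.stack([
        torch.stack([zero, -w[2], w[1]]),
        torch.stack([w[2], zero, -w[0]]),
        torch.stack([-w[1], w[0], zero]),
    ])
    R = torch.eye(3, dtype=xi.dtype) + skew               # small-angle approximation
    top = torch.cat([R, t.reshape(3, 1)], dim=1)          # 3x4 rotation | translation
    bottom = torch.tensor([[0., 0., 0., 1.]], dtype=xi.dtype)
    return torch.cat([top, bottom], dim=0)                # 4x4 pose correction

def calibration_step(scene, render_rgb_depth, extrinsic_delta, init_extrinsic,
                     rays_sensor, rgb_gt, depth_gt, optimizer):
    """One joint update of the scene representation and one sensor's extrinsic correction."""
    T = se3_exp(extrinsic_delta) @ init_extrinsic          # corrected sensor-to-world pose
    origins = T[:3, 3].expand_as(rays_sensor)              # ray origins in the world frame
    dirs = rays_sensor @ T[:3, :3].T                       # rotate ray directions to world
    rgb, depth = render_rgb_depth(scene, origins, dirs)    # differentiable volume rendering

    loss = (rgb - rgb_gt).abs().mean()                     # photometric consistency
    if depth_gt is not None:                                # LiDAR rays also carry depth
        loss = loss + (depth - depth_gt).abs().mean()       # geometric consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, a single optimizer would hold both the scene parameters and one correction vector per sensor, so that every sampled ray batch updates the scene and the extrinsics jointly.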
Recovering both the scene representation and the sensor poses in unstructured outdoor driving scenes can be challenging. An unregularized model can learn to render the target observations with incorrect poses and geometry. To mitigate this, we want to ensure that the 3D structures inferred from the sensor data align with the underlying implicit scene surface. For LiDAR data, this geometric alignment can be assessed by comparing the rendered depth with the observed depth. For camera data, we introduce a differentiable surface alignment distance to impose additional geometric constraints on the camera poses. Specifically, we infer sparse correspondences between camera image pairs using off-the-shelf multi-view geometry tools. Ray-casting the corresponding pixels u1 and u2 onto the implicit surface then yields 3D points p1 and p2. The surface alignment distance quantifies the image-space discrepancy between p1 and p2, and minimizing it ensures geometric consistency across sensors and viewpoints.
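One plausible way to realize such a distance is sketched below under our own assumptions (pinhole cameras, a symmetric average over the two views); UniCal’s exact formulation and weighting may differ. The idea is that if the extrinsics are correct, p1 and p2 lie on the same surface point, so their projections coincide in both images.

```python
import torch

def surface_alignment_distance(p1_world, p2_world, world_to_cam1, world_to_cam2, K1, K2):
    """Image-space discrepancy between ray-cast surface points p1 and p2.

    p1_world, p2_world: (N, 3) surface points obtained by ray-casting matched pixels
                        u1 (camera 1) and u2 (camera 2) onto the implicit surface.
    world_to_cam*, K*:  4x4 extrinsics and 3x3 intrinsics of the two cameras.
    """
    def project(points, world_to_cam, K):
        # World -> camera frame, then pinhole projection to pixel coordinates.
        homog = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)   # (N, 4)
        cam = (homog @ world_to_cam.T)[:, :3]
        pix = cam @ K.T
        return pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)

    # Compare the projections of p1 and p2 in each image; a pose error separates them.
    d1 = (project(p1_world, world_to_cam1, K1) - project(p2_world, world_to_cam1, K1)).norm(dim=1)
    d2 = (project(p1_world, world_to_cam2, K2) - project(p2_world, world_to_cam2, K2)).norm(dim=1)
    return 0.5 * (d1 + d2).mean()
```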
Selecting which sensor rays to render and optimize with is an important design choice for learning sensor calibration. Typical structure-from-motion pipelines identify interest points to establish correspondences for alignment. In contrast, the neural rendering literature typically relies on uniform ray sampling for scene reconstruction. Our approach takes the best of both by employing a coarse-to-fine sampling strategy during training. Initially, we uniformly sample sensor rays to learn an accurate scene representation. However, not all sensor rays contribute equally to pose learning: textureless regions, like the sky and road, offer insufficient gradients to effectively update the sensor poses. Therefore, we progressively increase the sampling frequency in regions of interest to enhance pose registration. We identify interest points using an off-the-shelf keypoint detector and create a corresponding heat map.
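A minimal sketch of such a coarse-to-fine schedule is shown below; the linear ramp, the 0.9 cap, and the warm-up length are illustrative values of our own, not UniCal’s exact settings.

```python
import torch

def sample_pixels(heatmap, num_rays, step, warmup_steps=5000):
    """Coarse-to-fine pixel sampling: uniform early on, progressively biased
    toward a keypoint heat map of shape (H, W) as training proceeds."""
    h, w = heatmap.shape
    # Fraction of rays drawn from the keypoint distribution grows linearly, then saturates.
    keypoint_frac = min(0.9, step / warmup_steps)

    uniform = torch.full((h * w,), 1.0 / (h * w))
    keypoint = heatmap.flatten() + 1e-8
    keypoint = keypoint / keypoint.sum()

    # Mixture of uniform and keypoint-driven sampling probabilities.
    probs = (1 - keypoint_frac) * uniform + keypoint_frac * keypoint
    idx = torch.multinomial(probs, num_rays, replacement=True)
    return torch.stack([idx // w, idx % w], dim=1)   # (num_rays, 2) pixel rows/cols
```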
Calibration on Large Outdoor Scenes
We now show results and demonstrate UniCal is effective on multiple sensor platforms and datasets.
PandaSet: Minivan Platform
We first show results on the publicly available PandaSet, collected with a minivan platform equipped with two LiDARs and six cameras. Here we show our method optimizing the calibration for all sensor pairs and visualize the LiDAR-camera projection on a held-out data collect, with the LiDAR points colored by depth. While the initial calibration is quite poor, we recover and converge to a good calibration quickly.
Here are the calibration results on another log snippet.
MS-Cal: Class 8 Truck Platform
We also collected a dataset with a Class 8 truck equipped with five LiDARs and eight cameras, and we show results on held-out snippets from this dataset. Our method quickly achieves a high-quality calibration here as well.
Here are the calibration results on another log snippet.
Calibration Comparison
UniCal produces accurate calibration and does not require specific calibration targets, allowing for scalable fleet calibration. We now compare it to both classical and neural rendering-based methods.
Outdoor LiDAR-Camera Re-Projection Comparison
We first show the LiDAR-camera re-projection comparison on our collected outdoor truck data. Our method achieves high-quality alignment with thin and far-away structures.
Outdoor LiDAR-LiDAR Registration Comparison
For LiDAR-LiDAR calibration, UniCal ensures all point clouds align for surfaces such as the ground and curb, which is key for scene understanding.
Indoor Checkerboard Re-Projection Comparison
We also collected checkerboard data using our truck platform to evaluate calibration performance quantitatively. UniCal achieves high-quality alignment without needing any targets, and outperforms both classical and neural rendering approaches.
Rendering Quality Comparison
We now demonstrate that our learned calibration also improves performance on downstream tasks such as scene reconstruction and rendering. When we use the calibration results to train a neural rendering model on a held-out scene, UniCal’s calibration achieves the highest-quality results and recovers fine details.
Conclusion
In this paper, we propose UniCal, a unified framework that takes collected data from multi-sensor platforms and automatically calibrates the sensor extrinsics. Our method combines feature-based registration with neural rendering for accurate and efficient calibration without the need for calibration targets. This “drive-and-calibrate” approach significantly reduces costs and operational overhead compared to existing calibration systems that rely on extensive infrastructure and procedures, thereby facilitating scalable calibration for large SDV fleets. Our method can also be combined with classical calibration approaches as initialization to further improve robustness.
BibTeX
@inproceedings{yang2024unical,
title = {UniCal: Unified Neural Sensor Calibration},
author = {Ze Yang and George Chen and Haowei Zhang and Kevin Ta and Ioan Andrei Bârsan and Daniel Murphy and Sivabalan Manivasagam and Raquel Urtasun},
booktitle = {European Conference on Computer Vision},
year = {2024},
}