
Towards Scalable Coverage-Based Testing of Autonomous Vehicles
CoRL 2023
James Tu, Simon Suo, Chris Zhang, Kelvin Wong, Raquel Urtasun
This paper tackles the problem of parameterized scenario testing for autonomous vehicles (AVs). Every point in the parameter space corresponds to a concrete scenario where the AV will either pass or fail according to safety requirements. The goal of testing is to understand the AV's performance across the parameter space.
Autonomous vehicles (AVs) can revolutionize the way we live by drastically reducing accidents, relieving traffic congestion, and providing mobility to those who cannot drive. To realize this future, developers must first ask the question: "Is the AV safe enough to be deployed in the real world?" To answer this question, we must understand in which scenarios the AV can meet safety requirements. Towards this goal, it is important to build a testing framework that covers the wide range of scenarios in the AV's operational domain and further identifies whether the AV is safe or unsafe in these scenarios.
To cover the wide range of real-world scenarios in a scalable manner, the self-driving industry often relies on simulation, where the traffic environment is fully controllable and long-tail events can be synthesized. A popular approach to describing real-world events in simulation is through Logical Scenarios. Each Logical Scenario follows a high-level description (e.g., the SDV follows its lane, an actor cuts in) and exposes configurable parameters that detail the low-level characteristics of the scenario (e.g., actor velocities, road curvature). A specific combination of parameter values then corresponds to a Concrete Scenario, which can be executed in simulation to determine whether the AV complies with functional safety requirements. The outcome is typically captured as a binary pass or fail determined by regulatory demands. For example, an AV could fail if it violates a safety distance threshold.
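To make this concrete, here is a minimal sketch of what a logical scenario, one of its concrete scenarios, and a binary pass/fail check could look like in code. The cut-in scenario, its two parameters, and the `passes_safety_requirement` check are hypothetical illustrations, not the scenario definitions used in the paper.

```python
from dataclasses import dataclass

# Hypothetical "actor cut-in" logical scenario: the parameter space is a box,
# and every point inside it is a concrete scenario we can execute in simulation.
@dataclass
class CutInLogicalScenario:
    actor_speed_range: tuple = (5.0, 30.0)      # m/s
    cut_in_distance_range: tuple = (5.0, 50.0)  # m

    def concrete(self, actor_speed: float, cut_in_distance: float) -> dict:
        """Return one concrete scenario (a specific parameter combination)."""
        lo_s, hi_s = self.actor_speed_range
        lo_d, hi_d = self.cut_in_distance_range
        assert lo_s <= actor_speed <= hi_s and lo_d <= cut_in_distance <= hi_d
        return {"actor_speed": actor_speed, "cut_in_distance": cut_in_distance}

def passes_safety_requirement(sim_log: dict, min_gap: float = 2.0) -> bool:
    """Binary pass/fail, e.g. the AV never comes closer than `min_gap` metres."""
    return sim_log["min_distance_to_actor"] >= min_gap
```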
We cannot directly test every single parameter combination, since there could be infinitely many possibilities when the parameters are continuous. Instead, testing involves executing a finite set of concrete scenarios and estimating whether the AV will pass or fail on unseen tests. In our testing framework, GUARD, we use a Gaussian Process (GP) to leverage the executed tests and estimate the probability of passing or failing across the parameter space. The parameter space can then be partitioned into pass, fail, and unknown regions using a probability threshold.
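Below is a minimal sketch of this idea using scikit-learn's GaussianProcessRegressor as a stand-in for the GP in GUARD: the GP regresses the observed safety metric, and the probability of clearing a (hypothetical) distance threshold is used to partition unseen points into pass, fail, and unknown regions. All parameter values, thresholds, and data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# X: parameters of executed concrete scenarios (actor speed, cut-in distance);
# m: observed safety metric (e.g. minimum distance to the actor, in metres).
# A test passes iff m >= THRESHOLD.
THRESHOLD = 2.0
X = np.array([[10.0, 40.0], [25.0, 10.0], [15.0, 20.0], [8.0, 45.0]])
m = np.array([4.1, 0.8, 1.5, 5.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(1e-3),
                              normalize_y=True).fit(X, m)

def partition(points: np.ndarray, p_conf: float = 0.9) -> np.ndarray:
    """Label unseen points 'pass', 'fail', or 'unknown' from the GP posterior."""
    mean, std = gp.predict(points, return_std=True)
    p_pass = 1.0 - norm.cdf((THRESHOLD - mean) / np.maximum(std, 1e-9))
    labels = np.full(len(points), "unknown", dtype=object)
    labels[p_pass >= p_conf] = "pass"
    labels[p_pass <= 1.0 - p_conf] = "fail"
    return labels

print(partition(np.array([[12.0, 35.0], [22.0, 12.0]])))
```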
We efficiently sample concrete tests according to two criteria. Intuitively, samples on the boundary between passing and failing are the most informative for partitioning the parameter space. On the other hand, it is also beneficial to sample tests where the GP is uncertain about the outcome. During testing, we repeatedly sample a test according to these criteria, observe the performance metric, and update the GP.
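One plausible way to encode these two criteria is sketched below as an acquisition score: a boundary term that peaks where the predicted metric equals the pass/fail threshold, plus an uncertainty term given by the posterior standard deviation. The weights and the `run_in_simulation` helper are assumptions for illustration, not the paper's exact acquisition function.

```python
import numpy as np

def acquisition(gp, candidates, threshold=2.0, w_boundary=1.0, w_uncertain=1.0):
    """Score candidate tests: favour points whose predicted metric is close to
    the pass/fail threshold and points with high posterior standard deviation."""
    mean, std = gp.predict(candidates, return_std=True)
    boundary = -np.abs(mean - threshold)  # highest right at the boundary
    return w_boundary * boundary + w_uncertain * std

# One round of the testing loop (`run_in_simulation` is a hypothetical stand-in
# for executing a concrete scenario and reading back the safety metric):
# x_next = candidates[np.argmax(acquisition(gp, candidates, THRESHOLD))]
# m_next = run_in_simulation(x_next)
# X, m = np.vstack([X, x_next]), np.append(m, m_next)
# gp = gp.fit(X, m)
```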
GPs model the relationships between similar tests using kernels. Selecting good kernel hyperparameters is crucial for the GP to accurately model the pass/fail landscape. Tuning these hyperparameters by hand is typically tedious and does not scale across the many different logical scenarios. We make GUARD scalable by automatically learning the kernel hyperparameters that best fit the observed data.
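As a sketch of what this can look like, scikit-learn already learns kernel hyperparameters by maximising the log marginal likelihood during `fit`; the kernel choice, bounds, and data below are illustrative assumptions rather than the configuration used in GUARD.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

# Executed tests (reusing the hypothetical cut-in parameters and safety metric).
X = np.array([[10.0, 40.0], [25.0, 10.0], [15.0, 20.0], [8.0, 45.0]])
m = np.array([4.1, 0.8, 1.5, 5.0])

# Give the kernel broad hyperparameter bounds and let fit() maximise the log
# marginal likelihood; n_restarts_optimizer guards against poor local optima.
kernel = (ConstantKernel(1.0, (1e-2, 1e2))
          * RBF(length_scale=[10.0, 10.0], length_scale_bounds=(1e-1, 1e3))
          + WhiteKernel(1e-3, (1e-6, 1e1)))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                              n_restarts_optimizer=10).fit(X, m)
print(gp.kernel_)  # kernel with the learned length scales and noise level
```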
Compared to existing approaches, GUARD achieves higher test coverage and more accurately estimates whether the AV will pass or fail. In the plots below we evaluate four metrics (a sketch of how they could be computed follows the list):
Coverage measures the percentage of the parameter space that is covered.
Balanced Accuracy measures the accuracy of the pass/fail predictions the testing framework makes across the covered space.
Error Recall measures the percentage of failures across the space that the framework is able to discover. Identifying these failures is especially important for autonomy development.
False Positive Rate measures the percentage of predicted passes that are actually failures. It is crucial to keep this low, as failing to catch failures can have severe consequences.
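The sketch below shows one plausible way to compute these four metrics from ground-truth pass/fail labels on a dense grid of concrete scenarios and the framework's pass/fail/unknown predictions; the exact definitions used in the paper's evaluation may differ.

```python
import numpy as np

def testing_metrics(truth: np.ndarray, pred: np.ndarray) -> dict:
    """truth: boolean array over a dense grid, True = the AV actually passes.
    pred: 'pass' / 'fail' / 'unknown' predictions from the testing framework."""
    covered = pred != "unknown"
    pred_pass = pred == "pass"

    coverage = covered.mean()
    # Balanced accuracy over the covered region only.
    tpr = (pred_pass & truth & covered).sum() / max((truth & covered).sum(), 1)
    tnr = (~pred_pass & ~truth & covered).sum() / max((~truth & covered).sum(), 1)
    balanced_accuracy = 0.5 * (tpr + tnr)
    # Error recall: fraction of all true failures the framework discovers.
    error_recall = (~pred_pass & covered & ~truth).sum() / max((~truth).sum(), 1)
    # False positive rate: fraction of predicted passes that actually fail.
    false_positive_rate = (pred_pass & ~truth).sum() / max(pred_pass.sum(), 1)

    return dict(coverage=coverage, balanced_accuracy=balanced_accuracy,
                error_recall=error_recall, false_positive_rate=false_positive_rate)
```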
To visually demonstrate how GUARD achieves superior testing performance, we visualize the parameter space. Since a high-dimensional space is difficult to visualize, we select a 2D slice. On this slice we show the ground-truth pass and fail regions, as well as those predicted by different testing frameworks. GUARD models the pass/fail regions more accurately since it does not discretize the parameter space.
In practice, GUARD is also a useful tool for benchmarking different iterations of the AV. Between two versions of the AV, GUARD can identify which one is less likely to violate a safety requirement. Furthermore, we can triage specific instances of regression. Here we show the pass/fail landscapes of two versions of the autonomy and highlight the region in the parameter space where the AV went from passing to failing.
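A simple way to triage such regressions, assuming a `partition` function like the one sketched earlier fitted separately for each AV version, is to compare the two predicted landscapes point by point; this is an illustrative sketch, not the paper's triage tooling.

```python
import numpy as np

def regressions(grid: np.ndarray, partition_v1, partition_v2) -> np.ndarray:
    """Return the grid points predicted 'pass' for version 1 but 'fail' for version 2."""
    labels_v1, labels_v2 = partition_v1(grid), partition_v2(grid)
    return grid[(labels_v1 == "pass") & (labels_v2 == "fail")]
```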
We propose an efficient and scalable framework for coverage-based testing. Our approach achieves higher test coverage and evaluation accuracy than other approaches common in the industry. In practice, GUARD can serve as a valuable tool for benchmarking different versions of the AV and identifying specific cases of regression. This framework can be used in practice with functional safety experts defining a comprehensive set of safety requirements and a parameterized operational design domain (ODD). Our work is ultimately a step towards safely deploying AVs at scale.
@inproceedings{tu2023towards,
title = {Towards Scalable Coverage-Based Testing of Autonomous Vehicles},
author = {James Tu and Simon Suo and Chris Zhang and Kelvin Wong and Raquel Urtasun},
booktitle = {Conference on Robot Learning (CoRL)},
year = {2023},
}