
SceneControl: Diffusion for Controllable Traffic Scene Generation
ICRA 2024
Jack Lu†*, Kelvin Wong*, Chris Zhang, Simon Suo, Raquel Urtasun
Simulation is an essential tool to safely and scalably develop self-driving vehicles. A core component of simulation is the ability to simulate traffic scenarios. This is typically decomposed into two tasks: (1) specifying the initial placement and attributes for the actors in the scene; and (2) simulating those actors’ behaviors. We focus on the first task, which we call traffic scene generation.
A common method is to manually create traffic scenes. This approach gives us granular control to create scenes with specific interactions, but it is far too tedious to do this at scale. Rules-based generation can automatically generate variations of scenes at scale. However, it is hard to design good rules, and their rigidity often limits realism and diversity. Recent works learn to generate diverse traffic scenes directly from data. However, existing models produce scenes that often defy common sense; for example, scenes with collisions. They also lack controllability, which limits their usefulness in practice. We’re interested in a solution that is scalable, realistic, and controllable.
We propose SceneControl, a framework for controllable traffic scene generation. In our approach, we first train a diffusion model of traffic scenes from real traffic data, which learns to iteratively denoise random noise into realistic traffic scenes. Then, to control the generation process, we encode arbitrary high-level constraints into guidance functions and use guided sampling to sample from a perturbed distribution that captures realism and constraint-satisfaction simultaneously. Sampling from this perturbed distribution corresponds to generating scenes that are both realistic under our diffusion model and constraint-satisfying under the guidance functions. Notably, this formulation decouples realism from controllability, allowing us to re-use the same diffusion model with various guidance functions without re-training.
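As a rough illustration of guided sampling, the toy sketch below draws samples from a perturbed density p(x) * exp(-c(x)) using Langevin-style updates. It is not the SceneControl implementation: a standard normal with a known score stands in for the learned diffusion model, the scene is reduced to a single number, and the names `model_score`, `guidance_grad`, and `guided_sample` are hypothetical. The point it tries to show is that the same "model" can be reused unchanged, with or without a guidance term.

```python
import math
import random


def model_score(x):
    """Score (gradient of the log-density) of a standard normal.

    Assumption: this stands in for the learned diffusion model.
    """
    return -x


def guidance_grad(x, target=2.0, weight=4.0):
    """Gradient of a hypothetical cost c(x) = weight * (x - target)**2 / 2."""
    return weight * (x - target)


def guided_sample(rng, steps=1000, step_size=0.01, use_guidance=True):
    """Langevin sampling from p(x) * exp(-c(x)), or from p(x) alone."""
    x = rng.gauss(0.0, 1.0)  # start from random noise
    for _ in range(steps):
        score = model_score(x)
        if use_guidance:
            # Perturb the model's score with the constraint gradient.
            score -= guidance_grad(x)
        noise = rng.gauss(0.0, 1.0)
        x += 0.5 * step_size * score + math.sqrt(step_size) * noise
    return x


def mean_of_samples(n, use_guidance, seed=0):
    """Average of n independent samples, for a quick sanity check."""
    rng = random.Random(seed)
    total = sum(guided_sample(rng, use_guidance=use_guidance) for _ in range(n))
    return total / n
```

In this toy setting the perturbed density is itself Gaussian with mean 1.6, so the guided samples should cluster around that value, while the unguided samples stay centered at 0.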
By varying the guidance function, we can flexibly encode different constraints into the generation process. For example, using the identity recovers unconditional scene generation whereas using a collision cost encourages collision-free scenes instead.
Spatial region constraints encourage new actors to spawn in specific polygonal regions.
Actor attribute constraints encourage new actors to have attributes (e.g., speed) within specific ranges.
Initial scene constraints encourage the scene to preserve a set of existing actors.
Collision constraints encourage the generation of collision-free scenes.
On-road constraints encourage new actors to be placed on the road.
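To make the idea of a guidance function more concrete, here is a minimal sketch of two costs of the kind listed above. The representations are assumptions made for illustration only: actors are approximated as discs given by hypothetical `(x, y, radius)` tuples, and speeds are plain floats. The costs actually used by SceneControl may be defined differently.

```python
import math
from itertools import combinations


def collision_cost(actors):
    """Sum of squared overlaps between actors approximated as discs.

    Each actor is a hypothetical (x, y, radius) tuple. The cost is zero
    exactly when no two discs overlap.
    """
    cost = 0.0
    for (x1, y1, r1), (x2, y2, r2) in combinations(actors, 2):
        overlap = (r1 + r2) - math.hypot(x1 - x2, y1 - y2)
        cost += max(0.0, overlap) ** 2
    return cost


def speed_range_cost(speeds, low, high):
    """Squared distance of each speed from the allowed [low, high] range."""
    return sum(max(0.0, low - s, s - high) ** 2 for s in speeds)
```

Both costs are zero when the constraint is satisfied and grow smoothly as it is violated, which is the property that lets a gradient-based guidance step push a sample back toward constraint satisfaction.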
Given the HD map of an empty scene, SceneControl can automatically generate realistic traffic scenes from scratch. Here, we depict SceneControl’s denoising process, which gradually refines random noise into a realistic scene.
We can do this at scale across a variety of diverse road topologies and traffic conditions. Here, we show traffic scenes that SceneControl generated in complex urban maps and high-speed highway traffic. Newly generated actors are shown in blue.
Using SceneControl, we can build an interactive tool for controllable scene generation. Starting from a real scene, we can easily remove actors and generate variations. Existing actors are shown in grey and left unmodified. SceneControl realistically and automatically inserts new actors into the scene.
We can also use SceneControl to densify specific regions of a scene. The user simply draws a polygon on the map and SceneControl will place actors into the polygon realistically.
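A user-drawn polygon can be checked against generated actor positions with a standard ray-casting test. The sketch below is only an assumed validation helper, not part of SceneControl: the function names are hypothetical, actors are reduced to `(x, y)` centers, and a hard inside/outside test like this is not differentiable, so a guidance function would need a smooth variant.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` inside `polygon` (a list of (x, y) vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y-coordinate can be crossed
        # by a horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fraction_inside(actor_centers, polygon):
    """Fraction of actor centers that fall inside the polygon."""
    if not actor_centers:
        return 0.0
    hits = sum(point_in_polygon(c, polygon) for c in actor_centers)
    return hits / len(actor_centers)
```

For example, `fraction_inside(centers, region)` could be used to report how many newly generated actors actually landed in the requested region.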
Finally, we can combine multiple constraints to generate complex variations of an existing scene. For example, here, we use SceneControl to insert large, low-speed actors into the specified polygon.
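One simple way to combine constraints, sketched below under stated assumptions, is to add their costs with weights; multiplying factors of the form exp(-cost) is equivalent to summing the costs. The scene format (a list of dicts with `speed` and `length` keys), the thresholds, and all function names are hypothetical and chosen only to mirror the "large, low-speed actors" example.

```python
def low_speed_cost(scene, max_speed=3.0):
    """Penalize actors moving faster than max_speed (units assumed m/s)."""
    return sum(max(0.0, a["speed"] - max_speed) ** 2 for a in scene)


def large_size_cost(scene, min_length=8.0):
    """Penalize actors shorter than min_length (units assumed meters)."""
    return sum(max(0.0, min_length - a["length"]) ** 2 for a in scene)


def combine_costs(weighted_costs):
    """Build one cost function from (weight, cost_fn) pairs by weighted sum."""
    def total(scene):
        return sum(weight * cost_fn(scene) for weight, cost_fn in weighted_costs)
    return total


# Hypothetical usage: prefer large, slow actors.
large_and_slow = combine_costs([(1.0, low_speed_cost), (0.5, large_size_cost)])
```

The weights control how strongly each constraint is enforced relative to the others, and adding another constraint only requires appending another pair to the list.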
@inproceedings{scenecontrol2024,
title = {SceneControl: Diffusion for Controllable Traffic Scene Generation},
author = {Jack Lu and Kelvin Wong and Chris Zhang and Simon Suo and Raquel Urtasun},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
year = {2024},
}