SceneControl: Diffusion for controllable traffic scene generation

ICRA 2024

Jack Lu†*, Kelvin Wong*, Chris Zhang, Simon Suo, Raquel Urtasun

by Waabi

We consider the task of traffic scene generation. A common approach in the self-driving industry is to use manual creation to generate scenes with specific characteristics and automatic generation to generate canonical scenes at scale. However, manual creation is not scalable, and automatic generation typically uses rules-based algorithms that lack realism. In this paper, we propose SceneControl, a framework for controllable traffic scene generation. To capture the complexity of real traffic, SceneControl learns an expressive diffusion model from data. Then, using guided sampling, we can flexibly control the sampling process to generate scenes that exhibit desired characteristics. Our experiments show that SceneControl achieves greater realism and controllability than the existing state-of-the-art. We also illustrate how SceneControl can be used as a tool for interactive traffic scene generation.

Motivation

Simulation is an essential tool to safely and scalably develop self-driving vehicles. A core component of simulation is the ability to simulate traffic scenarios. This is typically decomposed into two tasks: (1) specifying the initial placement and attributes for the actors in the scene; and (2) simulating those actors’ behaviors. We focus on the first task, which we call traffic scene generation.

A common method is to manually create traffic scenes. This approach gives us granular control to create scenes with specific interactions, but it is far too tedious to do this at scale. Rules-based generation can automatically generate variations of scenes at scale. However, it is hard to design good rules, and their rigidity often limits realism and diversity. Recent works learn to generate diverse traffic scenes directly from data. However, existing models produce scenes that often defy common sense; for example, scenes with collisions. They also lack controllability, which limits their usefulness in practice. We’re interested in a solution that is scalable, realistic, and controllable.

Method

We propose SceneControl, a framework for controllable traffic scene generation. In our approach, we first train a diffusion model of traffic scenes from real traffic data, which learns to iteratively denoise random noise into realistic traffic scenes. Then, to control the generation process, we encode arbitrary high-level constraints into guidance functions and use guided sampling to sample from a perturbed distribution that captures realism and constraint-satisfaction simultaneously. Sampling from this perturbed distribution corresponds to generating scenes that are both realistic under our diffusion model and constraint-satisfying under the guidance functions. Notably, this formulation decouples realism from controllability, allowing us to re-use the same diffusion model with various guidance functions without re-training.
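One common way to write down this kind of guided sampling is sketched below. The notation is an assumption made for illustration, since this page gives no formulas: $p_\theta(x)$ stands for the learned diffusion model over scenes $x$, $c_k(x) \ge 0$ are hypothetical constraint costs, and $\lambda_k \ge 0$ are hypothetical weights.

```latex
\tilde{p}(x) \;\propto\; p_\theta(x)\,\exp\!\Big(-\sum_k \lambda_k\, c_k(x)\Big),
\qquad
\nabla_x \log \tilde{p}(x) \;=\; \nabla_x \log p_\theta(x) \;-\; \sum_k \lambda_k\, \nabla_x c_k(x).
```

Under this reading, the first gradient term comes from the trained model and the second from the guidance functions alone, which is consistent with swapping constraints without re-training. Setting every $\lambda_k = 0$ leaves $p_\theta$ unchanged.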

By varying the guidance function, we can flexibly encode different constraints into the generation process. For example, using the identity as the guidance function recovers unconditional scene generation, whereas using a collision cost encourages collision-free scenes. Examples of constraints include:

Spatial region constraints encourage new actors to spawn in specific polygonal regions.
Actor attribute constraints encourage new actors to have specific ranges of attributes (e.g., speed).
Initial scene constraints encourage the scene to preserve a set of existing actors.
Collision constraints encourage the generation of collision-free scenes.
On-road constraints encourage new actors to be placed on the road.
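As a rough illustration of how two of these constraints might be written as costs, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than the paper's implementation: actors are approximated as circles, and the function names `collision_cost`, `range_cost`, and `total_cost`, along with the weights, are hypothetical.

```python
import numpy as np

def collision_cost(xy, radii):
    """Sum of squared pairwise overlaps, with actors approximated as circles.

    xy:    (N, 2) array of actor centers (hypothetical representation).
    radii: (N,) array of circle radii.
    """
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    overlap = np.maximum(radii[:, None] + radii[None, :] - dist, 0.0)
    # Count each unordered pair once and skip the diagonal.
    pairs = np.triu_indices(len(xy), k=1)
    return float(np.sum(overlap[pairs] ** 2))

def range_cost(values, low, high):
    """Squared distance of each value to the interval [low, high]."""
    below = np.maximum(low - values, 0.0)
    above = np.maximum(values - high, 0.0)
    return float(np.sum(below ** 2 + above ** 2))

def total_cost(xy, radii, speeds, speed_range, w_collision=1.0, w_speed=1.0):
    """Weighted sum of the two costs; zero when both constraints hold."""
    low, high = speed_range
    return (w_collision * collision_cost(xy, radii)
            + w_speed * range_cost(speeds, low, high))
```

Both costs are zero when the constraint is satisfied and grow smoothly with the size of the violation, so a gradient-based guidance step would push a scene toward satisfying them. A spatial region or on-road constraint could follow the same pattern with a distance-to-region cost.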

Automatic Generation at Scale

Given the HD map of an empty scene, SceneControl can automatically generate realistic traffic scenes from scratch. Here, we depict SceneControl’s denoising process, which gradually refines random noise into a realistic scene.
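The idea of refining noise step by step, optionally steered by a guidance gradient, can be illustrated with a toy sampler. This is a sketch under strong assumptions and not the paper's model: the "scene" is a single 2D point, the trained network is replaced by the exact score of a Gaussian (`toy_score`), and annealed Langevin dynamics stands in for whichever sampler the paper uses. All names and parameter values are hypothetical.

```python
import numpy as np

MU = np.array([2.0, -1.0])  # toy "data" mean (assumption)
S = 0.5                     # toy "data" standard deviation (assumption)

def toy_score(x, sigma):
    """Exact score of N(MU, S^2 I) after adding Gaussian noise of scale sigma."""
    return -(x - MU) / (S**2 + sigma**2)

def region_grad(x, bound=1.5, weight=2.0):
    """Gradient of the cost weight * max(x[:, 0] - bound, 0)^2."""
    grad = np.zeros_like(x)
    grad[:, 0] = 2.0 * weight * np.maximum(x[:, 0] - bound, 0.0)
    return grad

def sample(n, guidance_grad=None, seed=0, sigma_max=3.0, sigma_min=0.01,
           levels=30, steps=30, step_scale=0.05):
    """Annealed Langevin sampling from pure noise, with optional guidance."""
    rng = np.random.default_rng(seed)
    x = sigma_max * rng.standard_normal((n, 2))  # start from random noise
    for sigma in np.geomspace(sigma_max, sigma_min, levels):
        alpha = step_scale * sigma**2  # smaller steps at lower noise levels
        for _ in range(steps):
            grad = toy_score(x, sigma)
            if guidance_grad is not None:
                # Guided score: model score minus the gradient of the cost.
                grad = grad - guidance_grad(x)
            noise = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * grad + np.sqrt(alpha) * noise
    return x
```

Calling `sample(2000)` draws points that cluster around `MU`, while `sample(2000, guidance_grad=region_grad)` shifts the first coordinate toward the region below `bound`. The score function is the same in both calls; only the guidance term changes.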

We can do this at scale across diverse road topologies and traffic conditions. Here, we show traffic scenes that SceneControl generated on complex urban maps and in high-speed highway traffic. Newly generated actors are shown in blue.

Controllable Traffic Scene Generation

Using SceneControl, we can build an interactive tool for controllable scene generation. Starting from a real scene, we can easily remove actors and generate variations. Existing actors are shown in grey and left unmodified. SceneControl realistically and automatically inserts new actors into the scene.

We can also use SceneControl to densify specific regions of a scene. The user simply draws a polygon on the map and SceneControl will place actors into the polygon realistically.

Finally, we can combine multiple constraints to generate complex variations of an existing scene. For example, here, we use SceneControl to insert large, low-speed actors into the specified polygon.
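To make the combined constraint concrete, the sketch below checks whether a generated actor is large, slow, and inside a user-drawn polygon. It is only an illustration of how such a check might look: the `Actor` fields, the thresholds, and the function names are hypothetical and are not taken from the paper or its tooling.

```python
from dataclasses import dataclass

@dataclass
class Actor:
    x: float       # center position in map coordinates
    y: float
    length: float  # bounding-box length in metres
    speed: float   # speed in metres per second

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def satisfies(actor, polygon, min_length, max_speed):
    """True if the actor is inside the polygon, large enough, and slow enough."""
    return (point_in_polygon(actor.x, actor.y, polygon)
            and actor.length >= min_length
            and actor.speed <= max_speed)
```

For example, with `polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]`, the call `satisfies(Actor(5, 5, 12, 2), polygon, min_length=8, max_speed=3)` returns `True`, while the same actor placed at `x = 15` fails the region test. A check like this could be used to count how many generated actors meet the requested constraints.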

BibTeX

@inproceedings{scenecontrol2024,
  title     = {SceneControl: Diffusion for Controllable Traffic Scene Generation},
  author    = {Jack Lu and Kelvin Wong and Chris Zhang and Simon Suo and Raquel Urtasun},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2024},
}