MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation

CVPR 2023

Simon Suo,* Kelvin Wong,* Justin Xu, James Tu, Alexander Cui, Sergio Casas, Raquel Urtasun


The prevailing way to test a self-driving vehicle (SDV) in simulation involves non-reactive, open-loop replay of real-world scenarios. However, in order to safely deploy SDVs to the real world, we need to evaluate them in closed loop. Towards this goal, we propose to leverage the wealth of interesting scenarios captured in the real world and make them reactive and controllable, enabling closed-loop SDV evaluation in what-if situations. In particular, we present MixSim, a hierarchical framework for mixed reality traffic simulation. MixSim explicitly models agent goals as routes along the road network and learns a reactive route-conditional policy. By inferring each agent's route from the original scenario, MixSim can reactively re-simulate the scenario, enabling testing of different autonomy systems under the same conditions. Furthermore, by varying each agent's route, we can expand the scope of testing to what-if situations with realistic variations in agent behavior, or even safety-critical interactions. Our experiments show that MixSim can serve as a realistic, reactive, and controllable digital twin of real-world scenarios.

Overview

MixSim is a hierarchical framework for mixed reality traffic simulation. Given a real-world scenario, MixSim builds a reactive and controllable digital twin of how its traffic agents behave. This enables us to re-simulate the original scenario and answer what-if questions like: What if the SDV changes lanes? What if an agent cuts in front of the SDV?


Video Overview


Method

MixSim supports three modes of simulation:

  • Reactive re-simulation: inferring each agent's reference route from its original trajectory

  • Sampling realistic variations: sampling routes from a learned routing policy

  • Finding safety-critical variations: searching for routes that stress the autonomy system
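The first mode hinges on recovering each agent's reference route from its logged trajectory. The sketch below is illustrative only: `infer_route` is a hypothetical stand-in for that map-matching step, and the toy lane graph (lane id to centroid) is an assumption — the real system operates on an HD road network.

```python
from typing import Dict, List, Tuple

# Illustrative sketch only: recover an agent's reference route from its
# logged trajectory by greedy nearest-lane matching. A "lane" here is just
# a centroid; the real system map-matches against an HD road network.

def infer_route(trajectory: List[Tuple[float, float]],
                lane_graph: Dict[str, Tuple[float, float]]) -> List[str]:
    """Assign each waypoint to its nearest lane and collapse consecutive
    duplicates into a lane sequence (the route)."""
    route: List[str] = []
    for x, y in trajectory:
        nearest = min(lane_graph, key=lambda lid:
                      (lane_graph[lid][0] - x) ** 2 +
                      (lane_graph[lid][1] - y) ** 2)
        if not route or route[-1] != nearest:
            route.append(nearest)
    return route

# Toy example: three lanes laid out along the x-axis.
lane_graph = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (20.0, 0.0)}
logged = [(1.0, 0.2), (9.0, -0.1), (19.5, 0.0)]
print(infer_route(logged, lane_graph))  # -> ['a', 'b', 'c']
```

Once inferred, the route conditions the low-level driving policy, so re-simulated agents pursue the same high-level goal while remaining free to react.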


Reactive Re-simulation

We compare non-reactive replay on the left to MixSim on the right. In each example, the pink agent is controlled by an autonomy stack, the grey agents replay their ground-truth trajectories, and the blue agents are controlled by MixSim to follow their ground-truth routes. MixSim agents reconstruct their original high-level behaviors with high fidelity. At the same time, they react realistically to changes in the SDV's behavior, e.g., by braking when the SDV brakes. In contrast, the replay agents are non-reactive and cause unrealistic collisions when the SDV deviates from its original trajectory.
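The mixed reality rollout above can be sketched in one dimension: the grey agent replays its logged positions verbatim while the controlled agent is advanced by a reactive policy. Everything here is a hedged stand-in — the braking rule, gap threshold, and numbers are assumptions, not the learned route-conditional policy.

```python
# Hedged 1-D sketch of a mixed reality rollout: the grey agent replays its
# logged positions verbatim, while the controlled agent is advanced by a
# reactive stand-in policy that brakes when the lead vehicle gets close.

def reactive_step(pos: float, speed: float, lead_pos: float,
                  dt: float = 0.1, gap: float = 10.0) -> tuple:
    """Brake hard if the lead vehicle is within `gap` metres, else hold speed."""
    if lead_pos - pos < gap:
        speed = max(0.0, speed - 10.0 * dt)
    return pos + speed * dt, speed

def rollout(pos: float, speed: float, lead_log: list) -> list:
    """Step the reactive agent once per logged frame of the lead vehicle."""
    trace = []
    for lead_pos in lead_log:          # grey agent: pure log replay
        pos, speed = reactive_step(pos, speed, lead_pos)
        trace.append(pos)
    return trace

# Lead vehicle is stopped 20 m ahead; the reactive agent slows and stops
# short of it, where a non-reactive replay agent would plough through.
trace = rollout(pos=0.0, speed=10.0, lead_log=[20.0] * 50)
print(trace[-1] < 20.0)  # True: no collision
```

A pure replay agent corresponds to ignoring `lead_pos` entirely, which is exactly what produces the unrealistic collisions shown on the left.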


We also compare MixSim to three path-following baselines representative of the state of the art in traffic simulation. Overall, MixSim simulates more realistic traffic behaviors that are less prone to unrealistic collisions.


Sampling Realistic Variations

We show a mosaic of realistic variations generated using MixSim. As before, the SDV is shown in pink, grey agents replay their ground-truth trajectories, and blue agents are controlled by MixSim. By varying the controlled agents' desired routes, MixSim generates realistic variations of the original scenario with visibly diverse behaviors.
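Varying routes amounts to sampling alternative lane sequences over the road network. The sketch below is a toy stand-in: a hand-set successor probability table plays the role of the learned routing policy, and the lane graph and names are assumptions.

```python
import random

# Illustrative sketch of sampling route variations: successor lanes are drawn
# autoregressively from a probability table standing in for the learned
# routing policy. Graph, probabilities, and names are assumptions.

def sample_route(start, successors, edge_prob, max_len=6, rng=None):
    """Autoregressively extend a route over the lane graph."""
    rng = rng or random.Random(0)
    route = [start]
    while len(route) < max_len:
        options = successors.get(route[-1], [])
        if not options:
            break
        weights = [edge_prob.get((route[-1], o), 1.0) for o in options]
        route.append(rng.choices(options, weights=weights)[0])
    return route

# Tiny lane graph with a fork at "b": keeping the lane (b -> c) is likelier
# than turning off (b -> d), so sampled variations skew realistic.
successors = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
edge_prob = {("b", "c"): 0.8, ("b", "d"): 0.2}
routes = {tuple(sample_route("a", successors, edge_prob,
                             rng=random.Random(seed))) for seed in range(20)}
print(sorted(routes))
```

Feeding each sampled route to the route-conditional policy then yields a distinct, plausible variation of the same scene.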


Finding Safety-Critical Variations

We show a side-by-side comparison of safety-critical variations found using two methods. As before, the SDV is shown in pink and reactive agents in blue; now, an additional adversarial agent is shown in orange. On the left, a popular approach simply perturbs the adversarial agent's trajectory to cause a collision. On the right, the adversarial agent is controlled by MixSim instead: we use black-box optimization to find a route that, when given to MixSim, causes a collision. Because MixSim encodes realism via a learned policy, it finds far more realistic safety-critical scenarios, whereas the baseline considers kinematic realism only, leading to more unrealistic collisions.
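The search objective can be sketched as scoring each candidate route by how close its rollout comes to the SDV. This is a hedged toy: here a "rollout" is just the route's waypoint list and scoring is exhaustive, whereas the real system rolls out the learned policy and optimizes the route with a black-box optimizer.

```python
# Hedged sketch: score each candidate route by the minimum distance its
# rollout comes to the SDV's trajectory, and keep the route with the
# smallest margin (a margin of zero means a collision).

def min_margin(adv_rollout, sdv_trajectory):
    """Smallest per-timestep distance between adversary and SDV."""
    return min(((ax - sx) ** 2 + (ay - sy) ** 2) ** 0.5
               for (ax, ay), (sx, sy) in zip(adv_rollout, sdv_trajectory))

def most_adversarial(candidates, sdv_trajectory):
    """Pick the candidate route whose rollout gets closest to the SDV."""
    return min(candidates, key=lambda r: min_margin(r, sdv_trajectory))

# SDV drives along y = 0; the second candidate route cuts across its path.
sdv = [(t, 0.0) for t in range(5)]
keep_lane = [(t, 4.0) for t in range(5)]
cut_in = [(0, 4.0), (1, 3.0), (2, 1.0), (3, 0.0), (4, 0.0)]
print(most_adversarial([keep_lane, cut_in], sdv) is cut_in)  # True
```

Because the adversary's low-level motion still comes from the learned policy, the discovered collisions stay behaviorally plausible rather than merely kinematically feasible.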


BibTeX

@inproceedings{mixsim2023,
  title     = {MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation},
  author    = {Simon Suo and Kelvin Wong and Justin Xu and James Tu and Alexander Cui and Sergio Casas and Raquel Urtasun},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}