MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation

June 4, 2023 | July 25th, 2023

Simon Suo,* Kelvin Wong,* Justin Xu, James Tu, Alexander Cui, Sergio Casas, Raquel Urtasun
* denotes equal contribution
Conference: CVPR 2023


The prevailing way to test a self-driving vehicle (SDV) in simulation involves non-reactive open-loop replay of real world scenarios. However, in order to safely deploy SDVs to the real world, we need to evaluate them in closed-loop. Towards this goal, we propose to leverage the wealth of interesting scenarios captured in the real world and make them reactive and controllable to enable closed-loop SDV evaluation in what-if situations. In particular, we present MixSim, a hierarchical framework for mixed reality traffic simulation. MixSim explicitly models agent goals as routes along the road network and learns a reactive route-conditional policy. By inferring each agent’s route from the original scenario, MixSim can reactively re-simulate the scenario and enable testing different autonomy systems under the same conditions. Furthermore, by varying each agent’s route, we can expand the scope of testing to what-if situations with realistic variations in agent behaviors or even safety critical interactions. Our experiments show that MixSim can serve as a realistic, reactive, and controllable digital twin of real world scenarios.


MixSim is a hierarchical framework for mixed reality traffic simulation. Given a real world scenario, MixSim builds a reactive and controllable digital twin of how its traffic agents behave. This enables us to re-simulate the original scenario and answer what-if questions like: What if the SDV changes lanes? What if an agent cuts in front of the SDV?

Video Overview


MixSim is a hierarchical framework for mixed reality traffic simulation. Given a reference scenario, our goal is to build a reactive and controllable digital twin that allows us to re-simulate the scenario and explore what-if variations. The digital twin should preserve the high-level behaviors and interactions of the original scenario (e.g., taking an off-ramp) but not the specific trajectories themselves (e.g., braking to avoid a collision). This motivates learning a hierarchical goal-directed policy to model each agent’s behavior.

In our approach, we represent each agent’s goal as a route along the road network. By varying each agent’s route, we can realistically re-simulate a real world scenario in what-if situations:

  • Reactive re-simulation: by inferring each agent’s reference route from its original trajectory
  • Sampling realistic variations: by sampling routes from a learned routing policy
  • Finding safety critical variations: by finding routes that stress the autonomy system
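
As a rough illustration of the first bullet, the sketch below infers a route from a logged trajectory by matching each trajectory point to the nearest lane among the current lane and its successors. Everything here is an assumption made for illustration: the toy `LANES` graph, the helper names, and the nearest-centerline matching rule are hypothetical and are not taken from MixSim itself.

```python
import math

# Hypothetical toy lane graph: lane id -> (centerline points, successor lane ids).
LANES = {
    "A": ([(0.0, 0.0), (10.0, 0.0)], ["B", "C"]),
    "B": ([(10.0, 0.0), (20.0, 0.0)], []),   # continue straight
    "C": ([(10.0, 0.0), (20.0, -5.0)], []),  # take the off-ramp
}


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0
    if length_sq > 0.0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_lane(p, lane_id):
    """Distance from point p to the closest segment of a lane centerline."""
    points = LANES[lane_id][0]
    return min(point_to_segment_distance(p, a, b) for a, b in zip(points, points[1:]))


def infer_route(trajectory):
    """Return the sequence of lane ids that best explains the trajectory.

    After the first point, only the current lane and its successors are
    considered, so the result is always a connected path in the lane graph.
    """
    route = [min(LANES, key=lambda lane_id: distance_to_lane(trajectory[0], lane_id))]
    for p in trajectory[1:]:
        current = route[-1]
        candidates = [current] + LANES[current][1]
        best = min(candidates, key=lambda lane_id: distance_to_lane(p, lane_id))
        if best != current:
            route.append(best)
    return route


off_ramp_route = infer_route([(1.0, 0.0), (5.0, 0.2), (9.0, 0.0), (12.0, -1.2), (18.0, -4.0)])
straight_route = infer_route([(1.0, 0.0), (9.0, 0.0), (15.0, 0.1)])
```

The point of the sketch is that the route keeps only the high-level decision (off-ramp versus straight) and discards the exact positions and speeds of the logged trajectory.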

Reactive Re-simulation

We compare non-reactive replay on the left to MixSim on the right. In each example, the pink agent is controlled by an autonomy stack. The grey agents replay their ground truth trajectories and blue agents are controlled by MixSim to follow their ground truth routes.

MixSim agents reconstruct their original high-level behaviors with high fidelity. At the same time, MixSim agents also react realistically to changes in the SDV’s behavior, e.g., by braking when the SDV brakes. In contrast, the replay agents are non-reactive and cause unrealistic collisions when the SDV deviates from its original trajectory.
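
The difference between replay and reactive agents can be made concrete with a one-dimensional toy example. This is only a sketch under strong assumptions: the follower's hand-written gap-keeping rule stands in for a learned policy, and all names and constants (`DT`, `min_gap`, `car_length`) are hypothetical.

```python
DT = 0.5  # seconds per simulation step (toy value)


def integrate_positions(speeds, start):
    """Integrate positions along the lane from a per-step speed profile."""
    xs = [start]
    for v in speeds:
        xs.append(xs[-1] + v * DT)
    return xs


def reactive_follower(sdv_xs, start=0.0, cruise_speed=10.0, min_gap=5.0):
    """Follow the lane at cruise speed, but never close within min_gap of the
    SDV's previously observed position (a stand-in for a learned policy)."""
    xs = [start]
    for t in range(1, len(sdv_xs)):
        limit = sdv_xs[t - 1] - min_gap
        next_x = min(xs[-1] + cruise_speed * DT, limit)
        xs.append(max(next_x, xs[-1]))  # never drive backwards
    return xs


def collides(sdv_xs, follower_xs, car_length=4.5):
    """True if the follower ever gets within one car length of the SDV."""
    return any(s - f < car_length for s, f in zip(sdv_xs, follower_xs))


# Logged scenario: the SDV leads at 10 m/s and the follower trails by 20 m.
logged_sdv = integrate_positions([10.0] * 10, start=20.0)
logged_follower = integrate_positions([10.0] * 10, start=0.0)

# What-if: the SDV under test brakes to a stop after two steps.
braking_sdv = integrate_positions([10.0] * 2 + [0.0] * 8, start=20.0)

replay_collides = collides(braking_sdv, logged_follower)
reactive_collides = collides(braking_sdv, reactive_follower(braking_sdv))
```

In this toy setup the replayed follower drives into the stopped SDV, while the reactive follower stops behind it. When the SDV follows its logged trajectory, the reactive follower reproduces the logged behavior.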

We also compare MixSim to three path-following baselines that are representative of the state-of-the-art in traffic simulation. Overall, MixSim simulates more realistic traffic behaviors that are less prone to unrealistic collisions.

Sampling Realistic Variations

We show a mosaic of realistic variations generated using MixSim. As before, the SDV is shown in pink; grey agents simply replay their ground truth trajectories; and blue agents are controlled by MixSim. By varying the controlled agents’ desired routes, MixSim generates realistic variations of the original scenario with visibly diverse behaviors.
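
One way to picture route sampling is as a walk over the lane graph. The sketch below picks successors uniformly at random, which is an assumption made purely for illustration: MixSim samples from a learned routing policy, and the `SUCCESSORS` graph and `sample_route` helper here are hypothetical.

```python
import random

# Hypothetical lane graph: lane id -> successor lane ids.
SUCCESSORS = {
    "on_ramp": ["merge"],
    "merge": ["straight", "off_ramp"],
    "straight": [],
    "off_ramp": [],
}


def sample_route(start, rng, max_len=10):
    """Walk the lane graph from `start`, picking a successor at each step.

    A uniform choice replaces the learned routing policy in this sketch.
    """
    route = [start]
    while SUCCESSORS[route[-1]] and len(route) < max_len:
        route.append(rng.choice(SUCCESSORS[route[-1]]))
    return route


rng = random.Random(0)
variations = [sample_route("on_ramp", rng) for _ in range(20)]
```

Each sampled route would then be handed to the route-conditional policy to produce one variation of the scenario.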

Finding Safety Critical Variations

We show a side-by-side comparison of safety-critical variations found using two methods. As before, the SDV is shown in pink; reactive agents are shown in blue; but now, we have an additional adversarial agent shown in orange. On the left, we show a popular approach that simply perturbs the adversarial agent’s trajectory to cause a collision. On the right, the adversarial agent is controlled by MixSim instead.

Specifically, we use black box optimization to find a route that, when given to MixSim, causes a collision. Compared to the baseline, MixSim finds far more realistic safety critical scenarios by encoding realism via a learned policy. In contrast, the baseline considers kinematic realism only, leading to more unrealistic collisions.
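
The search over routes can be sketched as follows. This is a minimal illustration under stated assumptions, not the optimizer used in the paper: it uses budget-limited random search, the rollout `simulate_min_gap` is a toy stand-in for simulating with MixSim, the SDV is a constant-speed placeholder for the autonomy system, and all names and constants are hypothetical.

```python
import itertools
import random

SEGMENT_LENGTH = 20.0  # metres per lane segment (toy value)
DT = 0.5               # seconds per simulation step (toy value)
ADV_START = 15.0       # adversary's initial position ahead of the SDV


def simulate_min_gap(route, steps=20):
    """Toy rollout: return the smallest gap while both cars share lane 0.

    `route` holds one lane index per 20 m segment the adversary traverses.
    """
    sdv_x, sdv_v = 0.0, 10.0        # SDV stays in lane 0
    adv_x, adv_v = ADV_START, 6.0   # slower adversary ahead of the SDV
    min_gap = float("inf")
    for _ in range(steps):
        sdv_x += sdv_v * DT
        adv_x += adv_v * DT
        segment = min(int((adv_x - ADV_START) // SEGMENT_LENGTH), len(route) - 1)
        if route[segment] == 0:     # adversary is in the SDV's lane
            min_gap = min(min_gap, abs(adv_x - sdv_x))
    return min_gap


def find_critical_route(candidates, objective, budget, rng):
    """Black-box random search: evaluate up to `budget` distinct candidate
    routes and keep the one with the lowest objective value."""
    best_route, best_score = None, float("inf")
    for route in rng.sample(candidates, k=min(budget, len(candidates))):
        score = objective(route)
        if score < best_score:
            best_route, best_score = route, score
    return best_route, best_score


# The adversary starts in lane 1 and may pick lane 0 or 1 for later segments.
candidates = [(1,) + tail for tail in itertools.product((0, 1), repeat=2)]
critical_route, critical_gap = find_critical_route(
    candidates, simulate_min_gap, budget=len(candidates), rng=random.Random(0)
)
```

The optimizer only ever queries the rollout for a score, which is what makes the search black-box. Because the search space consists of routes rather than raw trajectories, every candidate is still executed by the same policy.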


  title     = {MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation},
  author    = {Simon Suo and Kelvin Wong and Justin Xu and James Tu and Alexander Cui and Sergio Casas and Raquel Urtasun},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},