
Learning Realistic Traffic Agents in Closed-loop

October 27, 2023 (updated November 3, 2023)


Chris Zhang, James Tu, Lunjun Zhang, Kelvin Wong, Simon Suo, Raquel Urtasun
Conference: CoRL 2023


Realistic traffic simulation is crucial for developing self-driving software in a safe and scalable manner prior to real-world deployment. Typically, imitation learning (IL) is used to learn human-like traffic agents directly from real-world observations collected offline, but without explicit specification of traffic rules, agents trained from IL alone frequently display unrealistic infractions like collisions and driving off the road. This problem is exacerbated in out-of-distribution and long-tail scenarios. On the other hand, reinforcement learning (RL) can train traffic agents to avoid infractions, but using RL alone results in non-human-like driving behaviors. We propose Reinforcing Traffic Rules (RTR), a holistic closed-loop learning objective to match expert demonstrations under a traffic compliance constraint, which naturally gives rise to a joint IL + RL approach, obtaining the best of both worlds. Our method learns in closed-loop simulations of both nominal scenarios from real-world datasets and procedurally generated long-tail scenarios. Our experiments show that RTR learns more realistic and generalizable traffic simulation policies, achieving significantly better tradeoffs between human-like driving and traffic compliance in both nominal and long-tail scenarios. Moreover, when used as a data generation tool for training prediction models, our learned traffic policy leads to considerably improved downstream prediction metrics compared to baseline traffic agents.


We tackle the problem of realistic actor simulation. Our approach uses both nominal human driving logs and simulated long-tail scenarios in a closed-loop environment. We learn to match the expert while avoiding infractions through a combination of imitation learning and reinforcement learning.



Developing self-driving software in simulation can be safer and more scalable than driving purely in the real world. In this work, we learn models of how humans drive in order to use them as realistic actors in simulation. To be realistic, models must 1) capture the nuances of human-like driving and 2) avoid infractions like collisions or driving off the road. While these goals may not seem contradictory at first glance, existing approaches have shortcomings that result in less robust policies exhibiting a trade-off between the two. Our work addresses some of these shortcomings to improve this trade-off and advance the Pareto frontier.


Our approach involves learning with a unified closed-loop objective to match expert demonstrations under a traffic compliance constraint, while using both nominal offline data and additional simulated long-tail scenarios.

Nominal driving logs contain expert demonstrations of humans driving in the real world, but the scenarios themselves are often repetitive. For a richer learning environment, we supplement them with additional simulated long-tail scenarios that contain a hero actor that induces interesting interactions.

Nominal highway log

Long-tail actor cut-in scenario
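In a simplified form, the mix of data sources above can be sketched as a sampler that interleaves nominal logs (which come with expert demonstrations to imitate) and long-tail scenarios (which have no expert, so only the infraction-avoidance term applies to their rollouts). This is an illustrative sketch; the names, the pairing of logs with demonstrations, and the mixing probability are assumptions, not details from the paper.

```python
import random

def sample_training_scenario(nominal_logs, longtail_scenarios,
                             p_longtail=0.5, rng=random):
    """Sample one scenario for a closed-loop training rollout.

    nominal_logs: list of (scenario, expert_demonstration) pairs from
        real-world driving logs.
    longtail_scenarios: list of procedurally generated scenarios with a
        scripted hero actor; these have no expert demonstration.
    Returns (scenario, expert) where expert is None for long-tail
    scenarios, signalling that only the RL (infraction) term applies.
    """
    if rng.random() < p_longtail:
        # Long-tail scenario: RL-only, no expert to imitate.
        return rng.choice(longtail_scenarios), None
    # Nominal log: both imitation and infraction-avoidance terms apply.
    scenario, expert = rng.choice(nominal_logs)
    return scenario, expert
```

The mixing probability `p_longtail` is a hypothetical knob; how the method actually balances the two data sources is a training detail beyond this sketch.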

Open-loop methods like behavior cloning are known to suffer from compounding error and distribution shift at test time. Instead, we use closed-loop training to learn a more robust policy. Our closed-loop objective naturally gives rise to a joint imitation learning and reinforcement learning approach, where we imitate expert demonstrations when available, and always aim to avoid infractions.
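One way to picture the joint objective is as an imitation loss plus a Lagrangian-style penalty on infractions, which is what a constrained "match the expert subject to traffic compliance" formulation relaxes to. The sketch below is a minimal illustration under that assumption; the function names, the REINFORCE-style estimator, and the fixed penalty weight are simplifications, not the paper's actual implementation.

```python
def rtr_loss(expert_log_probs, rollout_log_probs, rollout_rewards,
             penalty_weight=1.0):
    """Illustrative combined IL + RL objective.

    expert_log_probs: policy log-probabilities of expert actions from
        nominal logs (the imitation term maximizes these).
    rollout_log_probs / rollout_rewards: per-step log-probabilities and
        rewards from a closed-loop rollout; rewards are negative when an
        infraction (collision, off-road) occurs and zero otherwise.
    penalty_weight: Lagrange-multiplier-like weight trading off
        imitation against traffic compliance.
    """
    # Imitation term: negative log-likelihood of expert actions.
    il_loss = -sum(expert_log_probs) / len(expert_log_probs)
    # RL term: REINFORCE-style policy-gradient estimator penalizing
    # infractions encountered in the policy's own rollouts.
    rl_loss = -sum(lp * r for lp, r in zip(rollout_log_probs,
                                           rollout_rewards)) / len(rollout_rewards)
    # Weighted sum arising from relaxing the compliance constraint.
    return il_loss + penalty_weight * rl_loss
```

Because the RL term is computed on the policy's own closed-loop rollouts, the infraction penalty is applied on the state distribution the policy actually visits, which is the key difference from open-loop behavior cloning.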

Qualitative Comparisons

Nominal Scenarios:

We compare RTR (right) to the baseline IL (left) approach below. Note that all actors are controlled by the learned model. We draw the viewer's attention to the actors highlighted in pink.

IL: Collision during fork

IL: Collision during merge

RTR (ours): Collision-free overtake

RTR (ours): Safer driving during merge

Long-tail Scenarios:

We can also evaluate models on unseen long-tail scenarios. Here, blue actors are hero actors that are scripted to induce a long-tail interaction.

IL: Unreactive to merging actor

IL: Unreactive to cut-in actor

RTR (ours): Yield to merging actor

RTR (ours): Defensive swerving to avoid collision

Realism vs. Infraction avoidance

Our quantitative evaluation shows that RTR learns to avoid infractions while still capturing human-like driving. We plot collision rate against other measures of realism, such as reconstruction and distributional metrics. Several baselines are shown, ranging from pure IL to pure RL to combinations of IL + RL; RTR outperforms the shaded Pareto frontier traced by prior methods.

We show a histogram of accelerations produced by the actor policies. RTR learns to avoid infractions in a human-like fashion, unlike baselines such as pure RL, which exhibits unrealistic behavior like slowing down too often.

Downstream evaluation

To further evaluate actor model realism, we consider the downstream task of training prediction models on actor-simulated data. Prediction models trained on RTR-simulated data achieve the best metrics on held-out real data, suggesting that RTR simulations are more realistic and have a smaller domain gap than the baselines.


We have presented RTR, a method for learning realistic traffic agents with closed-loop IL+RL using both real-world logs and procedurally generated long-tail scenarios. When compared to a range of competitive baselines, RTR obtains significantly better tradeoffs between realism and infraction-avoidance. Additionally, considerably improved downstream prediction metrics are obtained when using RTR-simulated data for training, suggesting that RTR simulations are more realistic. We believe this serves as a crucial step towards more effective applications of traffic simulation for self-driving.


@inproceedings{zhang2023learning,
  title     = {Learning Realistic Traffic Agents in Closed-loop},
  author    = {Chris Zhang and James Tu and Lunjun Zhang and Kelvin Wong and Simon Suo and Raquel Urtasun},
  booktitle = {7th Annual Conference on Robot Learning},
  year      = {2023},
  url       = {}
}