Rethinking Closed-loop Training for Autonomous Driving
Trajectory Value Learning (TRAVL)
TRAVL learns to reason in trajectory space, which allows for better long-term planning and more efficient learning.
While typical approaches directly output an instantaneous control command, we learn to predict both the immediate and long-term value of following a longer-horizon trajectory. At inference, we sample a set of candidate trajectories and select the best-scoring one, which is executed for a short period of time before replanning.
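To make this concrete, here is a minimal sketch of the inference loop, assuming hypothetical helpers for trajectory sampling, value scoring, and execution (sample_trajectories, value_fn, and execute are illustrative stand-ins, not the paper's actual interfaces):

import numpy as np

def plan_step(state, sample_trajectories, value_fn, execute,
              num_candidates=64, replan_horizon_s=0.5):
    # Sample a set of kinematically feasible candidate trajectories.
    candidates = sample_trajectories(state, num_candidates)
    # Score every candidate with the learned trajectory value.
    scores = np.array([value_fn(state, traj) for traj in candidates])
    # Execute only the start of the best-scoring trajectory, then replan.
    best = candidates[int(np.argmax(scores))]
    return execute(best, duration=replan_horizon_s)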
Reasoning in trajectory space enables efficient training with reinforcement learning (RL), augmented by a model-based loss on imagined counterfactual data. We use Q-learning to supervise the long-horizon value of the executed trajectory, while an approximate world model supervises the short-horizon value of counterfactual trajectories that were never executed, making learning more sample efficient.
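A minimal PyTorch sketch of how these two supervision signals could combine is below; q_net, target_q_net, world_model, and the exact target definitions are assumptions for illustration, not the paper's implementation:

import torch
import torch.nn.functional as F

def travl_losses(q_net, target_q_net, world_model, batch,
                 counterfactuals, gamma=0.99):
    # Batch from the executed rollout: state, chosen trajectory,
    # observed reward, and resulting next state.
    s, traj, r, s_next = batch

    # (1) Q-learning supervises the long-horizon value of the executed
    # trajectory, bootstrapping from the best candidate value at the
    # next state (a standard TD target).
    with torch.no_grad():
        next_values = target_q_net.score_candidates(s_next)  # [B, K]
        td_target = r + gamma * next_values.max(dim=-1).values
    q_loss = F.mse_loss(q_net(s, traj), td_target)

    # (2) The approximate world model supervises the short-horizon
    # value of imagined counterfactual trajectories that were never
    # executed, yielding extra training signal per environment step.
    with torch.no_grad():
        imagined_values = world_model.short_horizon_value(s, counterfactuals)
    cf_loss = F.mse_loss(q_net(s, counterfactuals), imagined_values)

    return q_loss + cf_loss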
Experiment Results
We compare methods on our closed-loop scenario benchmark by evaluating safety and performance metrics: success rate, collision rate, progress, minimum time-to-collision, and minimum distance-to-closest-actor.
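For illustration, two of these metrics could be computed from simulation traces roughly as follows (the trace layout and result format are assumptions, not the benchmark's actual code):

import numpy as np

def min_distance_to_closest_actor(ego_xy, actors_xy):
    # ego_xy: [T, 2] ego positions; actors_xy: [T, N, 2] actor positions.
    # Minimum over all timesteps of the distance to the nearest actor.
    dists = np.linalg.norm(actors_xy - ego_xy[:, None, :], axis=-1)
    return float(dists.min())

def success_rate(scenario_results):
    # A scenario counts as a success if the ego reaches its goal
    # without a collision before the time limit.
    return sum(r["success"] for r in scenario_results) / len(scenario_results)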
TRAVL outperforms both control (C) and trajectory (T) baselines. We hypothesize that trajectory-based methods outperform their control-based counterparts due to improved long-horizon reasoning, and that the additional counterfactual supervision in TRAVL allows it to outperform all baselines.
Closed-loop Benchmark Design
In this work, we study how best to build benchmarks for effective closed-loop training of autonomous vehicles.
Using our Waabi World simulator, we can create both realistic free-flow and targeted scenarios. Free-flow scenarios resemble what we observe in real-world data: actors follow general traffic models, and scenarios are generated by sampling parameters such as traffic density and actor speed. Targeted scenarios, on the other hand, are generated by exerting fine-grained control over the actors to create specific traffic situations such as actor cut-ins, as sketched below.
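As a rough illustration of the difference (parameter names and ranges are invented for this sketch, not Waabi World's API), the two scenario families differ mainly in what gets sampled:

import random

def sample_free_flow_scenario(rng=random):
    # Free-flow: sample global traffic parameters; actors then
    # follow a generic traffic model.
    return {
        "type": "free_flow",
        "density_vehicles_per_km": rng.uniform(5.0, 40.0),
        "mean_actor_speed_mps": rng.uniform(20.0, 30.0),
    }

def sample_cut_in_scenario(rng=random):
    # Targeted: exert fine-grained control over one actor to force
    # a specific interaction, here a cut-in ahead of the ego vehicle.
    return {
        "type": "targeted_cut_in",
        "trigger_distance_m": rng.uniform(30.0, 80.0),
        "cut_in_gap_m": rng.uniform(5.0, 20.0),
        "cut_in_speed_delta_mps": rng.uniform(-5.0, 5.0),
    }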
Targeted vs Free-flow Scenarios
We find that for both TRAVL and the next-strongest baseline, Rainbow + Trajectory (RB+T), training on targeted scenarios performs better than training on free-flow scenarios, even when evaluated on free-flow scenarios. This suggests that because targeted scenarios are designed to contain more interaction, they better capture a basis set of ubiquitous driving skills.
Behavioral Scale and Diversity
We find that increasing scenario diversity and scale is crucial for improving safety metrics across many models: metrics improve as we train on a larger fraction (Percent of Train Data) of our available scenarios.
In particular, behavioral variation in our scenarios is important. Previous approaches typically rely primarily on map variation (i.e., geolocation), which we find to be less effective.
Qualitative Results
Free-flow: In this first example, TRAVL drives in a free-flow scenario where it must merge onto the highway. We see TRAVL executes the merge smoothly.
Actor cut-in: Next is a targeted scenario that tests a model’s ability to handle an actor cutting in. We see TRAVL reacts quickly and brakes accordingly.
Lane change: In this targeted scenario, the task is to change lanes between two actors. We see TRAVL has learned to slow down and make the lane change.
Merge: Finally, this targeted scenario initializes the agent at a very slow speed before a merge. We see TRAVL has learned to speed up to match the flow of traffic before merging.
Conclusion
We have studied how to design traffic scenarios and scale training environments in order to create an effective closed-loop benchmark for autonomous driving. We have also proposed a new method that efficiently learns driving policies capable of long-term reasoning and planning. Our method reasons in trajectory space and learns efficiently in closed loop by leveraging additional imagined experiences. We provide theoretical analysis in the full paper and empirically demonstrate the advantages of our method over the baselines on our new benchmark.
BibTeX
@inproceedings{zhang2022rethinking,
  title        = {Rethinking Closed-loop Training for Autonomous Driving},
  author       = {Zhang, Chris and Guo, Runsheng and Zeng, Wenyuan and Xiong, Yuwen and Dai, Binbin and Hu, Rui and Ren, Mengye and Urtasun, Raquel},
  booktitle    = {Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXIX},
  pages        = {264--282},
  year         = {2022},
  organization = {Springer}
}