QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving
Abstract
A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps have been utilized to understand free-space. However, predicting a grid for the entire scene is wasteful since only certain spatio-temporal regions are reachable and relevant to the self-driving vehicle. We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan. Instead, we shift the paradigm to have the planner query occupancy at relevant spatio-temporal points, restricting the computation to those regions of interest. Exploiting this representation, we evaluate a candidate trajectory around key factors such as collision avoidance, comfort, and progress for safety and interpretability. Our approach achieves better highway driving quality than the state-of-the-art on high-fidelity closed-loop simulations.
Video
Motivation
Previous works have proposed autonomy systems adopting different paradigms. Object-based autonomy systems detect objects in the scene and use these detections for motion planning. This approach (shown below) has interpretable detection outputs but loses information from thresholding confidence scores and has limited uncertainty propagation to planning.
Direct sensor-to-plan autonomy is where the planned trajectory is directly predicted from sensor data without intermediate representations. While this approach minimizes information loss, it suffers from limited interpretability as well as being brittle to distributional shift.
Finally, occupancy-based autonomy uses occupancy as an intermediate representation as illustrated below with predicted occupancy colored white to red over the planning horizon. While minimizing information loss, this approach is interpretable and more robust to distributional shifts. However, existing methods in this family suffer from either limited expressivity or wasted computation by predicting occupancy densely for the entire scene.
Our neural motion planner, QuAD, builds upon two observations: (1) the plans’ reachable space is much smaller than the full spatio-temporal volume and (2) many ego states throughout the trajectories are in close proximity to each other. Using these insights we can limit our queries for occupancy to areas relevant for motion planning which improves the efficiency of our system while maintaining high driving quality.
Method
We propose QuAD, an interpretable, effective and efficient neural motion planner. QuAD diverges from prior works that first perceive, then predict, and finally plan. Instead, our unified autonomy first generates candidate trajectories respecting kinematic constraints and traffic rules, and then queries an implicit occupancy model only at spatio-temporal points needed for planning, which is used to rank the safety of the candidates.
Our approach uses the architecture shown below.
Our method can be summarized as:
- Given the ego state and the map, the trajectory sampler generates candidate plans.
- Leveraging multi-sweep LiDAR and HD map, a scene encoder builds a BEV latent representation which we then use to query an implicit occupancy model ( ImplicitO ) .
- We generate points of interest for motion planning that cover the relevant areas around the ego vehicle future positions.
- Since the points of interest are in close proximity to each other, we quantize them at a certain spatial resolution to create query points to send to our implicit occupancy model. This ensures our approach runs within practical runtimes.
- Finally, we gather the occupancy relevant to each trajectory, cost them, and select the one with the lowest cost.
For more details, the interested reader can refer to the paper.
Results
We compare QuAD to state-of-the-art autonomy models, measuring their ability to drive in closed-loop simulation. We find our method is much safer than the baselines and able to reach goal locations more effectively while avoiding traffic violations. For more detailed results refer to the original paper.
The plot below shows how methods compare when considering the balance between driving quality and runtime, an important factor for real-world deployment. We profile in a crowded scenario with 74 agents and 11 lanes to stress-test the different autonomy systems. We highlight that QuAD is faster than other occupancy-based autonomy models (P3, OccFlow), while attaining the best driving quality both in terms of mission achievement and safety.
Our proposed query point quantization is necessary to reap the benefits from an implicit occupancy decoder without inquiring a prohibitive runtime. The naive alternative of using the continuous query points directly after the generation step incurs very high inference times of ∼700ms. This is due to the high number of points, caused by the high overlap between the candidate plans especially at the beginning of the temporal horizon. Quantization becomes necessary in practice. Utilizing a resolution of 0.5m (the one used in the rest of the experiments), autonomy runtime reduces to ∼250ms.
Below we showcase example scenarios where the goals instructed to the ego consist of semantically diverse and interesting maneuvers. From the cost distribution, we can observe that our planner can understand the mission route and progress towards the goal, it can speed up to pass slow moving vehicles when instructed to lane-change, and can plan smooth merge trajectories at on-ramps.
We visualize the LiDAR point cloud, map, predicted occupancy (by querying ψ at a regular grid, solely for illustration purposes), and cost associated with the trajectory samples.
We also showcase some examples of closed loop simulation of our model against several baselines from different autonomy paradigms.
In this safety-focused cut in scenario only our method is able to avoid collision.
In this merge scenario only ours is able to successfully perform the maneuver.
In this on ramp scenario NMP collides while our approach is able to merge successfully onto the highway.
BibTeX
@inproceedings{biswas2024quad,
title = {QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving},
author = {Biswas, Sourav and Casas, Sergio and Sykora, Quin and Agro, Ben and Sadat, Abbas and Urtasun, Raquel},
booktitle = {ICRA},
year = {2024},
}