Re-thinking Perception and Motion-Forecasting for Next-Generation Autonomy

by Raquel Urtasun

Next week at CVPR 2023, Waabi is presenting part of our next-generation autonomy system: Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving, by Ben Agro, Quinlan Sykora, Sergio Casas, and myself. ImplicitO marks a groundbreaking step forward in the development of safe perception and motion-forecasting systems for self-driving.


The task of perception and motion forecasting involves perceiving where traffic participants are in the world and then predicting where they might go in the near future. The traditional paradigm is to have separate modules tackle different subtasks: an object detector locates a set of objects in the world from sensor data, an object tracker associates detections over time and estimates their past and current trajectories, and a motion forecasting module predicts possible future behaviours for those objects.

This paradigm, however, has several important drawbacks. Upstream errors in object detection and tracking are propagated downstream. The most critical case is when the detection module completely misses an object because its confidence falls below the detection threshold, which can make the planner blind to a traffic participant and potentially lead to a collision. Furthermore, noisy detections and tracks typically result in inaccurate motion forecasts. These errors cannot be corrected, as sensory information is not propagated to the motion forecasting module. Last but not least, very simple forms of uncertainty are typically employed, resulting in poorly calibrated models that either over- or underestimate uncertainty. This is problematic: underestimating uncertainty can, for example, lead to dangerous lane changes into very narrow gaps between vehicles, while overestimating it can also create dangerous situations, such as unnecessary heavy braking caused by false-positive lane cut-in predictions.
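To make the thresholding failure concrete, here is a minimal sketch of how a low-confidence detection disappears before tracking and forecasting ever see it. The `Detection` class, the scores, and the 0.5 threshold are all hypothetical values chosen for illustration; they are not taken from any real detector.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    x: float      # metres, in a hypothetical vehicle frame
    y: float
    score: float  # detector confidence in [0, 1]


def threshold_detections(detections, min_score):
    """Keep only detections whose confidence reaches min_score."""
    return [d for d in detections if d.score >= min_score]


# Illustrative detector output: the pedestrian is real but scored low.
detections = [
    Detection("vehicle", 12.0, 0.5, 0.93),
    Detection("pedestrian", 8.0, -2.0, 0.41),
]

kept = threshold_detections(detections, min_score=0.5)
kept_labels = [d.label for d in kept]
print(kept_labels)  # the pedestrian is no longer in the list
```

Everything downstream of this filter operates on `kept` only, so the tracker and the forecasting module have no way to recover the dropped pedestrian.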

Previous works in academia have tackled these limitations by predicting dense spatio-temporal occupancy grids that estimate the probability of each grid cell being occupied, both in the present and in the future. However, these methods require large memory footprints. As a result, they employ coarse quantization in space and time, which yields suboptimal predictions and can lead, for example, to missing small objects such as pedestrians. This is particularly problematic on highways, where vehicles traverse the scene at high speeds and the regions of interest are therefore larger than in urban areas.
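A rough back-of-the-envelope calculation shows why dense grids get expensive. The sketch below simply counts cells in a spatio-temporal grid; the extents, cell sizes, and time steps are assumed numbers picked for illustration, not figures from the paper.

```python
def dense_grid_cells(extent_x_m, extent_y_m, cell_m, horizon_s, step_s):
    """Number of cells in a dense spatio-temporal occupancy grid."""
    nx = round(extent_x_m / cell_m)
    ny = round(extent_y_m / cell_m)
    nt = round(horizon_s / step_s) + 1  # +1 to include the present frame
    return nx * ny * nt


# Hypothetical urban region: 100 m x 100 m, 0.5 m cells, 5 s horizon, 0.5 s steps.
urban = dense_grid_cells(100, 100, 0.5, 5.0, 0.5)

# Hypothetical highway region: 400 m long to account for higher speeds.
highway = dense_grid_cells(400, 100, 0.5, 5.0, 0.5)

# Halving the cell size (to better resolve pedestrians) quadruples the count.
urban_fine = dense_grid_cells(100, 100, 0.25, 5.0, 0.5)

print(urban, highway, urban_fine)
```

Under these assumptions the highway grid has four times as many cells as the urban one, and so does the finer urban grid, which is the pressure that pushes dense methods toward coarse quantization.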

ImplicitO is based on the observation that the motion planner does not use occupancy estimates in the full dense region around the self-driving vehicle. Instead, it reasons only about a much smaller area surrounding the set of trajectories it is considering as candidates to execute. As a consequence it can focus on potential interactions with other traffic participants, making the best use of computational and memory resources.

In particular, ImplicitO is an expressive implicit model that offers a flexible solution: the motion planner can query the occupancy at any continuous spatio-temporal point by calling a neural implicit function, which removes quantization errors and spends computational resources only on the regions that matter. This results in much more precise motion forecasts, produced in a fraction of the computation time of traditional approaches. That in turn significantly improves the reaction time of the full autonomy system, resulting in much safer driving.
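The query pattern can be sketched in a few lines. The example below is only an illustration of a planner evaluating occupancy at continuous (x, y, t) points along its candidate trajectories. `toy_occupancy` is a hand-written analytic stand-in for the learned neural implicit function, and the scene, speeds, and helper names are all hypothetical; none of this is the ImplicitO implementation.

```python
import math


def toy_occupancy(x, y, t):
    """Stand-in for a learned implicit function: a score in (0, 1] for how
    likely the point (x, y) is occupied at time t. The scene is one
    hypothetical vehicle starting at (20, 0) and moving at -2 m/s along x."""
    cx, cy = 20.0 - 2.0 * t, 0.0
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2 * 1.5 ** 2))


def trajectory_points(speed_mps, lateral_m, horizon_s, step_s):
    """Sample (x, y, t) points along a straight, constant-speed candidate."""
    n = round(horizon_s / step_s)
    return [(speed_mps * k * step_s, lateral_m, k * step_s) for k in range(n + 1)]


def max_occupancy(points, occupancy_fn):
    """Worst-case occupancy over the queried points of one candidate."""
    return max(occupancy_fn(x, y, t) for x, y, t in points)


# Two candidates: stay in lane, or drive with a 3.5 m lateral offset.
stay = trajectory_points(5.0, 0.0, 4.0, 0.25)
shift = trajectory_points(5.0, 3.5, 4.0, 0.25)

risk_stay = max_occupancy(stay, toy_occupancy)
risk_shift = max_occupancy(shift, toy_occupancy)
print(f"stay: {risk_stay:.2f}, shift: {risk_shift:.2f}")
```

Only 17 points per candidate are evaluated here, at whatever positions and times the planner chooses, rather than every cell of a dense grid covering the whole scene.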

To learn more, visit https://waabi.ai/research/implicito/.