Predicting the behavior of surrounding vehicles is a critical problem in automated driving. We present a novel game-theoretic behavior prediction model that achieves state-of-the-art prediction accuracy by explicitly reasoning about possible future interactions between agents. We evaluate our approach on the NGSIM vehicle trajectory dataset and demonstrate lower root mean square error than state-of-the-art methods.
Trajectory prediction is critical to the decision-making of autonomous vehicles. In this paper, we propose a model to predict the trajectories of target agents around an autonomous vehicle. The main idea of our method is to consider both the history trajectories of the target agent and the influence of surrounding agents on it. To this end, we encode the target agent's history trajectories as an attention mask and construct a social map to encode the interactive relationships between the target agent and its surrounding agents. Given a trajectory sequence, LSTM networks are first used to extract features for all agents, from which the attention mask and social map are formed. The attention mask and social map are then fused into a feature map, which is processed by a social convolution to obtain a fused feature representation. Finally, this fused feature is fed into a variable-length LSTM to predict the trajectory of the target agent. We note that the variable-length LSTM enables our model to handle the highly dynamic number of agents within the sensing scope in traffic scenes. To verify the effectiveness of our method, we compare it extensively against several methods on a public dataset, achieving a 20% error reduction. In addition, the model satisfies real-time requirements, running at 32 fps.
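The abstract does not specify how the social map is laid out, but one common reading of such a map is a spatial grid centered on the target agent, with each cell holding the encoded feature of any neighbor occupying it. The sketch below illustrates this under that assumption; `grid_size`, `cell_m`, and the function name are illustrative choices, not the paper's API.

```python
import numpy as np

def build_social_map(target_xy, neighbor_feats, grid_size=13, cell_m=4.0):
    """Hypothetical sketch of a 'social map': scatter each neighbor's encoded
    feature vector into a spatial grid centered on the target agent.
    neighbor_feats is a list of ((x, y), feature_vector) pairs."""
    feat_dim = len(neighbor_feats[0][1]) if neighbor_feats else 1
    grid = np.zeros((grid_size, grid_size, feat_dim))
    half = grid_size // 2
    for (x, y), feat in neighbor_feats:
        col = half + int(round((x - target_xy[0]) / cell_m))
        row = half + int(round((y - target_xy[1]) / cell_m))
        if 0 <= row < grid_size and 0 <= col < grid_size:
            grid[row, col] = feat  # last writer wins if a cell is shared
    return grid
```

A feature map built this way can then be fused with an attention mask of the same spatial size and passed to a convolution, matching the data flow the abstract describes.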
Urban environments manifest a high level of complexity, and it is therefore vitally important for safety systems embedded within autonomous vehicles (AVs) to accurately predict the short-term future motion of nearby agents. This problem can be understood as generating a sequence of future coordinates for a given agent based on its past motion data, e.g., position, velocity, and acceleration; whilst current approaches demonstrate plausible results, they have a propensity to neglect a scene's physical constraints. In this paper we propose a model based on a combination of CNN and LSTM encoder-decoder architectures that learns to extract relevant road features from semantic maps as well as the general motion of agents, and uses this learned representation to predict their short-term future trajectories. We train and validate the model on a publicly available dataset that provides data from urban areas, allowing us to examine it in challenging and uncertain scenarios. We show that our model is not only capable of anticipating future motion whilst taking road boundaries into consideration, but can also effectively and precisely predict trajectories over a longer time horizon than it was initially trained for.
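The described architecture — a CNN over semantic maps feeding an LSTM encoder-decoder over motion — can be sketched as below. This is a minimal illustrative skeleton, not the authors' model: all layer sizes, the pooling choice, and the way the decoder is conditioned on the encoder state are assumptions.

```python
import torch
import torch.nn as nn

class MapMotionPredictor(nn.Module):
    """Hedged sketch of a CNN + LSTM encoder-decoder: a small CNN encodes a
    semantic-map crop into a road-feature vector, an LSTM encodes the agent's
    past (x, y) motion, and an LSTM decoder rolls out future (x, y) outputs."""
    def __init__(self, map_channels=3, hidden=64, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.cnn = nn.Sequential(
            nn.Conv2d(map_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B, 32)
        )
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, past_xy, sem_map):
        # past_xy: (B, T_past, 2); sem_map: (B, C, H, W)
        _, state = self.encoder(past_xy)              # motion context (h, c)
        road = self.cnn(sem_map)                      # road-feature vector
        dec_in = road.unsqueeze(1).repeat(1, self.horizon, 1)
        out, _ = self.decoder(dec_in, state)          # condition on motion state
        return self.head(out)                         # (B, horizon, 2)
```

Feeding the road features at every decoder step is one simple way to keep predictions conditioned on the map throughout the rollout horizon.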
We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional variational recurrent neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agent state using a kinematic bicycle model. The full simulation state is then differentiably rendered for each agent, initiating the next time step. We achieve state-of-the-art results on the INTERACTION dataset, using standard neural architectures and a standard variational training objective, producing realistic multi-modal predictions without any ad-hoc diversity-inducing losses. We conduct ablation studies to examine individual components of the simulator, finding that both the kinematic bicycle model and the continuous feedback from the birdview image are crucial for achieving this level of performance. We name our model ITRA, for "Imagining the Road Ahead".
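The kinematic bicycle model that converts a (steering, acceleration) action into the next agent state is a standard formulation; a minimal one-step sketch follows. The wheelbase value, the assumption that the center of mass sits midway between the axles, and the explicit Euler discretization are illustrative choices, not details taken from the paper.

```python
import math

def bicycle_step(x, y, yaw, v, steer, accel, dt=0.1, wheelbase=2.8):
    """One explicit-Euler step of the center-of-mass kinematic bicycle model.
    steer is the front-wheel angle; with lr = lf = wheelbase / 2 the slip
    angle is beta = atan(0.5 * tan(steer))."""
    beta = math.atan(0.5 * math.tan(steer))       # slip angle at center of mass
    x += v * math.cos(yaw + beta) * dt            # position update
    y += v * math.sin(yaw + beta) * dt
    yaw += (v / (wheelbase / 2.0)) * math.sin(beta) * dt  # heading update
    v += accel * dt                               # speed update
    return x, y, yaw, v
```

Because every operation here is differentiable in the action, gradients from a trajectory loss can flow back through the state rollout into the network that produced the steering and acceleration.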
The widespread interest in autonomous driving technology in recent years has motivated extensive research in multiagent navigation in driving domains. One of the most challenging driving domains is the uncontrolled intersection, i.e., a street intersection that features no traffic signs or signals. Within this domain, we focus on scenarios in which agents do not communicate explicitly or implicitly through, e.g., turn signals. This setup gives rise to challenging multi-vehicle encounters that mimic real-world situations (arising due to human distraction, violation of traffic rules, or special emergencies) that result in fatal accidents. The frequency and severity of such situations have motivated intense research interest in uncontrolled intersections [4, 5, 6]. In the absence of explicit traffic signs, signals, rules, or explicit communication among agents, avoiding collisions at intersections relies on the ability of agents to predict the dynamics of interaction amongst themselves. One prevalent way to model multiagent dynamics is via trajectory prediction. However, multistep multiagent trajectory prediction is NP-hard, whereas the sample complexity of existing learning algorithms effectively prohibits the extraction of practical models. Our key insight is that the geometric structure of the intersection and the incentive of agents to move efficiently and avoid collisions with each other (rationality) compress the space of possible multiagent trajectories, effectively simplifying inference.