Understanding complex social interactions among agents is a key challenge for trajectory prediction. Most existing methods consider the interactions between pairwise traffic agents or in a local area, while the nature of interactions is unlimited, involving an uncertain number of agents and non-local areas simultaneously. Besides, they treat heterogeneous traffic agents the same, namely those among agents of different categories, while neglecting people's diverse reaction patterns toward traffic agents in ifferent categories. To address these problems, we propose a simple yet effective Unlimited Neighborhood Interaction Network (UNIN), which predicts trajectories of heterogeneous agents in multiple categories. Specifically, the proposed unlimited neighborhood interaction module generates the fused-features of all agents involved in an interaction simultaneously, which is adaptive to any number of agents and any range of interaction area. Meanwhile, a hierarchical graph attention module is proposed to obtain category-to-category interaction and agent-to-agent interaction. Finally, parameters of a Gaussian Mixture Model are estimated for generating the future trajectories. Extensive experimental results on benchmark datasets demonstrate a significant performance improvement of our method over the state-of-the-art methods.
Pedestrian trajectory prediction is a critical to avoid autonomous driving collision. But this prediction is a challenging problem due to social forces and cluttered scenes. Such human-human and human-space interactions lead to many socially plausible trajectories. In this paper, we propose a novel LSTM-based algorithm. We tackle the problem by considering the static scene and pedestrian which combine the Graph Convolutional Networks and Temporal Convolutional Networks to extract features from pedestrians. Each pedestrian in the scene is regarded as a node, and we can obtain the relationship between each node and its neighborhoods by graph embedding. It is LSTM that encode the relationship so that our model predicts nodes trajectories in crowd scenarios simultaneously. To effectively predict multiple possible future trajectories, we further introduce Spatio-Temporal Convolutional Block to make the network flexible. Experimental results on two public datasets, i.e. ETH and UCY, demonstrate the effectiveness of our proposed ST-Block and we achieve state-of-the-art approaches in human trajectory prediction.
Pedestrian trajectory prediction in urban scenarios is essential for automated driving. This task is challenging because the behavior of pedestrians is influenced by both their own history paths and the interactions with others. Previous research modeled these interactions with pooling mechanisms or aggregating with hand-crafted attention weights. In this paper, we present the Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network (Social-IWSTCNN), which includes both the spatial and the temporal features. We propose a novel design, namely the Social Interaction Extractor, to learn the spatial and social interaction features of pedestrians. Most previous works used ETH and UCY datasets which include five scenes but do not cover urban traffic scenarios extensively for training and evaluation. In this paper, we use the recently released large-scale Waymo Open Dataset in urban traffic scenarios, which includes 374 urban training scenes and 76 urban testing scenes to analyze the performance of our proposed algorithm in comparison to the state-of-the-art (SOTA) models. The results show that our algorithm outperforms SOTA algorithms such as Social-LSTM, Social-GAN, and Social-STGCNN on both Average Displacement Error (ADE) and Final Displacement Error (FDE). Furthermore, our Social-IWSTCNN is 54.8 times faster in data pre-processing speed, and 4.7 times faster in total test speed than the current best SOTA algorithm Social-STGCNN.
Human trajectory forecasting is an inherently multi-modal problem. Uncertainty in future trajectories stems from two sources: (a) sources that are known to the agent but unknown to the model, such as long term goals and (b)sources that are unknown to both the agent & the model, such as intent of other agents & irreducible randomness indecisions. We propose to factorize this uncertainty into its epistemic & aleatoric sources. We model the epistemic un-certainty through multimodality in long term goals and the aleatoric uncertainty through multimodality in waypoints& paths. To exemplify this dichotomy, we also propose a novel long term trajectory forecasting setting, with prediction horizons upto a minute, an order of magnitude longer than prior works. Finally, we presentY-net, a scene com-pliant trajectory forecasting network that exploits the pro-posed epistemic & aleatoric structure for diverse trajectory predictions across long prediction horizons.Y-net significantly improves previous state-of-the-art performance on both (a) The well studied short prediction horizon settings on the Stanford Drone & ETH/UCY datasets and (b) The proposed long prediction horizon setting on the re-purposed Stanford Drone & Intersection Drone datasets.
Recent advances in trajectory prediction have shown that explicit reasoning about agents' intent is important to accurately forecast their motion. However, the current research activities are not directly applicable to intelligent and safety critical systems. This is mainly because very few public datasets are available, and they only consider pedestrian-specific intents for a short temporal horizon from a restricted egocentric view. To this end, we propose LOKI (LOng term and Key Intentions), a novel largescale dataset that is designed to tackle joint trajectory and intention prediction for heterogeneous traffic agents (pedestrians Figure 1: We show that reasoning about long-term goals and vehicles) in an autonomous driving setting. The and short-term intents plays a significant role in trajectory LOKI dataset is created to discover several factors that may prediction. With a lack of comprehensive benchmarks for affect intention, including i) agent's own will, ii) social interactions, this purpose, we introduce a new dataset for intention and iii) environmental constraints, and iv) contextual trajectory prediction. An example use case is illustrated in information. We also propose a model that jointly performs (a) where we predict the trajectory of the target vehicle. In trajectory and intention prediction, showing that recurrently (b), long-term goals are estimated from agent's own motion.