Stochastic Prediction of Multi-Agent Interactions from Partial Observations

Sun, Chen, Karlsson, Per, Wu, Jiajun, Tenenbaum, Joshua B, Murphy, Kevin

arXiv.org Machine Learning 

We present a method that learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. Our method is based on a graph-structured variational recurrent neural network (Graph-VRNN), which is trained end-to-end to infer the current state of the (partially observed) world, as well as to forecast future states. We show that our method outperforms various baselines on two sports datasets, one based on real basketball trajectories, and one generated by a soccer game engine. At any given time, you can only see a subset of the players, and you may or may not be able to see the ball, yet you probably have some reasonable idea about where all the players currently are, even if they are not in the field of view. Similarly, you cannot see the future, but you may still be able to predict where the "agents" (players and ball) will be, at least approximately. Crucially, these problems are intertwined: we are able to predict future states by using a state dynamics model, but we can also use the same dynamics model to infer the current state of the world by extrapolating from the last time we saw each agent. In this paper, we present a unified approach to state estimation and future forecasting for problems of this kind. More precisely, we assume the observed data consists of a sequence of video frames, v, obtained from a stationary or moving camera.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found