Reviews: Neural Expectation Maximization

Neural Information Processing Systems 

This paper presents some thought-provoking experiments in unsupervised entity recognition from time-series data. For me, the impact of the paper came in Figs 3 and 5, which showed a very human-like decomposition. I'm not convinced that analyzing a few static shapes is an important problem these days. To me, it seems like a "first step" toward the more significant problem of recognizing concurrent actions (in this case, actions like "flying triangle" and "flying 9", with occasional occlusions muddying the picture). For example, RNN-EM running on non-pixel input features (the output of a static object detector, perhaps YOLO?) would be one reasonable comparison point.