Reviews: Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

Neural Information Processing Systems 

I have read the other reviews and the author rebuttal. I am still very much in favor of accepting this paper, but I have revised my score down from a 9 to an 8; some of the issues pointed out by the other reviewers, while well addressed in the rebuttal, made me realize that my initial view of the paper was a bit too rosy.

The model starts with the basic Attend, Infer, Repeat (AIR) framework and extends it to image sequences, yielding Sequential AIR (SQAIR). This extension must account for the fact that objects may enter or leave the frame over the course of a motion sequence. To support this behavior, SQAIR's generative and inference networks for each frame have two phases.