MONet: Unsupervised Scene Decomposition and Representation