Attend, Infer, Repeat: Fast Scene Understanding with Generative Models