MONet: Unsupervised Scene Decomposition and Representation
Christopher P. Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, Alexander Lerchner
Realistic visual scenes contain rich structure, which humans effortlessly exploit to reason effectively and intelligently. In particular, object perception, the ability to perceive and represent individual objects, is considered a fundamental cognitive ability that allows us to understand, and efficiently interact with, the world as perceived through our senses [Johnson, 2018; Green and Quilty-Dunn, 2017]. However, despite recent breakthroughs in computer vision fuelled by advances in deep learning, learning to represent realistic visual scenes in terms of objects remains an open challenge for artificial systems. The impact and application of robust visual object decomposition would be far-reaching. Models such as graph-structured networks that rely on handcrafted object representations have recently achieved remarkable results in a wide range of research areas, including reinforcement learning, physical modeling, and multi-agent control [Battaglia et al., 2018; Wang et al., 2018; Hamrick et al., 2017; Hoshen, 2017]. The prospect of acquiring visual object representations through unsupervised learning could be invaluable for extending the generality and applicability of such models.
Jan-22-2019