Unsupervised Object Learning via Common Fate
Tangemann, Matthias; Schneider, Steffen; von Kügelgen, Julius; Locatello, Francesco; Gehler, Peter; Brox, Thomas; Kümmerer, Matthias; Bethge, Matthias; Schölkopf, Bernhard
In human vision, the Principle of Common Fate of Gestalt Psychology (Wertheimer, 2012) has been shown to play an important role in object learning (Spelke, 1990). It posits that elements that move together tend to be perceived as one, a perceptual bias that may have evolved to help recognize camouflaged predators (Troscianko et al., 2009). In our work, we show that this principle can also be used successfully for machine vision by applying it in a multi-stage object learning approach (Figure 1): First, we use unsupervised motion segmentation to obtain a candidate segmentation of a video frame. Second, we train generative object and background models on this segmentation. While the regions obtained by the motion segmentation are caused by objects moving in 3D, only their visible parts can be segmented. To learn the actual objects (i.e., the causes), a crucial task for the object model is therefore learning to generalize beyond the occlusions present in its input data. To measure success, we provide a dataset including object ground truth. As the last stage, we show that the learned object and background models can be combined into a flexible scene model that allows sampling manipulated novel scenes. Thus, in contrast to existing object-centric models trained end-to-end, our work aims at decomposing object learning into evaluable subproblems and at testing the potential of exploiting object motion for building scalable object-centric models that allow for causally meaningful interventions in generation.
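The last stage, combining object and background models into a scene, and the point that only visible object parts can be segmented, can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain back-to-front alpha compositing with NumPy, and the function name `compose_scene`, the layer format, and the toy data are all hypothetical.

```python
import numpy as np

def compose_scene(background, layers):
    """Composite object layers over a background, back to front.

    background: float array of shape (H, W, 3)
    layers: list of (rgb, mask) pairs; rgb is (H, W, 3), mask is (H, W) in [0, 1]
    Returns the composed image and the visible (unoccluded) part of each mask.
    """
    image = background.astype(float).copy()
    visible = []
    for rgb, mask in layers:
        # A new layer occludes every layer composited before it.
        visible = [v * (1.0 - mask) for v in visible]
        visible.append(mask.astype(float))
        alpha = mask[..., None]
        image = alpha * rgb + (1.0 - alpha) * image
    return image, visible

# Toy scene (assumed data): a red object partly hidden behind a blue one.
H, W = 4, 4
background = np.zeros((H, W, 3))
red = np.zeros((H, W, 3))
red[..., 0] = 1.0
blue = np.zeros((H, W, 3))
blue[..., 2] = 1.0
mask_a = np.zeros((H, W))
mask_a[:, 0:2] = 1.0   # full extent of the red object: columns 0 and 1
mask_b = np.zeros((H, W))
mask_b[:, 1:3] = 1.0   # blue object in front: columns 1 and 2

image, visible = compose_scene(background, [(red, mask_a), (blue, mask_b)])
```

In this toy scene the red object covers eight pixels, but only four remain visible after the blue object is placed in front of it. A segmentation of the rendered frame would see only those four pixels, which is why an object model has to generalize beyond the occlusions in its input.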
Oct-13-2021