Unsupervised Object Keypoint Learning using Local Spatial Predictability
Gopalakrishnan, Anand, van Steenkiste, Sjoerd, Schmidhuber, Jürgen
–arXiv.org Artificial Intelligence
Hence, which layer(s) we choose as our feature embedding will have an effect on the outcome of the local spatial prediction problem. While more abstract high-level features are expected to better capture the internal predictive structure of an object, it will be more difficult to attribute the error of the prediction network to the exact image location. On the other hand, while more low-level features can be localized more accurately, they may lack the expressiveness to capture high-level properties of objects. Nonetheless, in practice we find that a spatial feature embedding based on earlier layers of the encoder works well (see also Section 5.3 for an ablation). Local Spatial Prediction Task Using the learned spatial feature embedding we seek out salient regions of the input image that correspond to object parts. Our approach is based on the idea that objects correspond to local regions in feature space that have high internal predictive structure, which allows us to formulate the following local spatial prediction (LSP) task. For each location in the learned spatial feature embedding, we seek to predict the value of the features (across the feature maps) from its neighbouring feature values. When neighbouring areas correspond to the same object-(part), i.e. they regularly appear together, we expect that this prediction problem is easy (green arrow in Figure 3).
arXiv.org Artificial Intelligence
Nov-25-2020
- Country:
- North America > United States > California (0.28)
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment > Games (0.68)
- Technology: