Zero-Shot Object-Centric Representation Learning

Aniket Didolkar, Andrii Zadaianchuk, Anirudh Goyal, Mike Mozer, Yoshua Bengio, Georg Martius, Maximilian Seitzer

Aug-17-2024, arXiv.org Artificial Intelligence

The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets.
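The abstract describes decomposing a scene into entity-level representations on top of pre-trained self-supervised features. Methods in this line, such as DINOSAUR, group such features into "slots" with Slot Attention (Locatello et al., 2020). What follows is a stripped-down NumPy sketch of that iteration, given only to illustrate the general idea. It is not the paper's proposed method or its fine-tuning strategy. It assumes `features` is an (N, D) array of patch features (for example from a frozen self-supervised ViT). It omits the learned query/key/value projections, layer norms, GRU and MLP of the original. The helper name `slot_attention` is hypothetical.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, num_slots, num_iters=3, eps=1e-8, seed=0):
    """Simplified Slot Attention (illustrative sketch, not the paper's model).

    features: (N, D) array of input vectors, e.g. patch features.
    Returns slots of shape (K, D) and soft assignment masks of shape (K, N).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Initialize slots from a standard Gaussian; the original method
    # learns the mean and variance of this initial distribution.
    slots = rng.normal(size=(num_slots, d))
    scale = d ** -0.5
    attn = np.zeros((num_slots, n))
    for _ in range(num_iters):
        # Scaled dot-product logits between each slot and each input: (K, N).
        logits = scale * slots @ features.T
        # Softmax over the *slot* axis: slots compete to explain each input.
        attn = softmax(logits, axis=0)
        # Renormalize over inputs so each slot takes a weighted mean.
        weights = attn / (attn.sum(axis=1, keepdims=True) + eps)
        # Slot update: plain weighted mean here (the original uses a GRU + MLP).
        slots = weights @ features
    return slots, attn
```

Taking `attn.argmax(axis=0)` gives a hard object mask per input; for patch features it can be reshaped to the patch grid. The key design choice is normalizing over slots rather than over inputs, which makes slots compete for inputs and pushes them toward separate entities.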
