Neural Information Processing Systems
Object binding, the brain's ability to bind the many features that collectively represent an object into a coherent whole, is central to human cognition. It groups low-level perceptual features into high-level object representations, stores those objects efficiently and compositionally in memory, and supports human reasoning about individual object instances. While prior work often imposes object-centric attention (e.g., Slot Attention) explicitly to probe these benefits, it remains unclear whether this ability naturally emerges in pre-trained Vision Transformers (ViTs). Intuitively, they could: recognizing which patches belong to the same object should be useful for downstream prediction and thus guide attention. Motivated by the quadratic nature of self-attention, we hypothesize that ViTs represent whether two patches belong to the same object, a property we term IsSameObject.
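To make the pairwise nature of IsSameObject concrete, the following is a minimal sketch and not the authors' method. It assumes each patch carries a ground-truth object ID, builds the N x N same-object label matrix, and scores patch pairs with a bilinear form over patch embeddings. The names `is_same_object`, `bilinear_scores`, the weight matrix `W`, and the toy one-hot embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np

def is_same_object(patch_object_ids):
    """Return an (N, N) boolean matrix: True where two patches share an object ID."""
    ids = np.asarray(patch_object_ids)
    return ids[:, None] == ids[None, :]

def bilinear_scores(patch_embeddings, W):
    """Score every patch pair (i, j) as x_i^T W x_j (a hypothetical pairwise probe)."""
    X = np.asarray(patch_embeddings, dtype=float)
    return X @ W @ X.T

# Toy example (assumed data): four patches, the first two on object 0,
# the last two on object 1.
patch_object_ids = [0, 0, 1, 1]
labels = is_same_object(patch_object_ids)

# Toy embeddings that encode object identity as a one-hot vector, so an
# identity weight matrix separates same-object pairs from the rest.
embeddings = np.array([[1.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 1.0]])
W = np.eye(2)
scores = bilinear_scores(embeddings, W)
predicted = scores > 0.5
```

Like self-attention, both the label matrix and the score matrix grow quadratically with the number of patches, which is the connection the hypothesis draws on. In a real setting the embeddings would come from a pre-trained ViT layer and `W` would be fit to the labels rather than fixed by hand.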
Jun-14-2026, 10:11:16 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.93)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Machine Learning > Neural Networks (1.00)
- Cognitive Science (0.94)
- Natural Language > Large Language Model (0.68)