Unlocking Slot Attention by Changing Optimal Transport Costs
Zhang, Yan, Zhang, David W., Lacoste-Julien, Simon, Burghouts, Gertjan J., Snoek, Cees G. M.
arXiv.org Artificial Intelligence
Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
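To make the tie-breaking point concrete, here is a minimal sketch of entropy-regularized optimal transport solved with plain Sinkhorn iterations. It is not the authors' MESH module. The function names, the uniform marginals, the value of `eps`, and the two toy cost matrices are all assumptions chosen for illustration. With a fully tied cost matrix the regularized plan spreads mass evenly and has maximal entropy, whereas a cost matrix with a clear preference yields a much sharper, lower-entropy plan.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT with uniform marginals (illustrative sketch only)."""
    n, m = len(cost), len(cost[0])
    a, b = 1.0 / n, 1.0 / m  # assumed uniform row and column masses
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the target marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def entropy(plan):
    """Shannon entropy of a transport plan, skipping zero entries."""
    return -sum(p * math.log(p) for row in plan for p in row if p > 0.0)


# Hypothetical toy costs: every assignment equally good vs. a clear preference.
tied = [[1.0, 1.0], [1.0, 1.0]]
untied = [[0.0, 1.0], [1.0, 0.0]]

plan_tied = sinkhorn(tied)
plan_untied = sinkhorn(untied)

print("tied plan:", plan_tied, "entropy:", entropy(plan_tied))
print("untied plan:", plan_untied, "entropy:", entropy(plan_untied))
```

In the tied case every entry of the plan is equal, so no row commits to a column. This is the kind of ambiguity the abstract refers to, and driving the entropy of the Sinkhorn output down is what the name MESH alludes to.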
May-31-2023
- Country:
- North America
- Canada > Quebec (0.14)
- United States > Hawaii (0.14)
- Genre:
- Research Report (0.82)
- Technology: