Learning Representations by Maximizing Mutual Information Across Views

Bachman, Philip, Hjelm, R Devon, Buchwalter, William

Mar-19-2020, 03:03:33 GMT–Neural Information Processing Systems

We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual). Or, an ImageNet image could provide a context from which one produces multiple views by repeatedly applying data augmentation. Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views – e.g., presence of certain objects or occurrence of certain events. Following our proposed approach, we develop a model which learns image representations that significantly outperform prior methods on the tasks we consider.
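To make the objective concrete, the following is a minimal sketch of a contrastive lower bound on the mutual information between features of two views. The abstract does not say which estimator is used, so the choice of an InfoNCE-style bound is an assumption, as are the function names `info_nce_loss` and `mi_lower_bound`, the cosine-similarity scoring, and the `temperature` parameter. Row i of each feature batch is taken to come from the same context (for example, two augmentations of one image), and the other rows in the batch serve as negatives.

```python
import numpy as np


def info_nce_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss between two batches of view features.

    feats_a, feats_b: arrays of shape (n, d). Row i of each array is assumed
    to be extracted from a different view of the same context (positive pair);
    all other rows in the batch are treated as negatives.
    """
    # L2-normalize so the dot product below is a cosine similarity.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)

    # Pairwise similarity scores, shape (n, n); the diagonal holds positives.
    logits = (a @ b.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the correct partner.
    return -np.mean(np.diag(log_probs))


def mi_lower_bound(feats_a, feats_b, temperature=0.1):
    """Lower bound on mutual information (in nats): log(n) minus the loss."""
    n = feats_a.shape[0]
    return np.log(n) - info_nce_loss(feats_a, feats_b, temperature)
```

Minimizing this loss with respect to the encoder that produces the features raises the bound, which is capped at log(n) for a batch of n pairs. Features that capture factors shared across views (such as which objects are present) make the positive pair easy to pick out from the negatives, which is the intuition the abstract describes.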

artificial intelligence, learning representation, maximizing mutual information

Technology:
- Information Technology > Artificial Intelligence (0.66)