The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities

Yongwei Che, Benjamin Eysenbach

arXiv.org (Machine Learning)

While internet-scale data often comes in pairs (e.g., audio/image, image/text), we often want to perform inferences over modalities unseen together in the training data (e.g., audio/text). Empirically, this can often be addressed by learning multiple contrastive embedding spaces between the existing modality pairs, implicitly hoping that unseen modality pairs will end up aligned. This theoretical paper proves that this hope is well founded under certain assumptions. Starting from the proper Bayesian approach of integrating out the intermediate modality, we show that directly comparing the representations of data from unpaired modalities can recover the same likelihood ratio. Our analysis builds on prior work on the geometry and probabilistic interpretation of contrastive representations, showing how these representations can answer many of the same queries as probabilistic graphical models. The analysis suggests two new ways of using contrastive representations: in settings with pre-trained contrastive models, and for handling language ambiguity in reinforcement learning. Our numerical experiments study the importance of our assumptions and demonstrate these new applications.

Much of the appeal of contrastive learning is that it gives a "plug-n-play" approach for swapping one modality for another. Because representations from different modalities are trained to be aligned when they represent the same object, the hope is that (say) a language representation and an image representation of the same scene can be used as substitutes. This property is practically appealing for at least two reasons. First, it allows us to make use of pre-trained models: if you have a model that takes (say) language input and you have access to a pre-trained image-language contrastive model, you might simply train your model on the pre-trained image representations and hope that it continues to work when you swap in the language representations.
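To make this first use case concrete, below is a minimal sketch (toy data and stand-in encoders, not the paper's code) of the plug-n-play swap: a downstream linear head is trained only on "image" embeddings from an aligned image/text model and then evaluated on "text" embeddings of the same scenes. The shared map `A`, the noise level, and the ridge-regression head are all illustrative assumptions; alignment holds here by construction, whereas the paper characterizes when it holds for learned contrastive representations.

```python
# Toy sketch of the plug-n-play modality swap. Assumptions: two encoders that
# place matching inputs at nearby points on the unit sphere (faked via a shared
# linear map A plus noise), and a ridge-regression downstream head.
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 500

# Underlying "scenes" and a +/-1 label determined by each scene.
scenes = rng.normal(size=(n, d))
labels = np.sign(scenes[:, 0])

# Stand-in for a pre-trained image/text contrastive model: both encoders map a
# scene near the same embedding, so representations are aligned by construction.
A = rng.normal(size=(d, d))

def encode(scenes, noise=0.05):
    z = scenes @ A.T + noise * rng.normal(size=scenes.shape)
    return z / np.linalg.norm(z, axis=1, keepdims=True)

img_z = encode(scenes)  # "image" embeddings of each scene
txt_z = encode(scenes)  # "text" embeddings of the same scenes

# Train a downstream head on *image* embeddings only (ridge regression).
lam = 1e-3
w = np.linalg.solve(img_z.T @ img_z + lam * np.eye(d), img_z.T @ labels)

# Swap in the *text* embeddings at test time, with no retraining.
acc_img = np.mean(np.sign(img_z @ w) == labels)
acc_txt = np.mean(np.sign(txt_z @ w) == labels)
print(f"accuracy on image embeddings:          {acc_img:.2f}")
print(f"accuracy on swapped-in text embeddings: {acc_txt:.2f}")
```

The point of the sketch is the last two lines: the head never sees a text embedding during training, yet because the two encoders place matching inputs at nearby points, the swapped-in representations yield nearly the same accuracy. The paper's contribution is the probabilistic argument for when this substitution is justified for contrastive representations learned from real paired data.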
