The Indra Representation Hypothesis for Multimodal Alignment

Jun-14-2026, 06:30:57 GMT–Neural Information Processing Systems

Recent studies have uncovered an interesting phenomenon: unimodal foundation models tend to learn convergent representations, regardless of differences in architecture, training objectives, or data modalities. However, these representations are essentially internal abstractions that characterize each sample independently, which limits their expressiveness. In this paper, we propose the Indra Representation Hypothesis, inspired by the philosophical metaphor of Indra's Net. We argue that representations from unimodal foundation models are converging to implicitly reflect a shared relational structure underlying reality, akin to the relational ontology of Indra's Net.

artificial intelligence, proceedings, representation

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence (0.40)