downstream task
Hierarchical Contrastive Learning for Multimodal Data
Li, Huichao, Yu, Junhan, Zhou, Doudou
Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive parameter estimation and excess-risk bounds for downstream prediction. Simulations show accurate recovery of hierarchical structure and effective selection of task-relevant components. On multimodal electronic health records, HCL yields more informative representations and consistently improves predictive performance.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States (0.28)
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > California (0.04)
- Europe > France (0.04)
- Energy > Power Industry (0.93)
- Information Technology (0.67)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (15 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
- Law (0.47)
- Government (0.47)