Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis

Wen Wen, Tieliang Gong, Yuxin Dong, Shujian Yu, Weizhan Zhang

arXiv.org Machine Learning 

In most scientific data analysis scenarios, data collected from diverse domains and different sensors exhibit heterogeneous properties while preserving underlying connections. For example, (1) a piece of text can express the same semantics and sentiment in multiple languages; (2) a user's interests can be reflected in the text posted, images uploaded, and videos viewed; (3) animals perceive potential dangers in their surroundings through various senses such as sight, hearing, and smell. All of these reflect different perspectives of the data, collectively referred to as multi-view data. Extracting consensus and complementarity information from multiple views to achieve a comprehensive representation of multi-view data has stimulated research interest across various fields and led to the development of multi-view learning Hamdi et al. (2021); Fan et al. (2022); Fu et al. (2022); Hong et al. (2023). Various methodologies have emerged in multi-view learning, predominantly canonical correlation analysis (CCA)-based approaches Gao et al. (2020); Chen et al. (2022); Shu et al. (2022) and engineering-driven techniques Xu et al. (2021); Bai et al. (2023), but these methods suffer from a critical limitation. Specifically, their emphasis on maximizing cross-view consensus information often comes at the expense of view-specific, task-relevant information, thereby potentially compromising downstream performance Liang et al. (2024). Significant recent effort has been dedicated to leveraging diverse information-theoretic techniques to capture both view-common and view-unique components from multiple views Wang et al. (2019); Federici et al. (2020); Wang et al. (2023); Cui et al. (2024); Zhang et al. (2024), with the aim of yielding maximally disentangled representations and improving generalization ability. For instance, Kleinman et al. (2024) and Zhang et al. (2024) introduce the notion of Gács-Körner common information (Gács et al., 1973) and utilize the total correlation between consensus and complementarity information to extract mutually independent cross-view common and unique components.