AITopics

Incomplete multi-view clustering aims to learn complete correlations among samples by leveraging complementary information across multiple views for clustering. Anchor-based methods further establish sample-level similarities for representative anchor generation, effectively addressing scalability issues in large-scale scenarios. Despite efficiency improvements, existing methods overlook the misguidance in anchors learning induced by partial missing samples, i.e., the absence of samples results in shift of learned anchors, further leading to sub-optimal clustering performance. To conquer the challenges, our solution involves a cross-view reconstruction strategy that not only alleviate the anchor shift problem through a carefully designed cross-view learning process, but also reconstructs missing samples in a way that transcends the limitations imposed by convex combinations. By employing affine combinations, our method explores areas beyond the convex hull defined by anchors, thereby illuminating blind spots in the reconstruction of missing samples. Experimental results on four benchmark datasets and three large-scale datasets validate the effectiveness of our proposed method.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Neural Information Processing SystemsMay-25-2025, 11:36:34 GMT

Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing.

artificial intelligence, machine learning, tempbalance, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (1.00)

Add feedback

Achieving Cross Modal Generalization with Multimodal Unified Representation Hai Huang 1

Neural Information Processing SystemsMay-25-2025, 11:34:28 GMT

This paper introduces a novel task called Cross Modal Generalization (CMG), which addresses the challenge of learning a unified discrete representation from paired multimodal data during pre-training. Then in downstream tasks, the model can achieve zero-shot generalization ability in other modalities when only one modal is labeled. Existing approaches in multimodal representation learning focus more on coarse-grained alignment or rely on the assumption that information from different modalities is completely aligned, which is impractical in real-world scenarios. To overcome this limitation, we propose Uni-Code, which contains two key contributions: the Dual Cross-modal Information Disentangling (DCID) module and the Multi-Modal Exponential Moving Average (MM-EMA). These methods facilitate bidirectional supervision between modalities and align semantically equivalent information in a shared discrete latent space, enabling fine-grained unified representation of multimodal sequences. During pre-training, we investigate various modality combinations, including audio-visual, audio-text, and the tri-modal combination of audio-visual-text. Extensive experiments on various downstream tasks, i.e., cross-modal event classification, localization, cross-modal retrieval, query-based video segmentation, and cross-dataset event localization, demonstrate the effectiveness of our proposed methods.

artificial intelligence, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country: