Goto

Collaborating Authors

 collapse


Structured Federated Learning through Clustered Additive Modeling

Neural Information Processing Systems

Heterogeneous federated learning without assuming any structure is challenging due to the conflicts among non-identical data distributions of clients. In practice, clients often comprise near-homogeneous clusters so training a server-side model per cluster mitigates the conflicts. However, FL with client clustering often suffers from "clustering collapse'', i.e., one cluster's model excels on increasing clients, and reduces to single-model FL. Moreover, cluster-wise models hinder knowledge sharing between clusters and each model depends on fewer clients. Furthermore, the static clustering assumption on data may not hold for dynamically changing models, which are sensitive to cluster imbalance/initialization or outliers.


Neural (Tangent Kernel) Collapse

Neural Information Processing Systems

This work bridges two important concepts: the Neural Tangent Kernel (NTK), which captures the evolution of deep neural networks (DNNs) during training, and the Neural Collapse (NC) phenomenon, which refers to the emergence of symmetry and structure in the last-layer features of well-trained classification DNNs. We adopt the natural assumption that the empirical NTK develops a block structure aligned with the class labels, i.e., samples within the same class have stronger correlations than samples from different classes. Under this assumption, we derive the dynamics of DNNs trained with mean squared (MSE) loss and break them into interpretable phases. Moreover, we identify an invariant that captures the essence of the dynamics, and use it to prove the emergence of NC in DNNs with block-structured NTK. We provide large-scale numerical experiments on three common DNN architectures and three benchmark datasets to support our theory.


Linguistic Collapse: Neural Collapse in (Large) Language Models

Neural Information Processing Systems

Neural collapse ( \mathcal{NC}) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers.These behaviors -- associated with generalization and robustness -- would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension.Recent studies have explored \mathcal{NC} in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries.Language modeling presents a curious frontier, as \textit{training by token prediction} constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond to similar contextual embeddings; and large language models (LLMs) in particular are typically only trained for a few epochs.This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards \mathcal{NC} .We find that \mathcal{NC} properties that develop with scale (and regularization) are linked to generalization.Moreover, there is evidence of some relationship between \mathcal{NC} and generalization independent of scale.Our work thereby underscores the generality of \mathcal{NC} as it extends to the novel and more challenging setting of language modeling.Downstream, we seek to inspire further research on the phenomenon to deepen our understanding of LLMs -- and neural networks at large -- and improve existing architectures based on \mathcal{NC} -related properties.


Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents

Neural Information Processing Systems

As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions remains a significant challenge. We introduce the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. In GovSim, a society of AI agents must collectively balance exploiting a common resource with sustaining it for future use. This environment enables the study of how ethical considerations, strategic planning, and negotiation skills impact cooperative outcomes. We develop an LLM-based agent architecture and test it with the leading open and closed LLMs.


Structured Federated Learning through Clustered Additive Modeling

Neural Information Processing Systems

Heterogeneous federated learning without assuming any structure is challenging due to the conflicts among non-identical data distributions of clients. In practice, clients often comprise near-homogeneous clusters so training a server-side model per cluster mitigates the conflicts. However, FL with client clustering often suffers from "clustering collapse'', i.e., one cluster's model excels on increasing clients, and reduces to single-model FL. Moreover, cluster-wise models hinder knowledge sharing between clusters and each model depends on fewer clients. Furthermore, the static clustering assumption on data may not hold for dynamically changing models, which are sensitive to cluster imbalance/initialization or outliers.


Out-of-Distribution Detection with An Adaptive Likelihood Ratio on Informative Hierarchical VAE

Neural Information Processing Systems

Unsupervised out-of-distribution (OOD) detection is essential for the reliability of machine learning. In the literature, existing work has shown that higher-level semantics captured by hierarchical VAEs can be used to detect OOD instances.However, we empirically show that, the inherent issue of hierarchical VAEs, i.e., posterior collapse'', would seriously limit their capacity for OOD detection.Based on a thorough analysis forposterior collapse'', we propose a novel informative hierarchical VAE to alleviate this issue through enhancing the connections between the data sample and its multi-layer stochastic latent representations during training.Furthermore, we propose a novel score function for unsupervised OOD detection, referred to as Adaptive Likelihood Ratio. With this score function, one can selectively aggregate the semantic information on multiple hidden layers of hierarchical VAEs, leading to a strong separability between in-distribution and OOD samples. Experimental results demonstrate that our method can significantly outperform existing state-of-the-art unsupervised OOD detection approaches.


Robot Gets Tired After Day's Work, Collapses: Watch

#artificialintelligence

Viral Video: It has been a long time since we started availing the services of robots, the electronic humans, perhaps the first ever non-official definition of the wonder machine. Now, robotics is very much in vogue and has mass use across industries. One of the key reasons to deploy these programmable machines is their high efficiency and the ability to work for longer hours than humans without getting tired. However, a video has surfaced showing a robot placing plastic containers on a conveyor belt. The video is in a time-lapse, suggesting that the robot has been on the job for hours and the last few frames show the real-time where the machine picks up a container and as soon as it lifts it, it collapses.

  Country: Asia > India (0.43)
  Industry: Media (0.36)

Limitations of Neural Collapse for Understanding Generalization in Deep Learning

Hui, Like, Belkin, Mikhail, Nakkiran, Preetum

arXiv.org Machine Learning

The recent work of Papyan, Han, & Donoho (2020) presented an intriguing "Neural Collapse" phenomenon, showing a structural property of interpolating classifiers in the late stage of training. This opened a rich area of exploration studying this phenomenon. Our motivation is to study the upper limits of this research program: How far will understanding Neural Collapse take us in understanding deep learning? First, we investigate its role in generalization. We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property). We find that while Neural Collapse often occurs on the train set, it does not occur on the test set. We thus conclude that Neural Collapse is primarily an optimization phenomenon, with as-yet-unclear connections to generalization. Second, we investigate the role of Neural Collapse in feature learning. We show simple, realistic experiments where training longer leads to worse last-layer features, as measured by transfer-performance on a downstream task. This suggests that neural collapse is not always desirable for representation learning, as previously claimed. Finally, we give preliminary evidence of a "cascading collapse" phenomenon, wherein some form of Neural Collapse occurs not only for the last layer, but in earlier layers as well. We hope our work encourages the community to continue the rich line of Neural Collapse research, while also considering its inherent limitations.