

 classifier vector





Neural Collapse and simplex ETF

Neural Information Processing Systems

Then the same solution as in Lemma 1 is obtained. We will prove that if Assumptions 1 and 2 hold, the stochastic gradients cannot be uniformly bounded. However, FedGELA might reach a better local optimum by adapting the feature structure. Here we complete the proof. We define the "existing angle" as the angle between classifier vectors belonging to classes that exist on a local client. In Fed-ISIC2019, there exists a true PCDD situation that needs to be solved.
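The snippets above turn on the simplex ETF geometry: K classifier vectors that are unit-norm, sum to zero, and are pairwise separated at the maximal equiangular cosine of -1/(K-1). Below is a minimal, hedged Python sketch (my own illustration, not the paper's code; the function name and sizes are assumptions) that constructs such a frame and checks both properties:

```python
import numpy as np

def simplex_etf(K: int, d: int, seed: int = 0) -> np.ndarray:
    """Return a (K, d) matrix of unit-norm simplex-ETF classifier vectors."""
    assert d >= K, "this construction draws K orthonormal columns in R^d"
    rng = np.random.default_rng(seed)
    # Random orthonormal basis U in R^{d x K} via (reduced) QR decomposition.
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Standard simplex-ETF form: M = sqrt(K/(K-1)) * U (I_K - (1/K) 1 1^T).
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M.T  # one classifier vector per row

K, d = 10, 512
W = simplex_etf(K, d)
cos = W @ W.T  # rows are unit-norm, so this is the cosine-similarity matrix
print(np.allclose(np.diag(cos), 1.0))        # True: unit norms
off_diag = cos[~np.eye(K, dtype=bool)]
print(np.allclose(off_diag, -1 / (K - 1)))   # True: equiangular at -1/(K-1)
```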



Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Neural Information Processing Systems

Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class.
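As a minimal sketch of the setup this sentence describes, the toy PyTorch snippet below jointly trains a backbone that produces features and a linear classifier that maps them to per-class logits; the layer sizes and batch shapes are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(          # stand-in for a ResNet-style feature extractor
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 512),
    nn.ReLU(),
)
classifier = nn.Linear(512, 10)    # one weight vector (row) per class

x = torch.randn(8, 3, 32, 32)      # a dummy batch of images
y = torch.randint(0, 10, (8,))
logits = classifier(backbone(x))   # (8, 10) class logits
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                    # gradients flow into both modules jointly
```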


All-around Neural Collapse for Imbalanced Classification

Zhang, Enhao, Li, Chaohua, Geng, Chuanxing, Chen, Songcan

arXiv.org Artificial Intelligence

Neural Collapse (NC) presents an elegant geometric structure that enables individual activations (features), class means and classifier (weights) vectors to reach optimal inter-class separability during the terminal phase of training on a balanced dataset. Once shifted to imbalanced classification, such an optimal structure of NC can be readily destroyed by the notorious minority collapse, where the classifier vectors corresponding to the minority classes are squeezed. In response, existing works endeavor to recover NC, typically by optimizing classifiers. However, we discover that this squeezing phenomenon is not confined to classifier vectors but also occurs with class means. Consequently, reconstructing NC solely at the classifier may be futile, as the feature means remain compressed, violating the inherent self-duality of NC (i.e., class means and classifier vectors converge mutually) and, incidentally, yielding an unsatisfactory collapse of individual activations towards the corresponding class means. To shake off these dilemmas, we present a unified All-around Neural Collapse framework (AllNC), aiming to comprehensively restore NC across multiple aspects, including individual activations, class means and classifier vectors. We thoroughly analyze its effectiveness and verify on multiple benchmark datasets that it achieves state-of-the-art results in both balanced and imbalanced settings.
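To make the "squeezing" concrete, the following hedged sketch (my illustration, not AllNC's code) measures the average pairwise cosine similarity among a set of direction vectors: well-spread majority directions sit near 0 (or the ETF target of -1/(K-1)), while collapsed minority directions drift toward +1.

```python
import numpy as np

def mean_pairwise_cosine(V: np.ndarray) -> float:
    """Average off-diagonal cosine similarity among the rows of V."""
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    G = V @ V.T
    K = len(V)
    return G[~np.eye(K, dtype=bool)].mean()

# Toy example: 6 well-spread "majority" directions vs 4 nearly collinear
# "minority" directions in R^64 (synthetic stand-ins for learned vectors).
rng = np.random.default_rng(0)
majority = rng.standard_normal((6, 64))
anchor = rng.standard_normal(64)
minority = anchor + 0.05 * rng.standard_normal((4, 64))  # squeezed together

print(f"majority cosine: {mean_pairwise_cosine(majority):+.3f}")  # near 0
print(f"minority cosine: {mean_pairwise_cosine(minority):+.3f}")  # near +1
```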


Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

Fan, Ziqing, Zhang, Ruipeng, Yao, Jiangchao, Han, Bo, Zhang, Ya, Wang, Yanfeng

arXiv.org Artificial Intelligence

Partially class-disjoint data (PCDD), a common yet under-explored data formation in which each client contributes samples from only a subset of classes (instead of all classes), severely challenges the performance of federated algorithms. Without full classes, the local objective contradicts the global objective, yielding an angle-collapse problem for locally missing classes and a space-waste problem for locally existing classes. To the best of our knowledge, none of the existing methods can intrinsically mitigate PCDD challenges to achieve holistic improvement in the bilateral views (both the global view and the local view) of federated learning. To address this dilemma, we are inspired by the strong generalization of the simplex Equiangular Tight Frame (ETF) on imbalanced data, and propose a novel approach called FedGELA, in which the classifier is globally fixed as a simplex ETF while being locally adapted to the personal distributions. Globally, FedGELA provides fair and equal discrimination for all classes and avoids inaccurate updates of the classifier, while locally it utilizes the space of locally missing classes for locally existing classes. We conduct extensive experiments on a range of datasets to demonstrate that FedGELA achieves promising performance (an average improvement of 3.9% over FedAvg and 1.5% over the best baselines) and provides both local and global convergence guarantees.
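A minimal sketch in the spirit of FedGELA follows: the server fixes the classifier as a simplex ETF shared by all clients, and each client keeps it frozen while training only the backbone. The frequency-based rescaling used here as the "local adaptation" is a simplified, hypothetical stand-in for the paper's mechanism, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

def simplex_etf(K: int, d: int) -> torch.Tensor:
    U, _ = torch.linalg.qr(torch.randn(d, K))
    M = (K / (K - 1)) ** 0.5 * U @ (torch.eye(K) - torch.ones(K, K) / K)
    return M.T  # (K, d), unit-norm rows

K, d = 10, 128
W_global = simplex_etf(K, d)                    # fixed for all rounds, all clients

class ClientModel(nn.Module):
    def __init__(self, etf: torch.Tensor, local_freq: torch.Tensor):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(784, d), nn.ReLU(), nn.Linear(d, d))
        # Frozen classifier: globally the ETF, locally rescaled by class
        # frequency (hypothetical adaptation rule, for illustration only).
        self.register_buffer("W", etf * (0.5 + local_freq.unsqueeze(1)))

    def forward(self, x):
        return self.backbone(x) @ self.W.T      # logits against the fixed ETF

freq = torch.tensor([0.3, 0.3, 0.2, 0.2, 0, 0, 0, 0, 0, 0.0])  # PCDD: classes 4-9 missing
model = ClientModel(W_global, freq)
logits = model(torch.randn(4, 784))
print(logits.shape)  # torch.Size([4, 10]); only backbone params receive gradients
```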


Learning Equi-angular Representations for Online Continual Learning

Seo, Minhyuk, Koh, Hyunseo, Jeung, Wonje, Lee, Minjae, Kim, San, Lee, Hankook, Cho, Sungjun, Choi, Sungik, Kim, Hyunwoo, Choi, Jonghyun

arXiv.org Artificial Intelligence

Online continual learning suffers from underfitted solutions because training per model update is insufficient (e.g., single-epoch training). To address this challenge, we propose an efficient online continual learning method that exploits the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space, so that a model continually learned with a single epoch can better fit the streamed data; to this end, we propose preparatory data training and residual correction in the representation space. With an extensive set of empirical validations on CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios, such as disjoint and Gaussian-scheduled continuous (i.e., boundary-free) data setups.
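One common way to induce neural collapse toward an ETF, sketched below under stated assumptions (my illustration; the paper's preparatory data training and residual correction are omitted), is to regress each normalized feature onto the fixed ETF anchor of its class:

```python
import torch
import torch.nn.functional as F

def etf_alignment_loss(features: torch.Tensor, labels: torch.Tensor,
                       etf: torch.Tensor) -> torch.Tensor:
    """Push unit-normalized features toward their class's fixed ETF vector."""
    z = F.normalize(features, dim=1)       # (B, d) points on the unit sphere
    targets = etf[labels]                  # (B, d) fixed class anchors
    # Dot-regression form: maximize cosine with the correct anchor.
    return (1.0 - (z * targets).sum(dim=1)).mean()

d, K, B = 128, 10, 32
U, _ = torch.linalg.qr(torch.randn(d, K))
etf = ((K / (K - 1)) ** 0.5 * U @ (torch.eye(K) - torch.ones(K, K) / K)).T

feats = torch.randn(B, d, requires_grad=True)
loss = etf_alignment_loss(feats, torch.randint(0, K, (B,)), etf)
loss.backward()                            # gradient pulls features toward anchors
print(float(loss))
```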


Contrast with Major Classifier Vectors for Federated Medical Relation Extraction with Heterogeneous Label Distribution

Du, Chunhui, He, Hao, Jin, Yaohui

arXiv.org Artificial Intelligence

Federated medical relation extraction enables multiple clients to train a deep network collaboratively without sharing their raw medical data. To handle the heterogeneous label distribution across clients, most existing works only enforce regularization between local and global models during optimization. In this paper, we fully utilize the models of all clients and propose the novel concept of major classifier vectors, where a group of class vectors is obtained on the server by an ensemble method rather than by weighted averaging. The major classifier vectors are then distributed to all clients, and each client's local training is Contrasted with the Major Classifier vectors (FedCMC), so the local model is not prone to overfitting to the local label distribution. FedCMC requires only a small amount of additional transfer of classifier parameters, without any leakage of raw data, extracted representations, or label distributions. Our extensive experiments show that FedCMC outperforms other state-of-the-art FL algorithms on three medical relation extraction datasets.
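As a hedged sketch of the server-side ensemble (the medoid-style selection rule below is my illustrative stand-in, since the abstract does not specify FedCMC's exact procedure), each class's "major" vector is chosen as the client vector most aligned with the rest of the cohort:

```python
import torch
import torch.nn.functional as F

def major_classifier_vectors(client_ws: list[torch.Tensor]) -> torch.Tensor:
    """client_ws: list of (K, d) classifier matrices, one per client.
    For each class, pick the client vector most aligned with the other
    clients' vectors for that class (a medoid over the ensemble)."""
    W = torch.stack(client_ws)                   # (C, K, d)
    Wn = F.normalize(W, dim=2)
    majors = []
    for k in range(W.shape[1]):
        V = Wn[:, k, :]                          # (C, d): one vector per client
        agreement = (V @ V.T).mean(dim=1)        # average cosine to the cohort
        majors.append(W[agreement.argmax(), k])  # keep the most agreed-upon vector
    return torch.stack(majors)                   # (K, d)

clients = [torch.randn(5, 64) for _ in range(8)]  # 8 clients, 5 classes, d=64
W_major = major_classifier_vectors(clients)
print(W_major.shape)  # torch.Size([5, 64]); broadcast back to clients for training
```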