Goto

Collaborating Authors

 orthogonality regularization




Can We Gain More from Orthogonality Regularizations in Training Deep Networks?

Neural Information Processing Systems

This paper seeks to answer the question: as the (near-) orthogonality of weights is found to be a favorable property for training deep convolutional neural networks, how can we enforce it in more effective and easy-to-use ways? We develop novel orthogonality regularizations on training deep CNNs, utilizing various advanced analytical tools such as mutual coherence and restricted isometry property. These plug-and-play regularizations can be conveniently incorporated into training almost any CNN without extra hassle.



Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning

Roy, Arani, Apolinario, Marco P., Biswas, Shristi Das, Roy, Kaushik

arXiv.org Artificial Intelligence

Training deep neural networks (DNNs) with backpropagation (BP) achieves state-of-the-art accuracy but requires global error propagation and full parameterization, leading to substantial memory and computational overhead. Direct Feedback Alignment (DFA) enables local, parallelizable updates with lower memory requirements but is limited by unstructured feedback and poor scalability in deeper architectures, specially convolutional neural networks. To address these limitations, we propose a structured local learning framework that operates directly on low-rank manifolds defined by the Singular Value Decomposition (SVD) of weight matrices. Each layer is trained in its decomposed form, with updates applied to the SVD components using a composite loss that integrates cross-entropy, subspace alignment, and orthogonality regularization. Feedback matrices are constructed to match the SVD structure, ensuring consistent alignment between forward and feedback pathways. Our method reduces the number of trainable parameters relative to the original DFA model, without relying on pruning or post hoc compression. Experiments on CIFAR-10, CIFAR-100, and ImageNet show that our method achieves accuracy comparable to that of BP. Ablation studies confirm the importance of each loss term in the low-rank setting. These results establish local learning on low-rank manifolds as a principled and scalable alternative to full-rank gradient-based training.


Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Neural Information Processing Systems

Modern convolutional neural networks (CNNs) have massive identical convolution blocks, and, hence, recursive sharing of parameters across these blocks has been proposed to reduce the amount of parameters.


Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

Neural Information Processing Systems

Self-supervised learning (SSL) has rapidly advanced in recent years, approaching the performance of its supervised counterparts through the extraction of representations from unlabeled data. However, dimensional collapse, where a few large eigenvalues dominate the eigenspace, poses a significant obstacle for SSL. When dimensional collapse occurs on features (e.g. To this end, we first time propose a mitigation approach employing orthogonal regularization (OR) across the encoder, targeting both convolutional and linear layers during pretraining. OR promotes orthogonality within weight matrices, thus safeguarding against the dimensional collapse of weight matrices, hidden features, and representations. Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNNs and Transformer-based architectures.


FedOC: Optimizing Global Prototypes with Orthogonality Constraints for Enhancing Embeddings Separation in Heterogeneous Federated Learning

Guo, Fucheng, Luan, Zeyu, Li, Qing, Zhao, Dan, Jiang, Yong

arXiv.org Artificial Intelligence

Federated Learning (FL) has emerged as an essential framework for distributed machine learning, especially with its potential for privacy-preserving data processing. However, existing FL frameworks struggle to address statistical and model heterogeneity, which severely impacts model performance. While Heterogeneous Federated Learning (HtFL) introduces prototype-based strategies to address the challenges, current approaches face limitations in achieving optimal separation of prototypes. This paper presents FedOC, a novel HtFL algorithm designed to improve global prototype separation through orthogonality constraints, which not only increase intra-class prototype similarity but also significantly expand the inter-class angular separation. With the guidance of the global prototype, each client keeps its embeddings aligned with the corresponding prototype in the feature space, promoting directional independence that integrates seamlessly with the cross-entropy (CE) loss. We provide theoretical proof of FedOC's convergence under non-convex conditions. Extensive experiments demonstrate that FedOC outperforms seven state-of-the-art baselines, achieving up to a 10.12% accuracy improvement in both statistical and model heterogeneity settings.


Learnable Similarity and Dissimilarity Guided Symmetric Non-Negative Matrix Factorization

Lyu, Wenlong, Jia, Yuheng

arXiv.org Artificial Intelligence

Symmetric nonnegative matrix factorization (SymNMF) is a powerful tool for clustering, which typically uses the $k$-nearest neighbor ($k$-NN) method to construct similarity matrix. However, $k$-NN may mislead clustering since the neighbors may belong to different clusters, and its reliability generally decreases as $k$ grows. In this paper, we construct the similarity matrix as a weighted $k$-NN graph with learnable weight that reflects the reliability of each $k$-th NN. This approach reduces the search space of the similarity matrix learning to $n - 1$ dimension, as opposed to the $\mathcal{O}(n^2)$ dimension of existing methods, where $n$ represents the number of samples. Moreover, to obtain a discriminative similarity matrix, we introduce a dissimilarity matrix with a dual structure of the similarity matrix, and propose a new form of orthogonality regularization with discussions on its geometric interpretation and numerical stability. An efficient alternative optimization algorithm is designed to solve the proposed model, with theoretically guarantee that the variables converge to a stationary point that satisfies the KKT conditions. The advantage of the proposed model is demonstrated by the comparison with nine state-of-the-art clustering methods on eight datasets. The code is available at \url{https://github.com/lwl-learning/LSDGSymNMF}.


Reviews: Can We Gain More from Orthogonality Regularizations in Training Deep Networks?

Neural Information Processing Systems

In extensive experiments with state of the art models, the paper shows that soft orthogonality can improve training stability and yield better classification accuracy than the same models trained without such regularization. The paper proposes a method to approximately enforce all singular values of the weight matrices to be equal to 1, using a sampling-based approach that does not require computing an expensive SVD operation. Major comments: This paper presents interesting experiments showing that regularization towards orthogonal weights can stabilize and speed up learning, particularly near the beginning of training; and improve final test accuracy in several large models. These results could be of broad interest. One concern with the experimental methods is that they use carefully sculpted hyper parameter trajectories for some methods. How were these trajectories selected?