Orthogonality







Maximum Class Separation as Inductive Bias in One Matrix

Neural Information Processing Systems

The main observation behind our approach is that separation does not require optimization but can be solved in closed form prior to training and plugged into a network.
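As an illustration of such a closed-form construction (a sketch, not the paper's released code), the classical regular-simplex construction yields k unit-norm class vectors whose pairwise cosine similarity is exactly -1/(k-1), the maximum possible separation for k classes:

```python
import numpy as np

def simplex_prototypes(k):
    """Return k unit vectors forming a regular simplex (maximal pairwise separation)."""
    # Center the k standard basis vectors of R^k at their mean, then normalize rows.
    P = np.eye(k) - np.ones((k, k)) / k
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    return P

k = 10
P = simplex_prototypes(k)   # one fixed prototype per class, no training needed
G = P @ P.T                 # Gram matrix: 1 on the diagonal, -1/(k-1) elsewhere
```

Because the matrix is fixed before training, it can simply be appended to a network as a frozen final projection onto the class prototypes.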


a576eafbce762079f7d1f77fca1c5cc2-AuthorFeedback.pdf

Neural Information Processing Systems

The novelty of our contribution is two-fold: First, our proposed learning rule, with modifications to both sides of the gradient update, is novel. Tasks are best retained using the double-sided approach. R4 raised that they did not understand the motivation for the double-sided learning rule. For example, when a Memory task is learned first, followed by a Delay task with the same input/output structure (e.g. [...]). Our work is motivated by the question of how the same neural population may be involved in computations relating to multiple tasks (lines 23-24). Fixed-point structures were highly overlapping upon visual inspection in TDR subspaces.


Orthogonium : A Unified, Efficient Library of Orthogonal and 1-Lipschitz Building Blocks

Boissin, Thibaut, Mamalet, Franck, Lafargue, Valentin, Serrurier, Mathieu

arXiv.org Machine Learning

Orthogonal and 1-Lipschitz neural network layers are essential building blocks in robust deep learning architectures, crucial for certified adversarial robustness, stable generative models, and reliable recurrent networks. Despite significant advancements, existing implementations remain fragmented, limited, and computationally demanding. To address these issues, we introduce Orthogonium, a unified, efficient, and comprehensive PyTorch library providing orthogonal and 1-Lipschitz layers. Orthogonium provides access to standard convolution features, including support for strides, dilation, grouping, and transposed convolutions, while maintaining strict mathematical guarantees. Its optimized implementations reduce overhead on large-scale benchmarks such as ImageNet. Moreover, rigorous testing within the library has uncovered critical errors in existing implementations, emphasizing the importance of standardized and reliable tools. Orthogonium thus significantly lowers adoption barriers, enabling scalable experimentation and integration across diverse applications requiring orthogonality and robust Lipschitz constraints. Orthogonium is available at https://github.com/deel-ai/orthogonium.
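The core guarantee such layers provide can be checked numerically: an orthogonal linear map is an exact isometry, hence 1-Lipschitz, so it can never amplify an input perturbation. A minimal NumPy sketch of this property (illustrative only, not Orthogonium's API):

```python
import numpy as np

rng = np.random.default_rng(0)
# QR decomposition of a random Gaussian matrix gives a random orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))

x = rng.standard_normal(64)
y = rng.standard_normal(64)

# An orthogonal map preserves distances exactly: ||Qx - Qy|| == ||x - y||.
lhs = np.linalg.norm(Q @ x - Q @ y)
rhs = np.linalg.norm(x - y)
```

This is why orthogonal layers yield certified robustness: the distance between two inputs is preserved through every such layer, bounding how much an adversarial perturbation can grow.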


Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation

Neural Information Processing Systems

While following different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-trained models to specific tasks or domains based on a small set of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed by a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs impacts the model capacity and regularity. The analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code of the experiments is available at https://github.com/DaShenZi721/HRA.
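The equivalence the abstract mentions can be seen directly: each Householder reflection H = I - 2vvᵀ is orthogonal and differs from the identity by a rank-1 term, so a chain of r reflections is an orthogonal matrix whose deviation from the identity has rank at most r, i.e. an orthogonality-preserving low-rank update. A minimal sketch of the construction (hypothetical dimensions, not the HRA training code):

```python
import numpy as np

def householder(v):
    """Reflection across the hyperplane orthogonal to v: H = I - 2 v v^T / ||v||^2."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d, r = 8, 3            # layer width d, chain length r (illustrative values)

# Chain of r learnable reflections; here the vectors are just random stand-ins.
R = np.eye(d)
for _ in range(r):
    R = R @ householder(rng.standard_normal(d))

# A frozen weight W would be adapted as W @ R: an orthogonal transform whose
# deviation from the identity, R - I, has rank at most r (the low-rank view).
```

Regularizing the mutual orthogonality of the reflection vectors v₁,…,v_r is then what controls the capacity/regularity trade-off described in the abstract.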


Batch Normalization Orthogonalizes Representations in Deep Random Networks

Neural Information Processing Systems

This paper underlines an elegant property of batch normalization (BN): successive batch normalizations with random linear updates make samples increasingly orthogonal. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, we prove, under a mild assumption, that the deviation of the representations from orthogonality rapidly decays with depth, up to a term inversely proportional to the network width. This result has two main theoretical and practical implications: 1) Theoretically, as the depth grows, the distribution of the outputs contracts to a Wasserstein-2 ball around an isotropic normal distribution. Furthermore, the radius of this Wasserstein ball shrinks with the width of the network.
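The orthogonalizing effect is easy to observe numerically. The sketch below (scale-only batch normalization and random Gaussian weights; an illustration under simplified assumptions, not the paper's experimental setup) starts two samples nearly parallel and tracks the largest off-diagonal cosine of the batch after each BN + random linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 4, 512, 50          # batch size, width, depth (illustrative)

def batch_norm(H):
    # Scale-only BN: normalize each feature's second moment over the batch
    # (centering omitted for simplicity).
    return H / (np.sqrt((H ** 2).mean(axis=0)) + 1e-8)

def max_offdiag_cosine(H):
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    G = Hn @ Hn.T
    return np.abs(G - np.eye(len(H))).max()

H = rng.standard_normal((n, d))
H[1] = H[0] + 0.01 * rng.standard_normal(d)   # two nearly parallel samples
before = max_offdiag_cosine(H)                # close to 1

for _ in range(depth):
    W = rng.standard_normal((d, d)) / np.sqrt(d)   # random linear update
    H = batch_norm(H @ W)

after = max_offdiag_cosine(H)   # driven toward 0 with depth, up to a width term
```

Consistent with the non-asymptotic result, the residual correlation that survives at large depth shrinks as the width d grows.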