Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation

Park, Junghwan, Cho, Woojin, Heo, Junhyuk, Kwon, Darongsae, Lee, Kookjin

arXiv.org Artificial Intelligence

Adapting large pre-trained models to unseen tasks under tight data and compute budgets remains challenging. Meta-learning approaches explicitly learn good initializations, but they require an additional meta-training phase over many tasks, incur high training cost, and can be unstable. At the same time, the number of task-specific pre-trained models continues to grow, yet the question of how to transfer them to new tasks with minimal additional training remains relatively underexplored. We propose BOLT (Basis-Oriented Low-rank Transfer), a framework that reuses existing fine-tuned models not by merging weights, but by extracting an orthogonal, task-informed spectral basis and adapting within that subspace. In the offline phase, BOLT collects dominant singular directions from multiple task vectors and orthogonalizes them per layer to form reusable bases. In the online phase, we freeze these bases and train only a small set of diagonal coefficients per layer for the new task, yielding a rank-controlled update with very few trainable parameters. This design provides (i) a strong, training-free initialization for unseen tasks, obtained by pooling source-task coefficients and applying a lightweight rescaling step while leveraging the shared orthogonal bases, and (ii) a parameter-efficient fine-tuning (PEFT) path that, in our experiments, achieves robust performance compared to common PEFT baselines as well as a representative meta-learned initialization. Our results show that constraining adaptation to a task-informed orthogonal subspace provides an effective alternative for unseen-task transfer.
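As a concrete illustration of the two phases, here is a minimal numpy sketch for a single layer. The ranks, variable names, and the mean-pooling initialization are our own illustrative choices, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_per_task, n_tasks = 64, 4, 3

# Offline phase: take the dominant singular directions of several task
# vectors (weight deltas of fine-tuned source models) and orthogonalize
# them per layer into reusable bases.
task_vectors = [rng.standard_normal((d, d)) for _ in range(n_tasks)]
U_cols, V_cols = [], []
for dW in task_vectors:
    U, S, Vt = np.linalg.svd(dW, full_matrices=False)
    U_cols.append(U[:, :r_per_task])          # top-r left singular vectors
    V_cols.append(Vt[:r_per_task].T)          # top-r right singular vectors
U_basis, _ = np.linalg.qr(np.hstack(U_cols))  # orthogonal, frozen basis
V_basis, _ = np.linalg.qr(np.hstack(V_cols))
rank = U_basis.shape[1]                       # n_tasks * r_per_task = 12

# Online phase: the bases stay frozen; only a diagonal coefficient
# vector c (rank entries) would be trained for the new task.  A
# training-free starting point pools the source-task coefficients
# projected into the shared bases.
coeffs = [np.diag(U_basis.T @ dW @ V_basis) for dW in task_vectors]
c0 = np.mean(coeffs, axis=0)                  # pooled initialization
dW_new = U_basis @ np.diag(c0) @ V_basis.T    # rank-controlled update
```

The update `dW_new` has at most `rank` degrees of freedom per layer, which is the source of the parameter efficiency described above.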


1ea97de85eb634d580161c603422437f-Supplemental.pdf

Neural Information Processing Systems

Supplementary material for "Hold me tight!":

A. Theoretical margin distribution of a linear classifier
B. Examples of frequency "flipped" images
C. Invariance and elasticity on MNIST data
D. Connections to catastrophic forgetting
E. Examples of filtered images
F. Subspace sampling of the DCT
G. Training parameters
H. Cross-dataset performance
I. Margin distribution for standard networks
J. Adversarial training parameters
K. Description of the L2-PGD attack on frequency "flipped" data
L. Spectral decomposition on frequency "flipped" data
M. Margin distribution for adversarially trained networks
N. Margin distribution on random subspaces

Figure S4: Filtered image examples. Table S2 shows the performance and training parameters of the different networks used in the paper.


Fast Summation of Radial Kernels via QMC Slicing

Hertrich, Johannes, Jahn, Tim, Quellmalz, Michael

arXiv.org Machine Learning

The fast computation of large kernel sums is a challenging task, which arises as a subproblem in any kernel method. We approach the problem by slicing, which relies on random projections to one-dimensional subspaces and fast Fourier summation. We prove bounds for the slicing error and propose a quasi-Monte Carlo (QMC) approach for selecting the projections based on spherical quadrature rules. Numerical examples demonstrate that our QMC-slicing approach significantly outperforms existing methods like (QMC-)random Fourier features, orthogonal Fourier features or non-QMC slicing on standard test datasets.
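To illustrate the structure of slicing, here is a numpy sketch for the special case of the negative-distance (energy) kernel k(r) = -r, where the slicing identity is exact in expectation and the 1D sums reduce to an O(n log n) sort. We use plain Monte Carlo directions rather than the QMC spherical quadrature rules proposed above, and we omit the transformed 1D kernels and fast Fourier summation needed for general radial kernels:

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(1)
n, d, n_dirs = 50, 3, 2000

# Target: S = sum_{i,j} k(||x_i - x_j||) with k(r) = -r.  For xi
# uniform on the unit sphere, rotational invariance gives
#   ||z|| = (1 / c_d) * E_xi |<xi, z>|,
#   c_d   = Gamma(d/2) / (sqrt(pi) * Gamma((d+1)/2)),
# so averaging 1D sums of projections recovers S in expectation.
x = rng.standard_normal((n, d))
c_d = gamma(d / 2) / (sqrt(pi) * gamma((d + 1) / 2))

def sum_abs_diffs_1d(y):
    """sum_{i,j} |y_i - y_j| in O(n log n): sort, then weight by rank."""
    y = np.sort(y)
    k = np.arange(1, y.size + 1)
    return 2.0 * np.sum((2 * k - y.size - 1) * y)

# Monte Carlo slicing: average the fast 1D sums over random directions.
xi = rng.standard_normal((n_dirs, d))
xi /= np.linalg.norm(xi, axis=1, keepdims=True)
sliced = -np.mean([sum_abs_diffs_1d(x @ v) for v in xi]) / c_d

# Naive O(n^2 d) reference sum for comparison.
naive = -np.sum(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1))
```

Replacing the i.i.d. directions `xi` with QMC points from a spherical quadrature rule is exactly the change the abstract advocates, reducing the number of projections needed for a given error.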


People are obsessed with this weird pizza box. The company behind it won't discuss it

Los Angeles Times

When Sookie Orth sat down to write her college essay last fall, something quickly came to mind. Orth, then a senior at Sequoyah School in Pasadena, began her draft with a declaration: "I learned how to fold a pizza box at the age of nine." She told the story of her years-long connection with Pizza of Venice in Altadena, where she often dined with her family as a little kid. One day, the manager invited her to assemble a box. Impressed with Orth's speed, the woman told her she could work at the pizzeria when she was older.


RVT: Robotic View Transformer for 3D Object Manipulation

Goyal, Ankit, Xu, Jie, Guo, Yijie, Blukis, Valts, Chao, Yu-Wei, Fox, Dieter

arXiv.org Artificial Intelligence

For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But explicit 3D representations like voxels come at a large computational cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Key features of RVT are an attention mechanism for aggregating information across views and re-rendering of the camera input from virtual views around the robot workspace. In simulation, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct). It also trains 36X faster than PerAct to reach the same performance and achieves 2.3X the inference speed of PerAct. Further, RVT can perform a variety of manipulation tasks in the real world with just a few (~10) demonstrations per task. Visual results, code, and the trained model are provided at https://robotic-view-transformer.github.io/.


One-class Recommendation Systems with the Hinge Pairwise Distance Loss and Orthogonal Representations

Raziperchikolaei, Ramin, Chung, Young-joo

arXiv.org Artificial Intelligence

In one-class recommendation systems, the goal is to learn a model from a small set of interacted users and items and then identify the positively-related user-item pairs among a large number of pairs with unknown interactions. Most previous loss functions rely on dissimilar pairs of users and items, which are selected from the ones with unknown interactions, to obtain better prediction performance. This strategy introduces several challenges such as increasing training time and hurting the performance by picking "similar pairs with the unknown interactions" as dissimilar pairs. In this paper, the goal is to only use the similar set to train the models. We point out three trivial solutions that the models converge to when they are trained only on similar pairs: collapsed, partially collapsed, and shrinking solutions. We propose two terms that can be added to the objective functions in the literature to avoid these solutions. The first one is a hinge pairwise distance loss that avoids the shrinking and collapsed solutions by keeping the average pairwise distance of all the representations greater than a margin. The second one is an orthogonality term that minimizes the correlation between the dimensions of the representations and avoids the partially collapsed solution. We conduct experiments on a variety of tasks on public and real-world datasets. The results show that our approach using only similar pairs outperforms state-of-the-art methods using similar pairs and a large number of dissimilar pairs.
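As we read them from the abstract, the two proposed terms can be sketched in numpy roughly as follows; the paper's exact formulations may differ:

```python
import numpy as np

def hinge_pairwise_distance_loss(Z, margin=1.0):
    """Hinge pairwise distance loss: penalize the *average* pairwise
    distance of a batch of representations Z (n x k) falling below a
    margin, which rules out the collapsed and shrinking solutions."""
    n = Z.shape[0]
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    avg = dists.sum() / (n * (n - 1))          # exclude the zero diagonal
    return max(0.0, margin - avg)

def orthogonality_loss(Z):
    """Orthogonality term: penalize correlation between representation
    dimensions (off-diagonal mass of the feature correlation matrix),
    which rules out the partially collapsed solution."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / max(len(Z) - 1, 1)
    std = np.sqrt(np.diag(cov)) + 1e-8         # guard against zero variance
    corr = cov / np.outer(std, std)
    off = corr - np.diag(np.diag(corr))
    return np.sum(off ** 2)
```

A fully collapsed batch (all representations identical) has average pairwise distance zero, so the first term returns the full margin; a batch whose dimensions are exact multiples of each other has unit off-diagonal correlations, so the second term is maximal. Adding both to a similarity-only objective is the mechanism the abstract describes.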


Artificial Intelligence Detects New Family of Genes in Gut Bacteria

#artificialintelligence

Using artificial intelligence, UT Southwestern researchers have discovered a new family of sensing genes in enteric bacteria that are linked by structure and probably function, but not genetic sequence. The findings, published in PNAS, offer a new way of identifying the role of genes in unrelated species and could lead to new ways to fight intestinal bacterial infections. "We identified similarities in these proteins in reverse of how it's usually done. Instead of using sequence, Lisa looked for matches in their structure," said Kim Orth, Ph.D., Professor of Molecular Biology and Biochemistry, who co-led the study with Lisa Kinch, Ph.D., a bioinformatics specialist in the Department of Molecular Biology. Dr. Orth's lab has long focused on studying how marine and estuary bacteria cause infections.


Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks

Achour, El Mehdi, Malgouyres, François, Mamalet, Franck

arXiv.org Artificial Intelligence

Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding/vanishing gradients, decorrelating the features, and improving robustness. This paper studies theoretical properties of orthogonal convolutional layers. We establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. The conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice with 'circular' padding. We also exhibit limitations with the 'valid' boundary condition and the 'same' boundary condition with zero padding. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020). The second motivation of the present paper is to specify the theory behind this. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric. The theoretical results are confirmed with experiments, the landscape of the regularization term is studied, and the regularization strategy is validated on real datasets. Altogether, the study guarantees that regularization with L_{orth} (Wang et al. 2020) is an efficient, flexible, and stable numerical strategy for learning orthogonal convolutional layers.
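To make the orthogonality measure concrete, here is a small numpy sketch using the dense (circulant) matrix of a single-channel 1D circular convolution; the L_{orth} regularizer of Wang et al. (2020) computes an equivalent quantity efficiently via convolutions, which we do not implement here:

```python
import numpy as np

def circulant(kernel, n):
    """Dense n x n matrix of the 1D circular convolution with `kernel`."""
    C = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(kernel):
            C[i, (i + j) % n] = w
    return C

n = 8

# A pure shift is an orthogonal circular convolution: W^T W = I exactly.
W_shift = circulant(np.array([0.0, 1.0]), n)
err_orth = np.linalg.norm(W_shift.T @ W_shift - np.eye(n))

# A generic averaging kernel is not orthogonal; the deviation
# ||W^T W - I||_F is the dense-matrix analogue of the orthogonality
# measure that the regularization drives toward zero.
W_avg = circulant(np.array([0.5, 0.3, 0.2]), n)
err_generic = np.linalg.norm(W_avg.T @ W_avg - np.eye(n))
```

That a single-channel, stride-1 circular convolution is orthogonal essentially only for (signed) shifts illustrates why the existence of non-trivial orthogonal convolutions depends on the layer architecture (channels, stride, boundary condition), as the abstract states.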


Innovative New Algorithms Advance the Computing Power of Early-Stage Quantum Computers

#artificialintelligence

A group of scientists at the U.S. Department of Energy's Ames Laboratory has developed computational quantum algorithms that are capable of efficient and highly accurate simulations of static and dynamic properties of quantum systems. The algorithms are valuable tools to gain greater insight into the physics and chemistry of complex materials, and they are specifically designed to work on existing and near-future quantum computers. Scientist Yong-Xin Yao and his research partners at Ames Lab use the power of advanced computers to speed discovery in condensed matter physics, modeling incredibly complex quantum mechanics and how they change over ultra-fast timescales. Current high performance computers can model the properties of very simple, small quantum systems, but larger or more complex systems rapidly expand the number of calculations a computer must perform to arrive at an accurate model, slowing the pace not only of computation, but also discovery. "This is a real challenge given the current early-stage of existing quantum computing capabilities," said Yao, "but it is also a very promising opportunity, since these calculations overwhelm classical computer systems, or take far too long to provide timely answers."