Basis-Oriented Low-rank Transfer for Few-Shot and Test-Time Adaptation

Park, Junghwan, Cho, Woojin, Heo, Junhyuk, Kwon, Darongsae, Lee, Kookjin

arXiv.org Artificial Intelligence

Adapting large pre-trained models to unseen tasks under tight data and compute budgets remains challenging. Meta-learning approaches explicitly learn good initializations, but they require an additional meta-training phase over many tasks, incur high training cost, and can be unstable. At the same time, the number of task-specific pre-trained models continues to grow, yet the question of how to transfer them to new tasks with minimal additional training remains relatively underexplored. We propose BOLT (Basis-Oriented Low-rank Transfer), a framework that reuses existing fine-tuned models not by merging weights, but by extracting an orthogonal, task-informed spectral basis and adapting within that subspace. In the offline phase, BOLT collects dominant singular directions from multiple task vectors and orthogonalizes them per layer to form reusable bases. In the online phase, we freeze these bases and train only a small set of diagonal coefficients per layer for the new task, yielding a rank-controlled update with very few trainable parameters. This design provides (i) a strong, training-free initialization for unseen tasks, obtained by pooling source-task coefficients and applying a lightweight rescaling step while leveraging the shared orthogonal bases, and (ii) a parameter-efficient fine-tuning (PEFT) path that, in our experiments, achieves robust performance compared to common PEFT baselines as well as a representative meta-learned initialization. Our results show that constraining adaptation to a task-informed orthogonal subspace provides an effective alternative for unseen-task transfer.
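As a concrete illustration of the two phases, here is a minimal numpy sketch for a single layer. The ranks, variable names, and the mean-pooling initialization are our own illustrative choices, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_per_task, n_tasks = 64, 4, 3

# Offline phase: take the dominant singular directions of several task
# vectors (weight deltas of fine-tuned source models) and orthogonalize
# them per layer into reusable bases.
task_vectors = [rng.standard_normal((d, d)) for _ in range(n_tasks)]
U_cols, V_cols = [], []
for dW in task_vectors:
    U, S, Vt = np.linalg.svd(dW, full_matrices=False)
    U_cols.append(U[:, :r_per_task])          # top-r left singular vectors
    V_cols.append(Vt[:r_per_task].T)          # top-r right singular vectors
U_basis, _ = np.linalg.qr(np.hstack(U_cols))  # orthogonal, frozen basis
V_basis, _ = np.linalg.qr(np.hstack(V_cols))
rank = U_basis.shape[1]                       # n_tasks * r_per_task = 12

# Online phase: the bases stay frozen; only a diagonal coefficient
# vector c (rank entries) would be trained for the new task.  A
# training-free starting point pools the source-task coefficients
# projected into the shared bases.
coeffs = [np.diag(U_basis.T @ dW @ V_basis) for dW in task_vectors]
c0 = np.mean(coeffs, axis=0)                  # pooled initialization
dW_new = U_basis @ np.diag(c0) @ V_basis.T    # rank-controlled update
```

The update `dW_new` has at most `rank` degrees of freedom per layer, which is the source of the parameter efficiency described above.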


1ea97de85eb634d580161c603422437f-Supplemental.pdf

Neural Information Processing Systems

Supplementary material for "Hold me tight!":

A. Theoretical margin distribution of a linear classifier
B. Examples of frequency "flipped" images
C. Invariance and elasticity on MNIST data
D. Connections to catastrophic forgetting
E. Examples of filtered images
F. Subspace sampling of the DCT
G. Training parameters
H. Cross-dataset performance
I. Margin distribution for standard networks
J. Adversarial training parameters
K. Description of the L2-PGD attack on frequency "flipped" data
L. Spectral decomposition on frequency "flipped" data
M. Margin distribution for adversarially trained networks
N. Margin distribution on random subspaces

Figure S4: Filtered image examples. Table S2 shows the performance and training parameters of the different networks used in the paper.


Fast Summation of Radial Kernels via QMC Slicing

Hertrich, Johannes, Jahn, Tim, Quellmalz, Michael

arXiv.org Machine Learning

The fast computation of large kernel sums is a challenging task, which arises as a subproblem in any kernel method. We approach the problem by slicing, which relies on random projections to one-dimensional subspaces and fast Fourier summation. We prove bounds for the slicing error and propose a quasi-Monte Carlo (QMC) approach for selecting the projections based on spherical quadrature rules. Numerical examples demonstrate that our QMC-slicing approach significantly outperforms existing methods like (QMC-)random Fourier features, orthogonal Fourier features or non-QMC slicing on standard test datasets.
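To illustrate the structure of slicing, here is a numpy sketch for the special case of the negative-distance (energy) kernel k(r) = -r, where the slicing identity is exact in expectation and the 1D sums reduce to an O(n log n) sort. We use plain Monte Carlo directions rather than the QMC spherical quadrature rules proposed above, and we omit the transformed 1D kernels and fast Fourier summation needed for general radial kernels:

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(1)
n, d, n_dirs = 50, 3, 2000

# Target: S = sum_{i,j} k(||x_i - x_j||) with k(r) = -r.  For xi
# uniform on the unit sphere, rotational invariance gives
#   ||z|| = (1 / c_d) * E_xi |<xi, z>|,
#   c_d   = Gamma(d/2) / (sqrt(pi) * Gamma((d+1)/2)),
# so averaging 1D sums of projections recovers S in expectation.
x = rng.standard_normal((n, d))
c_d = gamma(d / 2) / (sqrt(pi) * gamma((d + 1) / 2))

def sum_abs_diffs_1d(y):
    """sum_{i,j} |y_i - y_j| in O(n log n): sort, then weight by rank."""
    y = np.sort(y)
    k = np.arange(1, y.size + 1)
    return 2.0 * np.sum((2 * k - y.size - 1) * y)

# Monte Carlo slicing: average the fast 1D sums over random directions.
xi = rng.standard_normal((n_dirs, d))
xi /= np.linalg.norm(xi, axis=1, keepdims=True)
sliced = -np.mean([sum_abs_diffs_1d(x @ v) for v in xi]) / c_d

# Naive O(n^2 d) reference sum for comparison.
naive = -np.sum(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1))
```

Replacing the i.i.d. directions `xi` with QMC points from a spherical quadrature rule is exactly the change the abstract advocates, reducing the number of projections needed for a given error.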


People are obsessed with this weird pizza box. The company behind it won't discuss it

Los Angeles Times

When Sookie Orth sat down to write her college essay last fall, something quickly came to mind. Orth, then a senior at Sequoyah School in Pasadena, began her draft with a declaration: "I learned how to fold a pizza box at the age of nine." She told the story of her years-long connection with Pizza of Venice in Altadena, where she often dined with her family as a little kid. One day, the manager invited her to assemble a box. Impressed with Orth's speed, the woman told her she could work at the pizzeria when she was older.


RVT: Robotic View Transformer for 3D Object Manipulation

Goyal, Ankit, Xu, Jie, Guo, Yijie, Blukis, Valts, Chao, Yu-Wei, Fox, Dieter

arXiv.org Artificial Intelligence

For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But explicit 3D representations like voxels come at a large computational cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Key features of RVT are an attention mechanism for aggregating information across views and re-rendering of the camera input from virtual views around the robot workspace. In simulation, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct). It also trains 36X faster than PerAct to reach the same performance and achieves 2.3X the inference speed of PerAct. Further, RVT can perform a variety of manipulation tasks in the real world with just a few (~10) demonstrations per task. Visual results, code, and the trained model are provided at https://robotic-view-transformer.github.io/.


One-class Recommendation Systems with the Hinge Pairwise Distance Loss and Orthogonal Representations

Raziperchikolaei, Ramin, Chung, Young-joo

arXiv.org Artificial Intelligence

In one-class recommendation systems, the goal is to learn a model from a small set of interacted users and items and then identify the positively-related user-item pairs among a large number of pairs with unknown interactions. Most previous loss functions rely on dissimilar pairs of users and items, which are selected from the ones with unknown interactions, to obtain better prediction performance. This strategy introduces several challenges such as increasing training time and hurting the performance by picking "similar pairs with the unknown interactions" as dissimilar pairs. In this paper, the goal is to only use the similar set to train the models. We point out three trivial solutions that the models converge to when they are trained only on similar pairs: collapsed, partially collapsed, and shrinking solutions. We propose two terms that can be added to the objective functions in the literature to avoid these solutions. The first one is a hinge pairwise distance loss that avoids the shrinking and collapsed solutions by keeping the average pairwise distance of all the representations greater than a margin. The second one is an orthogonality term that minimizes the correlation between the dimensions of the representations and avoids the partially collapsed solution. We conduct experiments on a variety of tasks on public and real-world datasets. The results show that our approach using only similar pairs outperforms state-of-the-art methods using similar pairs and a large number of dissimilar pairs.
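As we read them from the abstract, the two proposed terms can be sketched in numpy roughly as follows; the paper's exact formulations may differ:

```python
import numpy as np

def hinge_pairwise_distance_loss(Z, margin=1.0):
    """Hinge pairwise distance loss: penalize the *average* pairwise
    distance of a batch of representations Z (n x k) falling below a
    margin, which rules out the collapsed and shrinking solutions."""
    n = Z.shape[0]
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    avg = dists.sum() / (n * (n - 1))          # exclude the zero diagonal
    return max(0.0, margin - avg)

def orthogonality_loss(Z):
    """Orthogonality term: penalize correlation between representation
    dimensions (off-diagonal mass of the feature correlation matrix),
    which rules out the partially collapsed solution."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / max(len(Z) - 1, 1)
    std = np.sqrt(np.diag(cov)) + 1e-8         # guard against zero variance
    corr = cov / np.outer(std, std)
    off = corr - np.diag(np.diag(corr))
    return np.sum(off ** 2)
```

A fully collapsed batch (all representations identical) has average pairwise distance zero, so the first term returns the full margin; a batch whose dimensions are exact multiples of each other has unit off-diagonal correlations, so the second term is maximal. Adding both to a similarity-only objective is the mechanism the abstract describes.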


Artificial Intelligence Detects New Family of Genes in Gut Bacteria

#artificialintelligence

Using artificial intelligence, UT Southwestern researchers have discovered a new family of sensing genes in enteric bacteria that are linked by structure and probably function, but not genetic sequence. The findings, published in PNAS, offer a new way of identifying the role of genes in unrelated species and could lead to new ways to fight intestinal bacterial infections. "We identified similarities in these proteins in reverse of how it's usually done. Instead of using sequence, Lisa looked for matches in their structure," said Kim Orth, Ph.D., Professor of Molecular Biology and Biochemistry, who co-led the study with Lisa Kinch, Ph.D., a bioinformatics specialist in the Department of Molecular Biology. Dr. Orth's lab has long focused on studying how marine and estuary bacteria cause infections.


Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks

Achour, El Mehdi, Malgouyres, François, Mamalet, Franck

arXiv.org Artificial Intelligence

Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding/vanishing gradients, decorrelating the features, and improving robustness. This paper studies theoretical properties of orthogonal convolutional layers. We establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. The conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice with 'circular' padding. We also exhibit limitations with the 'valid' boundary condition and the 'same' boundary condition with zero padding. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020). The second motivation of the present paper is to specify the theory behind this. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric. The theoretical results are confirmed with experiments, the landscape of the regularization term is studied, and the regularization strategy is validated on real datasets. Altogether, the study guarantees that regularization with L_{orth} (Wang et al. 2020) is an efficient, flexible, and stable numerical strategy for learning orthogonal convolutional layers.
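To make the orthogonality measure concrete, here is a small numpy sketch using the dense (circulant) matrix of a single-channel 1D circular convolution; the L_{orth} regularizer of Wang et al. (2020) computes an equivalent quantity efficiently via convolutions, which we do not implement here:

```python
import numpy as np

def circulant(kernel, n):
    """Dense n x n matrix of the 1D circular convolution with `kernel`."""
    C = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(kernel):
            C[i, (i + j) % n] = w
    return C

n = 8

# A pure shift is an orthogonal circular convolution: W^T W = I exactly.
W_shift = circulant(np.array([0.0, 1.0]), n)
err_orth = np.linalg.norm(W_shift.T @ W_shift - np.eye(n))

# A generic averaging kernel is not orthogonal; the deviation
# ||W^T W - I||_F is the dense-matrix analogue of the orthogonality
# measure that the regularization drives toward zero.
W_avg = circulant(np.array([0.5, 0.3, 0.2]), n)
err_generic = np.linalg.norm(W_avg.T @ W_avg - np.eye(n))
```

That a single-channel, stride-1 circular convolution is orthogonal essentially only for (signed) shifts illustrates why the existence of non-trivial orthogonal convolutions depends on the layer architecture (channels, stride, boundary condition), as the abstract states.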


Innovative New Algorithms Advance the Computing Power of Early-Stage Quantum Computers

#artificialintelligence

A group of scientists at the U.S. Department of Energy's Ames Laboratory has developed computational quantum algorithms that are capable of efficient and highly accurate simulations of static and dynamic properties of quantum systems. The algorithms are valuable tools to gain greater insight into the physics and chemistry of complex materials, and they are specifically designed to work on existing and near-future quantum computers. Scientist Yong-Xin Yao and his research partners at Ames Lab use the power of advanced computers to speed discovery in condensed matter physics, modeling incredibly complex quantum mechanics and how they change over ultra-fast timescales. Current high performance computers can model the properties of very simple, small quantum systems, but larger or more complex systems rapidly expand the number of calculations a computer must perform to arrive at an accurate model, slowing the pace not only of computation, but also discovery. "This is a real challenge given the current early-stage of existing quantum computing capabilities," said Yao, "but it is also a very promising opportunity, since these calculations overwhelm classical computer systems, or take far too long to provide timely answers."