AITopics | orthogonal

Collaborating Authors

orthogonal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robustly Learning Monotone Single-Index Models

Neural Information Processing SystemsJun-17-2026, 18:45:04 GMT

We consider the basic problem of learning Single-Index Models with respect to the square loss under the Gaussian distribution in the presence of adversarial label noise. Our main contribution is the first computationally efficient algorithm for this learning task, achieving a constant factor approximation, that succeeds for the class of all monotone activations with bounded moment of order 2 + ζ, for ζ > 0. This class in particular includes all monotone Lipschitz functions and even discontinuous functions like (possibly biased) halfspaces. Prior work for the case of unknown activation either does not attain constant factor approximation or succeeds for a substantially smaller family of activations. The main conceptual novelty of our approach lies in developing an optimization framework that steps outside the boundaries of usual gradient methods and instead identifies a useful vector field to guide the algorithm updates by directly leveraging the problem structure, properties of Gaussian spaces, and regularity of monotone functions.

artificial intelligence, machine learning, optimization problem, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry:

Government > Military (0.46)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

Conservation Laws from Data Symmetry in Neural Networks

Galley, Jakob, Shahverdi, Vahid, Flinth, Axel

arXiv.org Machine LearningJun-10-2026

We explore whether intrinsic symmetries of the training data lead to conserved quantities during gradient-flow training of neural networks. Under the assumption that the loss function is analytic and non-polynomial, we prove that data symmetries generically do not induce any additional integrals of motion. For mean squared error (MSE) loss, on the other hand, there are situations in which data augmentation yields extra conserved quantities. We build a framework, utilizing tensorizable networks to describe this phenomenon. Tensorizable networks are a family of architectures whose dependence on parameters and inputs can be separated using an intermediate representation. They include linear and Figure 1: A display of how data symmetry can give polynomial networks, as well as Lightning At-rise to conservation laws. The top row shows the tention.

artificial intelligence, machine learning, symmetry, (18 more...)

arXiv.org Machine Learning

2606.10913

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

How does feature learning reshape the function space?

Lobo, João, Loureiro, Bruno, Tran-Than, Long, Liu, Fanghui

arXiv.org Machine LearningMay-19-2026

Feature learning is widely regarded as the key mechanism distinguishing neural networks from fixed-kernel methods, yet its impact on the induced function space remains poorly understood. In this work, we precisely characterize how the function space spanned by the features of a two-layer neural network evolves during gradient descent training. We prove that, in the high-dimensional proportional regime, after a large gradient step the post-update feature distribution is well approximated by a target-dependent spiked Gaussian covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. Our analysis reveals that feature learning can be interpreted as a distributional transformation in either parameter space or input space, equivalently as the introduction of a target-dependent kernel. In particular, it selectively amplifies eigenvalues aligned with the target direction and mixes leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. Overall, our results provide a precise function-space perspective on early-stage feature learning: rather than just rescaling a fixed kernel, gradient descent induces a data-adaptive deformation that preferentially enhances directions aligned with the signal in the data.

artificial intelligence, kernel, machine learning, (18 more...)

arXiv.org Machine Learning

2605.17718

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

477bdb55b231264bb53a7942fd84254d-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 16:55:34 GMT

artificial intelligence, machine learning, ridge estimate, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Add feedback

Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces

Neural Information Processing SystemsApr-24-2026, 22:46:22 GMT

Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial L2-perturbations in these directions. Moreover, we show that decreasing the initialization scale of the training algorithm, or adding L2 regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.

artificial intelligence, machine learning, perturbation, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Group and Shuffle: Efficient Structured Orthogonal Parametrization

Neural Information Processing SystemsMar-21-2026, 07:25:28 GMT

The increasing size of neural networks has led to a growing demand for methods of efficient finetuning. Recently, an orthogonal finetuning paradigm was introduced that uses orthogonal matrices for adapting the weights of a pretrained model. In this paper, we introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We examine properties of this class and build a structured orthogonal parametrization upon it. We then use this parametrization to modify the orthogonal finetuning framework, improving parameter efficiency.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Add feedback

Benign overfitting in leaky ReLU networks with moderate input dimension

Neural Information Processing SystemsMar-19-2026, 21:08:49 GMT

The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie on subspaces orthogonal to one another. We characterize conditions on the signal to noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs, conversely if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = \Omega(n^2 \log n)$, here we require only $d = \Omega(n)$.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback