AITopics | dynamic

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Neural Information Processing SystemsOct-8-2025, 16:30:48 GMT

The Simplicity Bias in Multi-Task RNNs: Shared Attractors, Reuse of Dynamics, and Geometric Representation

The forces shaping joint dynamics of multiple tasks, however, are largely unexplored. In this work, we first construct a systematic framework to study multiple tasks in RNNs, minimizing interference from input and output correlations with the hidden representation.

artificial intelligence, attractor, machine learning, (16 more...)

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.30)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Neural Information Processing SystemsAug-18-2025, 18:41:57 GMT

Local Linear Convergence of Gradient Methods for Subspace Optimization via Strict Complementarity

In this work we bridge these two approaches under a strict complementarity assumption, which in particular implies that the optimal solution to the convex relaxation is unique and is also the optimal solution to the original nonconvex problem.

artificial intelligence, machine learning, optimization problem, (16 more...)

Country:

North America > United States (0.28)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Neural Information Processing SystemsMay-26-2025, 19:49:13 GMT

Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

Machine learning systems often acquire biases by leveraging undesired features in the data, impacting accuracy variably across different sub-populations of the data. However, our current understanding of bias formation mostly focuses on the initial and final stages of learning, leaving a gap in knowledge regarding the transient dynamics. To address this gap, this paper explores the evolution of bias in a teacher-student setup that models different data sub-populations with a Gaussian-mixture model. We provide an analytical description of the stochastic gradient descent dynamics of a linear classifier in this setup, which we prove to be exact in high dimension.Notably, our analysis identifies different properties of the sub-populations that drive bias at different timescales and hence shows a shifting preference of our classifier during training. By applying our general solution to fairness and robustness, we delineate how and when heterogeneous data and spurious features can generate and amplify bias.

artificial intelligence, machine learning, theoretical insight, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)

arXiv.org Machine LearningMar-5-2025

An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models

Wang, Binxu

We developed an analytical framework for understanding how the learned distribution evolves during diffusion model training. Leveraging the Gaussian equivalence principle, we derived exact solutions for the gradient-flow dynamics of weights in one- or two-layer linear denoiser settings with arbitrary data. Remarkably, these solutions allowed us to derive the generated distribution in closed form and its KL divergence through training. These analytical results expose a pronounced power-law spectral bias, i.e., for weights and distributions, the convergence time of a mode follows an inverse power law of its variance. Empirical experiments on both Gaussian and image datasets demonstrate that the power-law spectral bias remains robust even when using deeper or convolutional architectures. Our results underscore the importance of the data covariance in dictating the order and rate at which diffusion models learn different modes of the data, providing potential explanations for why earlier stopping could lead to incorrect details in image generative models.

artificial intelligence, machine learning, spectral bias, (17 more...)

arXiv.org Machine Learning

2503.03206

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceFeb-5-2025

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

Sun, Li, Zhang, Ziheng, Wang, Zixi, Wang, Yujie, Wan, Qiqi, Li, Hao, Peng, Hao, Yu, Philip S.

Dynamic interacting system modeling is important for understanding and simulating real world systems. The system is typically described as a graph, where multiple objects dynamically interact with each other and evolve over time. In recent years, graph Ordinary Differential Equations (ODE) receive increasing research attentions. While achieving encouraging results, existing solutions prioritize the traditional Euclidean space, and neglect the intrinsic geometry of the system and physics laws, e.g., the principle of entropy increasing. The limitations above motivate us to rethink the system dynamics from a fresh perspective of Riemannian geometry, and pose a more realistic problem of physics-informed dynamic system modeling, considering the underlying geometry and physics law for the first time. In this paper, we present a novel physics-informed Riemannian graph ODE for a wide range of entropy-increasing dynamic systems (termed as Pioneer). In particular, we formulate a differential system on the Riemannian manifold, where a manifold-valued graph ODE is governed by the proposed constrained Ricci flow, and a manifold preserving Gyro-transform aware of system geometry. Theoretically, we report the provable entropy non-decreasing of our formulation, obeying the physics laws. Empirical results show the superiority of Pioneer on real datasets.

artificial intelligence, machine learning, proceedings, (18 more...)

2502.03236

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Greenland (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Carroll, Liam, Hoogland, Jesse, Farrugia-Roberts, Matthew, Murfet, Daniel

Dynamics of Transient Structure in In-Context Linear Regression Transformers

arXiv.org Artificial IntelligenceJan-31-2025

Modern deep neural networks display striking examples of rich internal computational structure. Uncovering principles governing the development of such structure is a priority for the science of deep learning. In this paper, we explore the transient ridge phenomenon: when transformers are trained on in-context linear regression tasks with intermediate task diversity, they initially behave like ridge regression before specializing to the tasks in their training distribution. This transition from a general solution to a specialized solution is revealed by joint trajectory principal component analysis. Further, we draw on the theory of Bayesian internal model selection to suggest a general explanation for the phenomena of transient structure in transformers, based on an evolving tradeoff between loss and complexity. We empirically validate this explanation by measuring the model complexity of our transformers as defined by the local learning coefficient.

artificial intelligence, machine learning, transformer, (15 more...)

2501.17745

Country:

North America > United States > Massachusetts > Middlesex County > Reading (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)

Genre: Research Report (0.81)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

Castin, Valérie, Ablin, Pierre, Carrillo, José Antonio, Peyré, Gabriel

A Unified Perspective on the Dynamics of Deep Transformers

arXiv.org Artificial IntelligenceJan-30-2025

Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers. However, the iterative application of attention across layers induces complex dynamics that remain to be fully understood. To analyze these dynamics, we identify each input sequence with a probability measure and model its evolution as a Vlasov equation called Transformer PDE, whose velocity field is non-linear in the probability measure. Our first set of contributions focuses on compactly supported initial data. We show the Transformer PDE is well-posed and is the mean-field limit of an interacting particle system, thus generalizing and extending previous analysis to several variants of self-attention: multi-head attention, L2 attention, Sinkhorn attention, Sigmoid attention, and masked attention--leveraging a conditional Wasserstein framework. In a second set of contributions, we are the first to study non-compactly supported initial conditions, by focusing on Gaussian initial data. Again for different types of attention, we show that the Transformer PDE preserves the space of Gaussian measures, which allows us to analyze the Gaussian case theoretically and numerically to identify typical behaviors. This Gaussian analysis captures the evolution of data anisotropy through a deep Transformer. In particular, we highlight a clustering phenomenon that parallels previous results in the non-normalized discrete case.

artificial intelligence, deep learning, machine learning, (17 more...)

2501.18322

Country:

Europe > France (0.46)
Europe > United Kingdom (0.45)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Pustina, Pietro, Della Santina, Cosimo, De Luca, Alessandro

Unified Inverse Dynamics of Modular Serial Mechanical Systems with Application to Soft Robotics

arXiv.org Artificial IntelligenceFeb-10-2024

The robotic field has been witnessing a progressive departure from classic robotic systems composed of serial/stiff links interconnected by simple rigid joints. Novel robotic concepts, e.g., soft robots, often maintain a series-like structure, but their mechanical modules exhibit complex and unconventional articulation patterns. Research in efficient recursive formulations of the dynamic models for subclasses of these systems has been extremely active in the past decade. Yet, as of today, no single recursive inverse dynamics algorithm can describe the behavior of all these systems. This paper addresses this challenge by proposing a new iterative formulation based on Kane equations. Its computational complexity is optimal, i.e., linear with the number of modules. While the proposed formulation is not claimed to be necessarily more efficient than state-of-the-art techniques for specific subclasses of robots, we illustrate its usefulness in the modeling of different complex systems. We propose two new models of soft robots: (i) a class of pneumatically actuated soft arms that deform along their cross-sectional area, and (ii) a piecewise strain model with Gaussian functions.

artificial intelligence, equation, robot, (18 more...)

2402.07037

Country:

Europe > Netherlands (0.14)
Europe > Italy (0.14)

Genre: Research Report (0.70)

Industry:

Energy > Oil & Gas > Upstream (0.46)
Construction & Engineering (0.41)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)