AITopics

2605.12908

Country: Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Linkerhägner, Jonas, Bortolasi, Michele, Baldassari, Lorenzo, de Hoop, Maarten V., Dokmanić, Ivan

On Observation Time for Recovering Latent Hawkes Networks

arXiv.org Machine LearningMay-12-2026

Dynamics of interacting systems in engineering, society, and nature often evolve over latent networks that govern which entities can interact. We study the problem of inferring these networks from event-based observations, which arise naturally in finance, seismology, and neuroscience. While there is substantial algorithmic work addressing this important problem, theoretical results are scarce. In this paper we ask the following fundamental question: what is the minimum time that one must observe the dynamics in order to exactly recover the underlying network, as a function of the number $d$ of interacting entities? For a class of stationary Hawkes processes with sparse, weak interactions, we prove that an observation time of order $\log d$ is sufficient and necessary. For the upper bound we construct a two-stage estimator that uses clipped and binned event data for screening, followed by a least-squares refinement, and apply concentration bounds derived from the Poisson cluster representation. For the lower bound we combine Fano's inequality with Jacod's Girsanov formula for point processes on a suitable subclass of networks.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2605.084

Country: North America > United States (0.92)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

Zhou, Yichen, Tu, Stephen

CLT-Optimal Parameter Error Bounds for Linear System Identification

arXiv.org Machine LearningApr-24-2026

There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.

artificial intelligence, assumption 2, machine learning, (18 more...)

2604.2127

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine LearningMar-30-2026

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Kim, Juno, Nichani, Eshaan, Wu, Denny, Bietti, Alberto, Lee, Jason D.

Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative memory problem, a tractable model for factual recall in transformer-based models. In particular, we go beyond orthogonal embeddings and consider Gaussian inputs and outputs, which allows the number of stored associations to greatly exceed the embedding dimension. Our main result sharply characterizes the recovery rates of one step of Muon and SGD on the logistic regression loss under a power law frequency distribution. We show that the storage capacity of Muon significantly exceeds that of SGD, and moreover Muon saturates at a larger critical batch size. We further analyze the multi-step dynamics under a thresholded gradient approximation and show that Muon achieves a substantially faster initial recovery rate than SGD, while both methods eventually converge to the information-theoretic limit at comparable speeds. Experiments on synthetic tasks validate the predicted scaling laws. Our analysis provides a quantitative understanding of the signal amplification of Muon and lays the groundwork for establishing scaling laws across more practical language modeling tasks and optimizers.

logd, machine learning, natural language, (21 more...)

2603.26554

Country:

Europe > France (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Neural Information Processing SystemsFeb-12-2026, 15:21:56 GMT

Communication-efficientDistributedSGDwith Sketching

However,theoretical and empirical evidence both suggest that there is a maximum mini-batch size beyond which the number of iterations required toconvergestops decreasing, andgeneralization error begins toincrease [Maetal.,2017,Lietal., 2014, Golmant et al., 2018, Shallue et al., 2018, Keskar et al., 2016, Hoffer et al., 2017]. In this paper, we aim instead to decrease the communication cost per worker.

artificial intelligence, deep learning, machine learning, (15 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Christopher Criscitiello, Nicolas Boumal

Efficiently escaping saddle points on manifolds

Neural Information Processing SystemsFeb-12-2026, 14:58:40 GMT

Ifthext, krf(xt)k , perturbxt by , with sampled zero.

artificial intelligence, jinetal, machine learning, (17 more...)

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.05)
Asia > Middle East > Jordan (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

Neural Information Processing SystemsFeb-11-2026, 02:03:19 GMT

c4bf1e24f3e6f92ca9dfd9a7a1a1049c-Paper.pdf

We systematically study the query complexity of learning geodesically convex halfspacesongraphs.

artificial intelligence, graph, machine learning, (18 more...)

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-10-2026, 22:28:44 GMT

97b983c974551153d20ddfabb62a5203-Paper-Conference.pdf

For a range of activation functions, including ReLUs, we establish superpolynomial Statistical Query (SQ) lower bounds for this learning problem.

artificial intelligence, halfspace, machine learning, (17 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-10-2026, 20:11:35 GMT

b534ba68236ba543ae44b22bd110a1d6-Paper.pdf

Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.

artificial intelligence, augmentation, machine learning, (19 more...)

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Neural Information Processing SystemsFeb-9-2026, 01:00:07 GMT

63dc7ed1010d3c3b8269faf0ba7491d4-Supplemental.pdf

In this document, we provide details and supplementary materials that cannot fit into the main manuscript due to the page limit. The specific form ofcenter distribution isunknown, but we can still train a generatorG to approximate it. IfR(G,D,T)),wechooseλ=0, i.e., no restriction onR(G,D,T)), to obtain the minimal cost. IfR(G,D,T)) >, then a large λshould be applied as apenalization. According to the derivation of Eq. (3), we obtain arelaxed versionoftheintractableEq.(2),expressedasfollows: min Inknowledge distillation, student models arecrafted using unlabeled datasets, where only thesoft targets from teachers are utilized.

artificial intelligence, machine learning, stride, (18 more...)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)