Learning Conditional Averages

Bressan, Marco, Brukhim, Nataly, Cesa-Bianchi, Nicolo, Esposito, Emmanuel, Mansour, Yishay, Moran, Shay, Thiessen, Maximilian

arXiv.org Machine Learning

We introduce the problem of learning conditional averages in the PAC framework. The learner receives a sample labeled by an unknown target concept from a known concept class, as in standard PAC learning. However, instead of learning the target concept itself, the goal is to predict, for each instance, the average label over its neighborhood -- an arbitrary subset of points that contains the instance. In the degenerate case where all neighborhoods are singletons, the problem reduces exactly to classic PAC learning. More generally, it extends PAC learning to a setting that captures learning tasks arising in several domains, including explainability, fairness, and recommendation systems. Our main contribution is a complete characterization of when conditional averages are learnable, together with sample complexity bounds that are tight up to logarithmic factors. The characterization hinges on the joint finiteness of two novel combinatorial parameters, which depend on both the concept class and the neighborhood system, and are closely related to the independence number of the associated neighborhood graph.
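To make the learning target concrete, here is a minimal, hypothetical sketch (the threshold concept and the radius-based neighborhood system are toy choices, not taken from the paper):

    # Toy illustration of the conditional-average target: for each instance x,
    # the label to predict is the average of the target concept c over x's
    # neighborhood N(x). With singleton neighborhoods this is classic PAC learning.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=50)          # instances on the real line
    c = lambda x: (x > 0).astype(float)      # toy target: a threshold concept

    def neighborhood(i, radius=0.2):
        # toy neighborhood system: all points within `radius` of X[i] (contains X[i])
        return np.flatnonzero(np.abs(X - X[i]) <= radius)

    # the quantity the learner must predict at each instance
    cond_avg = np.array([c(X[neighborhood(i)]).mean() for i in range(len(X))])
    print(cond_avg[:5])  # values in [0, 1], not just {0, 1} as in classic PAC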




Neural Information Processing Systems

We also establish new convergence complexities for reaching an approximate KKT solution when the objective can be smooth or nonsmooth, deterministic or stochastic, and convex or nonconvex, with complexity on a par with that of gradient descent for unconstrained optimization in the respective cases. To the best of our knowledge, this is the first study of first-order methods with complexity guarantees for nonconvex sparse-constrained problems.
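For intuition, here is a hedged sketch of one standard first-order method for sparsity-constrained problems, iterative hard thresholding (projected gradient descent onto the $\ell_0$ ball); this is an illustrative stand-in under our own assumptions, not necessarily the paper's algorithm:

    # Projected gradient descent for min f(x) s.t. ||x||_0 <= s. Approximate
    # stationarity of the iterates gives an approximate-KKT-style certificate.
    import numpy as np

    def hard_threshold(x, s):
        # projection onto the l0 ball: keep the s largest-magnitude coordinates
        out = np.zeros_like(x)
        idx = np.argsort(np.abs(x))[-s:]
        out[idx] = x[idx]
        return out

    def sparse_pgd(grad, x0, s, lr, iters=200):
        x = hard_threshold(x0, s)
        for _ in range(iters):
            x = hard_threshold(x - lr * grad(x), s)
        return x

    # example: sparse least squares with a random sensing matrix
    A = np.random.default_rng(1).normal(size=(30, 100))
    x_true = np.zeros(100); x_true[:5] = 1.0
    b = A @ x_true
    lr = 1.0 / np.linalg.norm(A, 2) ** 2     # step below 1/L for stability
    x_hat = sparse_pgd(lambda x: A.T @ (A @ x - b), np.zeros(100), s=5, lr=lr)
    print(np.flatnonzero(x_hat))             # support of the sparse iterate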





Muon in Associative Memory Learning: Training Dynamics and Scaling Laws

Li, Binghui, Wang, Kaifei, Zhong, Han, Lu, Pinyan, Wang, Liwei

arXiv.org Machine Learning

Muon updates matrix parameters via the matrix sign of the gradient and has shown strong empirical gains, yet its dynamics and scaling behavior remain theoretically unclear. We study Muon in a linear associative memory model with softmax retrieval and a hierarchical frequency spectrum over query-answer pairs, with and without label noise. In this setting, we show that Gradient Descent (GD) learns frequency components at highly imbalanced rates, leading to slow convergence bottlenecked by low-frequency components. In contrast, the Muon optimizer mitigates this imbalance, leading to faster and more uniform progress. Specifically, in the noiseless case, Muon achieves an exponential speedup over GD; in the noisy case with a power-decay frequency spectrum, we derive Muon's optimization scaling law and demonstrate its superior scaling efficiency over GD. Furthermore, we show that Muon can be interpreted as an implicit matrix preconditioner arising from adaptive task alignment and a block-symmetric gradient structure. By contrast, a preconditioner based on the coordinate-wise sign operator could match Muon only under oracle access to the unknown task representations, which is infeasible for SignGD in practice. Experiments on synthetic long-tail classification and LLaMA-style pre-training corroborate the theory.
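A minimal sketch of the matrix-sign update described above, assuming the simplified form $W \leftarrow W - \eta\,\mathrm{msign}(G)$ with no momentum (the SVD-based sign, learning rate, and shapes are illustrative choices on our part):

    # Muon-style update: replace the gradient of a matrix parameter by its
    # matrix sign U V^T, which sets all singular values to one and hence
    # equalizes the per-direction learning rates, mitigating the frequency
    # imbalance described in the abstract.
    import numpy as np

    def matrix_sign(G):
        # msign(G) = U V^T from the thin SVD G = U diag(s) V^T
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        return U @ Vt

    def muon_step(W, G, lr=0.02):
        return W - lr * matrix_sign(G)

    # usage: one update on a random 4x3 parameter with a random gradient
    rng = np.random.default_rng(0)
    W = muon_step(rng.normal(size=(4, 3)), rng.normal(size=(4, 3)))

Plain GD scales each singular direction by its singular value, so low-frequency (small-singular-value) components crawl; the sign step above treats all directions equally, which is the mechanism behind the speedup claimed in the abstract.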


An approach to Fisher-Rao metric for infinite dimensional non-parametric information geometry

Cheng, Bing, Tong, Howell

arXiv.org Machine Learning

Being infinite dimensional, non-parametric information geometry has long faced an "intractability barrier": the Fisher-Rao metric becomes a functional, which makes defining its inverse difficult. This paper introduces a novel framework that resolves the intractability via an Orthogonal Decomposition of the Tangent Space ($T_fM = S \oplus S^{\perp}$), where $S$ represents an observable covariate subspace. Through the decomposition, we derive the Covariate Fisher Information Matrix (cFIM), denoted ${\bf G}_f$, which is a finite-dimensional and computable representative of the information extractable from the manifold's geometry. Significantly, by proving the Trace Theorem $H_G(f) = \text{Tr}({\bf G}_f)$, we establish a rigorous foundation for the G-entropy previously introduced by us, identifying it as a fundamental geometric invariant representing the total explainable statistical information captured by the probability distribution associated with a model. Furthermore, we establish a link between ${\bf G}_f$ and the second derivative (i.e., the curvature) of the KL-divergence, leading to the notion of a Covariate Cramér-Rao Lower Bound (CRLB). We demonstrate that ${\bf G}_f$ is congruent to the Efficient Fisher Information Matrix, thereby providing fundamental limits on the variance of semi-parametric estimators. Finally, we apply our geometric framework to the Manifold Hypothesis, lifting it from a heuristic assumption to a testable condition of rank-deficiency in the cFIM. By defining the Information Capture Ratio, we provide a rigorous method for estimating intrinsic dimensionality in high-dimensional data. In short, our work bridges the gap between abstract information geometry and the demands of explainable AI by providing a tractable path for assessing the statistical coverage and efficiency of non-parametric models.
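The KL-curvature link invoked above is an instance of a classical identity; a hedged sketch in the abstract's notation, where the parametrization $f_\theta$ along covariate directions in $S$ is our assumption:

$$({\bf G}_f)_{ij} = \left.\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\,\mathrm{KL}\bigl(f \,\|\, f_{\theta}\bigr)\right|_{\theta=0}, \qquad H_G(f) = \mathrm{Tr}({\bf G}_f) = \sum_i ({\bf G}_f)_{ii}.$$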


Ineffectiveness for Search and Undecidability of PCSP Meta-Problems

Larrauri, Alberto

arXiv.org Artificial Intelligence

It is an open question whether the search and decision versions of promise CSPs are equivalent. Most known algorithms for PCSPs solve only their \emph{decision} variant, and it is unknown whether they can be adapted to solve \emph{search} as well. The main approaches, called BLP, AIP, and BLP+AIP, handle a PCSP by finding a solution to a relaxation of some integer program. We prove that rounding those solutions to a proper search certificate can be as hard as any problem in the class TFNP. In other words, these algorithms are ineffective for search. Building on the algebraic approach to PCSPs, we find sufficient conditions that imply ineffectiveness for search. Our tools are tailored to algorithms that are characterized by minions in a suitable way, and can also be used to prove undecidability results for meta-problems. This way, we show that the families of templates solvable via BLP, AIP, and BLP+AIP are undecidable. Using the same techniques, we also analyze several algebraic conditions that are known to guarantee the tractability of finite-template CSPs. We prove that several meta-problems related to cyclic polymorphisms and WNUs are undecidable for PCSPs. In particular, there is no algorithm deciding whether a finite PCSP template (1) admits a cyclic polymorphism, or (2) admits a WNU.
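For concreteness, here is a hedged sketch of the standard BLP relaxation referred to above, instantiated for graph 3-coloring (the CSP with disequality constraints); the encoding details are our own choices:

    # BLP relaxation: each variable gets a distribution over values, each
    # constraint a distribution over its satisfying tuples, and the marginals
    # must agree. BLP accepts iff this LP is feasible, which is a *decision*
    # certificate: feasibility does not hand you a coloring (the search issue).
    import numpy as np
    from scipy.optimize import linprog

    edges = [(0, 1), (1, 2), (0, 2)]   # a triangle; 3-colorable
    n, k = 3, 3                        # vertices, colors

    def P(v, a):  return v * k + a                                  # p[v,a]
    off = n * k
    pairs = [(a, b) for a in range(k) for b in range(k) if a != b]  # satisfying tuples
    def Q(e, t):  return off + e * len(pairs) + t                   # q[edge, tuple]

    nvar = off + len(edges) * len(pairs)
    A_eq, b_eq = [], []

    for v in range(n):                 # each p[v, .] is a distribution
        row = np.zeros(nvar); row[[P(v, a) for a in range(k)]] = 1
        A_eq.append(row); b_eq.append(1.0)

    for e, (u, v) in enumerate(edges): # marginal consistency with q[e, .]
        for a in range(k):             # sum_b q[e,(a,b)] = p[u,a]
            row = np.zeros(nvar)
            for t, (x, y) in enumerate(pairs):
                if x == a: row[Q(e, t)] = 1
            row[P(u, a)] -= 1
            A_eq.append(row); b_eq.append(0.0)
        for b in range(k):             # sum_a q[e,(a,b)] = p[v,b]
            row = np.zeros(nvar)
            for t, (x, y) in enumerate(pairs):
                if y == b: row[Q(e, t)] = 1
            row[P(v, b)] -= 1
            A_eq.append(row); b_eq.append(0.0)

    res = linprog(np.zeros(nvar), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(0, 1)] * nvar)
    print("BLP accepts:", res.status == 0)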


A Broader View on Clustering under Cluster-Aware Norm Objectives

Herold, Martin G., Kipouridis, Evangelos, Spoerhase, Joachim

arXiv.org Artificial Intelligence

We revisit the $(f,g)$-clustering problem that we introduced in a recent work [SODA'25], and which subsumes fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. This problem assigns each of the $k$ clusters a cost determined by the monotone, symmetric norm $f$ applied to the vector of distances in the cluster, and aims at minimizing the norm $g$ applied to the vector of cluster costs. Previously, we focused on certain special cases for which we designed constant-factor approximation algorithms; however, our bounds for more general settings left large gaps to the known bounds for the basic problems they capture. In this work, we provide a clearer picture of the approximability of these more general settings. First, we design an $O(\log^2 n)$-approximation algorithm for $(f, L_{1})$-clustering for any $f$. This improves upon our previous $\widetilde{O}(\sqrt{n})$-approximation. Second, we provide an $O(k)$-approximation for the general $(f,g)$-clustering problem, which improves upon our previous $\widetilde{O}(\sqrt{kn})$-approximation algorithm and matches the best-known upper bound for Min-Load $k$-Clustering. We then design an approximation algorithm for $(f,g)$-clustering that interpolates, up to polylog factors, between the best known bounds for $k$-Center, $k$-Median, Min-Sum of Radii, Min-Load $k$-Clustering, (Top, $L_{1}$)-clustering, and $(L_{\infty},g)$-clustering, based on a newly defined parameter of $f$ and $g$.
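As an illustration of how the objective specializes (a toy one-dimensional instance with fixed centers and assignments; the data and norms are our choices):

    # (f,g)-clustering objective: cluster cost = f(vector of point-to-center
    # distances in the cluster), total cost = g(vector of cluster costs).
    import numpy as np

    def fg_cost(points, centers, assign, f, g):
        cluster_costs = []
        for j in range(len(centers)):
            d = np.abs(points[assign == j] - centers[j])  # distances in cluster j
            cluster_costs.append(f(d) if d.size else 0.0)
        return g(np.array(cluster_costs))

    L1   = lambda v: np.sum(v)
    Linf = lambda v: np.max(v)

    pts = np.array([0.0, 1.0, 5.0, 6.0]); ctr = np.array([0.5, 5.5])
    asg = np.array([0, 0, 1, 1])
    # special cases named in the abstract:
    print(fg_cost(pts, ctr, asg, L1,   L1))    # k-Median:         2.0
    print(fg_cost(pts, ctr, asg, Linf, Linf))  # k-Center:         0.5
    print(fg_cost(pts, ctr, asg, Linf, L1))    # Min-Sum of Radii: 1.0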