AITopics

2505.03223

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.70)

arXiv.org Artificial IntelligenceMay-5-2025

MADIL: An MDL-based Framework for Efficient Program Synthesis in the ARC Benchmark

Ferré, Sébastien

Artificial Intelligence (AI) has achieved remarkable success in specialized tasks but struggles with efficient skill acquisition and generalization. The Abstraction and Reasoning Corpus (ARC) benchmark evaluates intelligence based on minimal training requirements. While Large Language Models (LLMs) have recently improved ARC performance, they rely on extensive pre-training and high computational costs. We introduce MADIL (MDL-based AI), a novel approach leveraging the Minimum Description Length (MDL) principle for efficient inductive learning. MADIL performs pattern-based decomposition, enabling structured generalization. While its performance (7% at ArcPrize 2024) remains below LLM-based methods, it offers greater efficiency and interpretability. This paper details MADIL's methodology, its application to ARC, and experimental evaluations.

large language model, machine learning, transition, (24 more...)

2505.01081

Genre: Research Report > Promising Solution (0.66)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
(2 more...)

Clerico, Eugenio, Flynn, Hamish, Kotłowski, Wojciech, Neu, Gergely

Confidence Sequences for Generalized Linear Models via Regret Analysis

arXiv.org Machine LearningApr-23-2025

We develop a methodology for constructing confidence sets for parameters of statistical models via a reduction to sequential prediction. Our key observation is that for any generalized linear model (GLM), one can construct an associated game of sequential probability assignment such that achieving low regret in the game implies a high-probability upper bound on the excess likelihood of the true parameter of the GLM. This allows us to develop a scheme that we call online-to-confidence-set conversions, which effectively reduces the problem of proving the desired statistical claim to an algorithmic question. We study two varieties of this conversion scheme: 1) analytical conversions that only require proving the existence of algorithms with low regret and provide confidence sets centered at the maximum-likelihood estimator 2) algorithmic conversions that actively leverage the output of the online algorithm to construct confidence sets (and may be centered at other, adaptively constructed point estimators). The resulting methodology recovers all state-of-the-art confidence set constructions within a single framework, and also provides several new types of confidence sets that were previously unknown in the literature.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2504.16555

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland > Greater Poland Province > Poznań (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
(2 more...)

Pardo, Luis Miguel, Sebastián, Daniel

Erzeugunsgrad, VC-Dimension and Neural Networks with rational activation function

arXiv.org Artificial IntelligenceApr-16-2025

The notion of Erzeugungsgrad was introduced by Joos Heintz in 1983 to bound the number of non-empty cells occurring after a process of quantifier elimination. We extend this notion and the combinatorial bounds of Theorem 2 in Heintz (1983) using the degree for constructible sets defined in Pardo-Sebastián (2022). We show that the Erzeugungsgrad is the key ingredient to connect affine Intersection Theory over algebraically closed fields and the VC-Theory of Computational Learning Theory for families of classifiers given by parameterized families of constructible sets. In particular, we prove that the VC-dimension and the Krull dimension are linearly related up to logarithmic factors based on Intersection Theory. Using this relation, we study the density of correct test sequences in evasive varieties. We apply these ideas to analyze parameterized families of neural networks with rational activation function.

artificial intelligence, irreducible component, machine learning, (18 more...)

2504.11345

Country: Europe > Spain (0.46)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Subramonian, Arjun, Dohmatob, Elvis

auto-fpt: Automating Free Probability Theory Calculations for Machine Learning Theory

arXiv.org Artificial IntelligenceApr-16-2025

A large part of modern machine learning theory often involves computing the high-dimensional expected trace of a rational expression of large rectangular random matrices. To symbolically compute such quantities using free probability theory, we introduce auto-fpt, a lightweight Python and SymPy-based tool that can automatically produce a reduced system of fixed-point equations which can be solved for the quantities of interest, and effectively constitutes a theory. We overview the algorithmic ideas underlying auto-fpt and its applications to various interesting problems, such as the high-dimensional error of linearized feed-forward neural networks, recovering well-known results. We hope that auto-fpt streamlines the majority of calculations involved in high-dimensional analysis, while helping the machine learning community reproduce known and uncover new phenomena.

artificial intelligence, machine learning, matrix, (13 more...)

2504.10754

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Montasser, Omar, Shetty, Abhishek, Zhivotovskiy, Nikita

Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks

arXiv.org Machine LearningApr-14-2025

We revisit online binary classification by shifting the focus from competing with the best-in-class binary loss to competing against relaxed benchmarks that capture smoothed notions of optimality. Instead of measuring regret relative to the exact minimal binary error -- a standard approach that leads to worst-case bounds tied to the Littlestone dimension -- we consider comparing with predictors that are robust to small input perturbations, perform well under Gaussian smoothing, or maintain a prescribed output margin. Previous examples of this were primarily limited to the hinge loss. Our algorithms achieve regret guarantees that depend only on the VC dimension and the complexity of the instance space (e.g., metric entropy), and notably, they incur only an $O(\log(1/\gamma))$ dependence on the generalized margin $\gamma$. This stands in contrast to most existing regret bounds, which typically exhibit a polynomial dependence on $1/\gamma$. We complement this with matching lower bounds. Our analysis connects recent ideas from adversarial robustness and smoothed online learning.

artificial intelligence, machine learning, sequence, (18 more...)

2504.10598

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Virginia (0.04)
(12 more...)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.50)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine LearningApr-11-2025

Riemannian Optimization on Relaxed Indicator Matrix Manifold

Yuan, Jinghui, Xie, Fangyuan, Nie, Feiping, Li, Xuelong

The indicator matrix plays an important role in machine learning, but optimizing it is an NP-hard problem. We propose a new relaxation of the indicator matrix and prove that this relaxation forms a manifold, which we call the Relaxed Indicator Matrix Manifold (RIM manifold). Based on Riemannian geometry, we develop a Riemannian toolbox for optimization on the RIM manifold. Specifically, we provide several methods of Retraction, including a fast Retraction method to obtain geodesics. We point out that the RIM manifold is a generalization of the double stochastic manifold, and it is much faster than existing methods on the double stochastic manifold, which has a complexity of $ \mathcal{O}(n^3) $, while RIM manifold optimization is $ \mathcal{O}(n) $ and often yields better results. We conducted extensive experiments, including image denoising, with millions of variables to support our conclusion, and applied the RIM manifold to Ratio Cut, we provide a rigorous convergence proof and achieve clustering results that outperform the state-of-the-art methods. Our Code in \href{https://github.com/Yuan-Jinghui/Riemannian-Optimization-on-Relaxed-Indicator-Matrix-Manifold}{here}.

data mining, machine learning, manifold, (18 more...)

2503.20505

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Iowa > Johnson County > Iowa City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)

arXiv.org Artificial IntelligenceApr-9-2025

Collaborative Prediction: Tractable Information Aggregation via Agreement

Collina, Natalie, Globus-Harris, Ira, Goel, Surbhi, Gupta, Varun, Roth, Aaron, Shi, Mirah

We give efficient "collaboration protocols" through which two parties, who observe different features about the same instances, can interact to arrive at predictions that are more accurate than either could have obtained on their own. The parties only need to iteratively share and update their own label predictions-without either party ever having to share the actual features that they observe. Our protocols are efficient reductions to the problem of learning on each party's feature space alone, and so can be used even in settings in which each party's feature space is illegible to the other-which arises in models of human/AI interaction and in multi-modal learning. The communication requirements of our protocols are independent of the dimensionality of the data. In an online adversarial setting we show how to give regret bounds on the predictions that the parties arrive at with respect to a class of benchmark policies defined on the joint feature space of the two parties, despite the fact that neither party has access to this joint feature space. We also give simpler algorithms for the same task in the batch setting in which we assume that there is a fixed but unknown data distribution. We generalize our protocols to a decision theoretic setting with high dimensional outcome spaces, where parties communicate only "best response actions." Our theorems give a computationally and statistically tractable generalization of past work on information aggregation amongst Bayesians who share a common and correct prior, as part of a literature studying "agreement" in the style of Aumann's agreement theorem. Our results require no knowledge of (or even the existence of) a prior distribution and are computationally efficient. Nevertheless we show how to lift our theorems back to this classical Bayesian setting, and in doing so, give new information aggregation theorems for Bayesian agreement.

artificial intelligence, machine learning, prediction, (17 more...)

2504.06075

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.65)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Gao, Chao, Shan, Liren, Srinivas, Vaidehi, Vijayaraghavan, Aravindan

Computing High-dimensional Confidence Sets for Arbitrary Distributions

arXiv.org Machine LearningApr-3-2025

We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $\delta$, and sample access to an arbitrary distribution $D$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $\delta$ coverage of $D$, i.e., $\mathbb{P}_{y \sim D} \left[ y \in S \right] \ge \delta$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in finding confidence sets, uncertainty quantification, and support estimation. In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $C$ with bounded VC-dimension. An algorithm is competitive with class $C$ if, given samples from an arbitrary distribution $D$, it outputs in polynomial time a set that achieves $\delta$ coverage of $D$, and whose volume is competitive with the smallest set in $C$ with the required coverage $\delta$. This problem is computationally challenging even in the basic setting when $C$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}( d/ \log d))$-factor competitive with the volume of the best ball. Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{2/3}))$ factor competitive with the optimal ball having the desired coverage. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for proper learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and (improper) learning of confidence sets.

algorithm, artificial intelligence, machine learning, (17 more...)

2504.02723

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.87)

Anderson, Aaron, Benedikt, Michael

Logical perspectives on learning statistical objects

arXiv.org Artificial IntelligenceApr-1-2025

We consider the relationship between learnability of a ``base class'' of functions on a set X and learnability of a class of statistical functions derived from the base class. For example, we refine results showing that learnability of a family of functions implies learnability of the family of functions mapping a function in the class to its expectation under a distribution. We will look at both Probably Approximately Correct (PAC) learning, where example inputs and outputs are chosen at random, and online learning, where the examples are chosen adversarially. We establish improved bounds on the sample complexity of learning for statistical classes, stated in terms of combinatorial dimensions of the base class. We do this by adapting techniques introduced in model theory for ``randomizing a structure''. We give particular attention to classes derived from logical formulas, and relate learnability of the statistical classes to properties of the formula. Finally, we provide bounds on the complexity of learning the statistical classes built on top of a logic-based hypothesis class.

artificial intelligence, dimension, machine learning, (19 more...)

2504.00847

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.35)