AITopics | Computational Learning Theory

Distribution Learnability and Robustness

Neural Information Processing Systems | Feb-11-2025, 07:25:51 GMT

We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning. We show that learnability of a distribution class implies robust learnability with only additive corruption, but not if there may be subtractive corruption. Thus, contrary to other learning settings (e.g., PAC learning of function classes), realizable learnability does not imply agnostic learnability. We also explore related implications in the context of compression schemes and differentially private learnability.

artificial intelligence, learnability, machine learning

Country:

North America > United States (0.69)
North America > Canada (0.46)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Security & Privacy (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Reliable learning in challenging environments

Neural Information Processing Systems | Feb-11-2025, 05:38:02 GMT

The problem of designing learners whose predictions are provably correct is of increasing importance in machine learning. However, learning-theoretic guarantees have only been considered in very specific settings. In this work, we consider the design and analysis of reliable learners in challenging test-time environments as encountered in modern machine learning problems: namely, 'adversarial' test-time attacks (in several variations) and 'natural' distribution shifts. We provide a reliable learner with provably optimal guarantees in such settings. We discuss computationally feasible implementations of the learner and further show that our algorithm achieves strong positive performance guarantees on several natural examples: for example, linear separators under log-concave distributions or smooth boundary classifiers under smooth probability distributions.

artificial intelligence, international conference, machine learning

Country: North America > United States (0.28)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.47)

Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures

Neural Information Processing Systems | Feb-11-2025, 03:56:11 GMT

We present improved algorithms with worst-case regret guarantees for the stochastic linear bandit problem. The widely used "optimism in the face of uncertainty"

artificial intelligence, data mining, machine learning

Country: Europe (0.67)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Optimal Learners for Realizable Regression: PAC Learning and Online Learning

Neural Information Processing Systems | Feb-11-2025, 03:36:19 GMT

In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning we provide a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich in STOC '22.

artificial intelligence, dimension, machine learning

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.81)

Supplemental: Training Neural Networks is NP-Hard in Fixed Dimension: A Detailed Proof of NP-Hardness for Two Dimensions

Neural Information Processing Systems | Feb-11-2025, 03:14:23 GMT

In this section we provide the omitted details to prove Theorem 1. We start by describing the precise positions of the data points in the selection gadget. Next, we need a small ε > 0 to be chosen later in a global context. With the precise description of the selection gadget at hand, we can proceed to proving Lemma 4.

Proof of Lemma 4. First, we focus on the three vertical lines. For the following argument, compare Figure 5. Observe that f restricted to one of the three lines is a one-dimensional, continuous, piecewise linear function with at most four breakpoints. Note that the exact location of these breakpoints and the slope in the sloped segments is not implied by the nine data points considered so far.

artificial intelligence, machine learning, selection gadget

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

On the Sample Complexity of Privately Learning Axis-Aligned Rectangles

Uri Stemmer

Neural Information Processing Systems | Feb-11-2025, 02:09:08 GMT

That is, existing constructions either require sample complexity that grows linearly with log |X|, or else it grows super linearly with the dimension (d.

algorithm, artificial intelligence, machine learning

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.47)

Teaching via Best Case Counterexamples in the Learning with Equivalence Queries Paradigm

Neural Information Processing Systems | Feb-11-2025, 00:16:40 GMT

More concretely, we consider a learner who asks equivalence queries (i.e., "is the queried hypothesis the target hypothesis?"), and a teacher who responds either "yes" or "no" along with a counterexample to the queried hypothesis. This learning paradigm has been extensively studied when the learner receives worst-case or random counterexamples; in this paper, we consider the optimal teacher who picks best-case counterexamples to teach the target hypothesis within a hypothesis class. For this optimal teacher, we introduce LwEQ-TD, a notion of teaching dimension (TD) capturing the teaching complexity (i.e., the number of queries made) in this paradigm. We show that a significant reduction in queries can be achieved with best-case counterexamples, in contrast to worst-case or random counterexamples, for different hypothesis classes. Furthermore, we establish new connections of LwEQ-TD to the well-studied notions of TD in the learning-from-samples paradigm.
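
As a rough illustration of why the choice of counterexample matters, the following Python sketch counts equivalence queries for a toy class of threshold functions. Everything in it is an assumption made for illustration and is not taken from the paper: the class (`threshold_class`), the learner (which always queries the first hypothesis still consistent with the counterexamples seen so far), and the greedy teachers (`min` stands in for a helpful teacher by leaving the fewest consistent hypotheses, `max` for an unhelpful one). It does not compute LwEQ-TD.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[int, ...]  # labels of the domain points 0..n-1


def threshold_class(n: int) -> List[Hypothesis]:
    """Hypothetical toy class: hypothesis t labels x as 1 iff x >= t, for t in 0..n."""
    return [tuple(int(x >= t) for x in range(n)) for t in range(n + 1)]


def count_queries(
    hypotheses: List[Hypothesis],
    target: Hypothesis,
    pick: Callable[..., int],
) -> int:
    """Count equivalence queries until the learner queries the target.

    `pick` is `min` (greedy helpful teacher) or `max` (greedy unhelpful
    teacher); it selects a counterexample by the number of hypotheses
    that stay consistent after it is revealed.
    """
    version_space = list(hypotheses)
    queries = 0
    while True:
        guess = version_space[0]  # learner: first consistent hypothesis
        queries += 1
        if guess == target:
            return queries
        # Points where the queried hypothesis disagrees with the target.
        counterexamples = [x for x in range(len(target)) if guess[x] != target[x]]

        def survivors(x: int) -> List[Hypothesis]:
            return [h for h in version_space if h[x] == target[x]]

        chosen = pick(counterexamples, key=lambda x: len(survivors(x)))
        # The guess is always eliminated, so the loop terminates.
        version_space = survivors(chosen)


if __name__ == "__main__":
    H = threshold_class(8)
    target = H[5]
    print("helpful teacher:", count_queries(H, target, min))
    print("unhelpful teacher:", count_queries(H, target, max))
```

On this toy class with target threshold 5, the greedy helpful teacher leads the learner to the target in fewer queries than the greedy unhelpful one, which mirrors the qualitative gap the abstract describes.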

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)

The Role of Randomness in Stability

Max Hopkins, Shay Moran

arXiv.org Machine Learning | Feb-11-2025

Stability is a central property in learning and statistics, promising that the output of an algorithm $A$ does not change substantially when applied to similar datasets $S$ and $S'$. It is an elementary fact that any sufficiently stable algorithm (e.g., one returning the same result with high probability, satisfying privacy guarantees, etc.) must be randomized. This raises a natural question: can we quantify how much randomness is needed for algorithmic stability? We study the randomness complexity of two influential notions of stability in learning: replicability, which promises that $A$ usually outputs the same result when run over samples from the same distribution (and shared random coins), and differential privacy, which promises that the output distribution of $A$ remains similar under neighboring datasets. The randomness complexity of these notions was studied recently in (Dixon et al. ICML 2024) and (Cannone et al. ITCS 2024) for basic $d$-dimensional tasks (e.g., estimating the bias of $d$ coins), but little is known about the measures more generally or in complex settings like classification. Toward this end, we prove a 'weak-to-strong' boosting theorem for stability: the randomness complexity of a task $M$ (either under replicability or DP) is tightly controlled by the best replication probability of any deterministic algorithm solving the task, a weak measure called 'global stability' that is universally capped at $\frac{1}{2}$ (Chase et al. FOCS 2023). Using this, we characterize the randomness complexity of PAC Learning: a class has bounded randomness complexity iff it has finite Littlestone dimension, and moreover this complexity scales at worst logarithmically in the excess error of the learner. This resolves a question of (Chase et al. STOC 2024), who asked for such a characterization in the equivalent language of (error-dependent) 'list-replicability'.
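
To make the role of shared random coins concrete, here is a minimal Python sketch of a standard randomized-rounding idea for estimating the bias of a single coin. It is an illustration under stated assumptions, not the construction analyzed in the paper: the function name `replicable_bias_estimate`, the grid `width`, and the use of a seed to stand in for the shared coins are all hypothetical choices.

```python
import random
from typing import Sequence


def replicable_bias_estimate(
    samples: Sequence[int], shared_seed: int, width: float = 0.1
) -> float:
    """Round the empirical bias to a grid whose offset comes from shared randomness.

    `shared_seed` (an assumption of this sketch) models the random coins
    shared between runs; `samples` are 0/1 coin flips drawn fresh per run.
    """
    empirical = sum(samples) / len(samples)
    # The shared coins fix a random offset for the rounding grid.
    offset = random.Random(shared_seed).uniform(0.0, width)
    # Snap the empirical bias to the nearest grid point offset + k * width.
    k = round((empirical - offset) / width)
    return offset + k * width


if __name__ == "__main__":
    rng = random.Random(0)
    run_a = [int(rng.random() < 0.3) for _ in range(2000)]
    run_b = [int(rng.random() < 0.3) for _ in range(2000)]
    # Same shared seed, independent samples from the same coin.
    print(replicable_bias_estimate(run_a, shared_seed=42))
    print(replicable_bias_estimate(run_b, shared_seed=42))
```

Two runs on fresh samples disagree only when a rounding boundary falls between their empirical means; over the choice of offset this happens with probability roughly the gap between the means divided by `width`. The output always stays within `width / 2` of the empirical bias, so a wider grid trades accuracy for agreement.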

algorithm, artificial intelligence, machine learning

2502.08007

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)

Towards a Unified Information-Theoretic Framework for Generalization

Neural Information Processing Systems | Feb-10-2025, 23:35:41 GMT

In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou [1] and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs hypotheses from a class of bounded VC dimension. We then explore two directions of strengthening this bound: (i) Can the CMI framework express optimal bounds for VC classes?

algorithm, artificial intelligence, machine learning

Country: North America > Canada > Ontario (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.89)
