AITopics | Computational Learning Theory

Computational Learning Theory

In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms (Wikipedia).

A Survey of Learning Criteria Going Beyond the Usual Risk

Holland, Matthew J. (Osaka University) | Tanabe, Kazuki

Journal of Artificial Intelligence Research, Nov-27-2023

Virtually all machine learning tasks are characterized using some form of loss function, and "good performance" is typically stated in terms of a sufficiently small average loss, taken over the random draw of test data. While optimizing for performance on average is intuitive, convenient to analyze in theory, and easy to implement in practice, such a choice brings about trade-offs. In this work, we survey and introduce a wide variety of non-traditional criteria used to design and evaluate machine learning algorithms, place the classical paradigm within the proper historical context, and propose a view of learning problems which emphasizes the question of "what makes for a desirable loss distribution?" in place of tacit use of the expected loss.
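As a minimal sketch of why the choice of criterion matters, the Python snippet below compares two hypothetical loss samples that share the same average loss but have very different tails. The tail average used here is a CVaR-style criterion picked purely for illustration; the abstract does not name specific criteria, and the function names, the sample values, and the `tail_fraction` parameter are all assumptions.

```python
def mean_loss(losses):
    """Classical criterion: the average loss over the sample."""
    return sum(losses) / len(losses)


def tail_average(losses, tail_fraction=0.2):
    """Average of the worst `tail_fraction` of losses (a CVaR-style criterion)."""
    ordered = sorted(losses)
    k = max(1, round(tail_fraction * len(ordered)))  # number of worst losses kept
    worst = ordered[-k:]
    return sum(worst) / len(worst)


# Hypothetical test losses for two models (illustrative values only).
model_a = [1.0] * 10           # uniformly mediocre
model_b = [0.0] * 9 + [10.0]   # usually perfect, occasionally very bad

print(mean_loss(model_a), mean_loss(model_b))
print(tail_average(model_a), tail_average(model_b))
```

Both samples have an average loss of 1.0, so the expected-loss criterion cannot tell them apart, while the tail average (1.0 versus 5.0) exposes the heavy tail of the second loss distribution.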

learning criteria, probability, statistics, (11 more...)

doi: 10.1613/jair.1.15000

AI Access Foundation

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Overview (1.00)

Industry:

Education (0.67)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Optimally Teaching a Linear Behavior Cloning Agent

Bharti, Shubham Kumar, Wright, Stephen, Singla, Adish, Zhu, Xiaojin

arXiv.org Artificial Intelligence, Nov-26-2023

We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinite linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using the minimum number of state demonstrations. This number is known as the Teaching Dimension (TD). We present a teaching algorithm called "Teach using Iterative Elimination (TIE)" that achieves instance-optimal TD. However, we also show that finding an optimal teaching set is computationally NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.

cone, extreme ray, teaching, (13 more...)

2311.15399

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)

Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning

Hamzi, Boumediene, Hutter, Marcus, Owhadi, Houman

arXiv.org Machine Learning, Nov-21-2023

Machine Learning (ML) and Algorithmic Information Theory (AIT) look at complexity from different points of view. We explore the interface between AIT and Kernel Methods (which are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data, in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimal Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach to adopt to learn kernels from data. This paper shows that it is not necessary to use the statistical route to derive Sparse Kernel Flows and that one can work directly with code lengths and complexities, concepts that arise in AIT.

artificial intelligence, kernel, machine learning, (12 more...)

doi: 10.13140/RG.2.2.36344.01285

2311.12624

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Learning Deterministic Finite Automata from Confidence Oracles

Wu, Wilson

arXiv.org Artificial Intelligence, Nov-17-2023

We discuss the problem of learning a deterministic finite automaton (DFA) from a confidence oracle. That is, we are given access to an oracle $Q$ with incomplete knowledge of some target language $L$ over an alphabet $\Sigma$; the oracle maps a string $x\in\Sigma^*$ to a score in the interval $[-1,1]$ indicating its confidence that the string is in the language. The interpretation is that the sign of the score signifies whether $x\in L$, while the magnitude $|Q(x)|$ represents the oracle's confidence. Our goal is to learn a DFA representation of the oracle that preserves the information that it is confident in. The learned DFA should closely match the oracle wherever it is highly confident, but it need not do this when the oracle is less sure of itself.
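The setup can be sketched in a few lines of Python, under stated assumptions. The oracle below is a hypothetical toy (confident on short strings, unsure on long ones) for the language "even number of a's", and the helper names and the confidence threshold are illustrative choices, not the paper's algorithm. The sketch only checks where a candidate DFA disagrees with the oracle on strings the oracle is confident about.

```python
from itertools import product


def run_dfa(transitions, start, accepting, word):
    """Return True if the DFA accepts `word`."""
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


def toy_oracle(word):
    """Hypothetical confidence oracle Q: sign = membership, magnitude = confidence."""
    sign = 1.0 if word.count("a") % 2 == 0 else -1.0
    confidence = 1.0 if len(word) <= 3 else 0.1  # unsure about longer strings
    return sign * confidence


def confident_disagreements(dfa, oracle, words, threshold=0.5):
    """Strings where |Q(x)| >= threshold but the DFA contradicts sign(Q(x))."""
    transitions, start, accepting = dfa
    bad = []
    for word in words:
        score = oracle(word)
        if abs(score) < threshold:
            continue  # low confidence: the DFA is free to answer either way
        if run_dfa(transitions, start, accepting, word) != (score > 0):
            bad.append(word)
    return bad


# All strings over {a, b} of length 0..4.
words = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]

# Candidate 1: tracks the parity of a's. Candidate 2: accepts everything.
parity_dfa = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
accept_all = ({(0, "a"): 0, (0, "b"): 0}, 0, {0})

print(confident_disagreements(parity_dfa, toy_oracle, words))
print(len(confident_disagreements(accept_all, toy_oracle, words)))
```

The parity DFA has no confident disagreements, while the accept-everything DFA is penalized only on the short strings with an odd number of a's, since the toy oracle is unsure about everything longer.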

algorithm, dfa, oracle, (17 more...)

2311.10963

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Understanding and Mitigating Classification Errors Through Interpretable Token Patterns

Hedderich, Michael A., Fischer, Jonas, Klakow, Dietrich, Vreeken, Jilles

arXiv.org Artificial Intelligence, Nov-17-2023

State-of-the-art NLP methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors, and also gives a way to act on and improve the classifier. We propose to discover those patterns of tokens that distinguish correct and erroneous predictions so as to obtain global and interpretable descriptions for arbitrary NLP classifiers. We formulate the problem of finding a succinct and non-redundant set of such patterns in terms of the Minimum Description Length principle. Through an extensive set of experiments, we show that our method, Premise, performs well in practice. Unlike existing solutions, it recovers ground truth, even on highly imbalanced data over large vocabularies. In VQA and NER case studies, we confirm that it gives clear and actionable insight into the systematic errors made by NLP classifiers.

classifier, mitigating classification error, proceedings, (15 more...)

2311.10920

Country:

Europe > Germany > Saarland (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.49)

Time-Uniform Confidence Spheres for Means of Random Vectors

Chugg, Ben, Wang, Hongjian, Ramdas, Aaditya

arXiv.org Machine Learning, Nov-14-2023

We derive and study time-uniform confidence spheres - termed confidence sphere sequences (CSSs) - which contain the mean of random vectors with high probability simultaneously across all sample sizes. Inspired by the original work of Catoni and Giulini, we unify and extend their analysis to cover both the sequential setting and to handle a variety of distributional assumptions. More concretely, our results include an empirical-Bernstein CSS for bounded random vectors (resulting in a novel empirical-Bernstein confidence interval), a CSS for sub-$\psi$ random vectors, and a CSS for heavy-tailed random vectors based on a sequentially valid Catoni-Giulini estimator. Finally, we provide a version of our empirical-Bernstein CSS that is robust to contamination by Huber noise.
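To illustrate what "simultaneously across all sample sizes" costs, here is a deliberately crude Python sketch for scalar observations in [0, 1]. It is not the paper's construction (the abstract describes empirical-Bernstein and Catoni-Giulini based spheres for vectors); it simply splits the error budget delta across sample sizes with a union bound and applies Hoeffding's inequality at each one. The function names and the particular budget split are assumptions made for this example.

```python
import math


def fixed_time_radius(t, delta):
    """Hoeffding radius valid at one pre-specified sample size t (values in [0, 1])."""
    return math.sqrt(math.log(2 / delta) / (2 * t))


def error_budget(t, delta):
    """Share of delta spent at sample size t; the shares sum to delta over t = 1, 2, ..."""
    return 6 * delta / (math.pi ** 2 * t ** 2)


def time_uniform_radius(t, delta):
    """Radius valid simultaneously for all t >= 1, via a union bound over sample sizes."""
    return math.sqrt(math.log(2 / error_budget(t, delta)) / (2 * t))


for t in (10, 100, 1000):
    print(t, fixed_time_radius(t, 0.05), time_uniform_radius(t, 0.05))
```

The time-uniform radius is always wider than the fixed-time one, which is the price of being able to monitor the estimate continuously and stop at any sample size. Tighter sequences, such as those studied in the paper, aim to shrink that gap.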

artificial intelligence, machine learning, sequence, (16 more...)

2311.08168

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

On learning spatial sequences with the movement of attention

Osaulenko, Viacheslav M.

arXiv.org Artificial Intelligence, Nov-12-2023

In this paper we start with a simple question: how is it possible that humans can recognize different movements over skin with only a prior visual experience of them? Or, in general, what is the representation of spatial sequences that are invariant to scale, rotation, and translation across different modalities? To answer, we rethink the mathematical representation of spatial sequences, argue against the minimum description length principle, and focus on the movements of attention. We advance the idea that spatial sequences must be represented on different levels of abstraction; this adds redundancy but is necessary for recognition and generalization. To address the open question of how these abstractions are formed, we propose two hypotheses: the first invites exploring selectionism learning, instead of finding parameters in some models; the second proposes to find new data structures, not neural network architectures, to efficiently store and operate over redundant features to be further selected. Movements of attention are central to human cognition, and the lessons should be applied to new, better learning algorithms.

representation, sequence, spatial sequence, (17 more...)

2311.06856

Country:

Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
Europe > Croatia > Primorje-Gorski Kotar County > Rijeka (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.49)

Minimum Description Length Hopfield Networks

Abudy, Matan, Lan, Nur, Chemla, Emmanuel, Katzir, Roni

arXiv.org Artificial Intelligence, Nov-11-2023

Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this tradeoff. It relies on Minimum Description Length (MDL) to determine during training which memories to store, as well as how many of them.
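The "memories as prototypes" view can be sketched with the softmax-weighted retrieval rule commonly associated with Modern Hopfield Networks. The abstract does not spell out the retrieval rule, so the formula, the inverse temperature `beta`, and the stored patterns below are assumptions for illustration; the MDL-based selection of which memories to store is not implemented here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(memories, query, beta=4.0):
    """Return the softmax-weighted average of stored memories for a query."""
    scores = [beta * sum(m_d * q_d for m_d, q_d in zip(memory, query))
              for memory in memories]
    weights = softmax(scores)
    return [sum(w * memory[d] for w, memory in zip(weights, memories))
            for d in range(len(query))]


# Two stored patterns (hypothetical) and an unseen, noisy query.
memories = [[1, 1, -1, -1], [-1, 1, -1, 1]]
noisy_query = [0.9, 0.8, -1.0, -0.7]

retrieved = retrieve(memories, noisy_query)
print([round(value) for value in retrieved])
```

The unseen query is mapped onto the first stored pattern, which acts as its prototype. With many stored memories, nearly every query snaps to some memorized item, which is one way to read the tension between memorization capacity and generalization that the abstract describes.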

digit, exemplar, generalization, (14 more...)

2311.06518

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.87)

Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques

Karchmer, Ari

arXiv.org Machine Learning, Nov-11-2023

Agnostic learning [Hau92, KSS92] is an important generalization of PAC-learning [Val84]. Agnostic learning is meant to more accurately capture a common approach to machine learning, where a predefined set of functions is explored in order to find the one achieving the least error on a set of data produced by some totally unknown process. Thus, roughly speaking, the objective of an agnostic learning algorithm for a complexity class Λ is to output a hypothesis h whose error in approximating an arbitrary concept is nearly as small as that of the best possible hypothesis within Λ. The class Λ is referred to as the touchstone class. Designing computationally efficient (i.e., polynomial time) agnostic learning algorithms for expressive touchstone classes has historically been relatively hard. Even extremely simple touchstone classes such as parity functions are believed to be computationally hard to learn in the agnostic model [BFKL93]. Some positive results exist, however, including for piecewise functions [KSS92], restricted fan-in two-layer neural nets [Lee96], geometric patterns [GKS97], decision trees [GKK08], and halfspaces [KKMS08]. If we take some combination of the common relaxations considered in computational learning theory, such as access to membership queries, distribution-specific learning, or super-polynomial runtime, more positive results become known. For instance, the famed polynomial time agnostic learning algorithm for parity functions due to [GL89] (also referred to sometimes as the KM algorithm after [KM91]), uses membership queries and requires a uniform distribution over unlabelled examples.
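The agnostic objective can be made concrete with a brute-force Python sketch that uses parity functions as the touchstone class. It is an illustration under stated assumptions, not any algorithm from the paper: it enumerates all 2^n parities and returns the one with the smallest empirical error on a hypothetical sample whose labels no parity fits perfectly. The exhaustive search takes time exponential in n, which is exactly the cost that efficient agnostic learners try to avoid.

```python
from itertools import combinations, product


def parity(subset, x):
    """Parity (XOR) of the bits of x indexed by `subset`."""
    return sum(x[i] for i in subset) % 2


def empirical_error(subset, sample):
    """Fraction of labeled examples (x, y) that the parity on `subset` gets wrong."""
    return sum(parity(subset, x) != y for x, y in sample) / len(sample)


def agnostic_erm(n, sample):
    """Exhaustively pick the parity over n bits with the least empirical error."""
    candidates = [s for r in range(n + 1) for s in combinations(range(n), r)]
    return min(candidates, key=lambda s: empirical_error(s, sample))


# Hypothetical data: labels follow the parity of bits 0 and 2, with one label
# flipped, so no hypothesis in the touchstone class is perfectly consistent.
n = 3
target = (0, 2)
sample = [(x, parity(target, x)) for x in product((0, 1), repeat=n)]
x_flip, y_flip = sample[5]
sample[5] = (x_flip, 1 - y_flip)

best = agnostic_erm(n, sample)
print(best, empirical_error(best, sample))
```

The learner returns the parity on bits 0 and 2 with empirical error 1/8, the smallest error achievable within the class on this sample, even though the data were not generated exactly by any parity function.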

algorithm, artificial intelligence, machine learning, (15 more...)

2311.06690

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > India > Goa > Panaji (0.04)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Information-theoretic generalization bounds for learning from quantum data

Caro, Matthias, Gur, Tom, Rouzé, Cambyse, França, Daniel Stilck, Subramanian, Sathyawageeswar

arXiv.org Artificial Intelligence, Nov-9-2023

Learning tasks play an increasingly prominent role in quantum information and computation. They range from fundamental problems such as state discrimination and metrology over the framework of quantum probably approximately correct (PAC) learning, to the recently proposed shadow variants of state tomography. However, the many directions of quantum learning theory have so far evolved separately. We propose a general mathematical formalism for describing quantum learning by training on classical-quantum data and then testing how well the learned hypothesis generalizes to new data. In this framework, we prove bounds on the expected generalization error of a quantum learner in terms of classical and quantum information-theoretic quantities measuring how strongly the learner's hypothesis depends on the specific data seen during training. To achieve this, we use tools from quantum optimal transport and quantum concentration inequalities to establish non-commutative versions of decoupling lemmas that underlie recent information-theoretic generalization bounds for classical machine learning. Our framework encompasses and gives intuitively accessible generalization bounds for a variety of quantum learning scenarios such as quantum state discrimination, PAC learning quantum states, quantum parameter estimation, and quantumly PAC learning classical functions. Thereby, our work lays a foundation for a unifying quantum information-theoretic perspective on quantum learning.

generalization, hyp, learner, (16 more...)

2311.05529

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(8 more...)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
