AITopics

2409.1863

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre:

Research Report (0.63)
Instructional Material (0.45)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Larsen, Kasper Green, Montasser, Omar, Zhivotovskiy, Nikita

Derandomizing Multi-Distribution Learning

arXiv.org Artificial IntelligenceSep-26-2024

Multi-distribution or collaborative learning involves learning a single predictor that works well across multiple data distributions, using samples from each during training. Recent research on multi-distribution learning, focusing on binary loss and finite VC dimension classes, has shown near-optimal sample complexity that is achieved with oracle efficient algorithms. That is, these algorithms are computationally efficient given an efficient ERM for the class. Unlike in classical PAC learning, where the optimal sample complexity is achieved with deterministic predictors, current multi-distribution learning algorithms output randomized predictors. This raises the question: can these algorithms be derandomized to produce a deterministic predictor for multiple distributions? Through a reduction to discrepancy minimization, we show that derandomizing multi-distribution learning is computationally hard, even when ERM is computationally efficient. On the positive side, we identify a structural condition enabling an efficient black-box reduction, converting existing randomized multi-distribution predictors into deterministic ones.

algorithm, learning, multi-distribution learning, (14 more...)

2409.17567

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
(10 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

arXiv.org Machine LearningSep-17-2024

The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem

Blanc, Guy, Hayderi, Alexandre, Koch, Caleb, Tan, Li-Yang

Smooth boosters generate distributions that do not place too much weight on any given example. Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory. We study and settle the sample complexity of smooth boosting: we exhibit a class that can be weak learned to $\gamma$-advantage over smooth distributions with $m$ samples, for which strong learning over the uniform distribution requires $\tilde{\Omega}(1/\gamma^2)\cdot m$ samples. This matches the overhead of existing smooth boosters and provides the first separation from the setting of distribution-independent boosting, for which the corresponding overhead is $O(1/\gamma)$. Our work also sheds new light on Impagliazzo's hardcore theorem from complexity theory, all known proofs of which can be cast in the framework of smooth boosting. For a function $f$ that is mildly hard against size-$s$ circuits, the hardcore theorem provides a set of inputs on which $f$ is extremely hard against size-$s'$ circuits. A downside of this important result is the loss in circuit size, i.e. that $s' \ll s$. Answering a question of Trevisan, we show that this size loss is necessary and in fact, the parameters achieved by known proofs are the best possible.

complexity, junta complexity, probability, (15 more...)

2409.11597

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.66)

von Pichowski, Jan, Blöcker, Christopher, Scholtes, Ingo

Hierarchical Graph Pooling Based on Minimum Description Length

arXiv.org Artificial IntelligenceSep-16-2024

Graph pooling is an essential part of deep graph representation learning. We introduce MapEqPool, a principled pooling operator that takes the inherent hierarchical structure of real-world graphs into account. MapEqPool builds on the map equation, an information-theoretic objective function for community detection based on the minimum description length principle which naturally implements Occam's razor and balances between model complexity and fit. We demonstrate MapEqPool's competitive performance with an empirical comparison against various baselines across standard graph classification datasets.

codeword, hierarchical graph pooling, map equation, (11 more...)

2409.10263

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Ren, Yi, Sutherland, Danica J.

Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics

arXiv.org Machine LearningSep-15-2024

Obtaining compositional mappings is important for the model to generalize well compositionally. To better understand when and how to encourage the model to learn such mappings, we study their uniqueness through different perspectives. Specifically, we first show that the compositional mappings are the simplest bijections through the lens of coding length (i.e., an upper bound of their Kolmogorov complexity). This property explains why models having such mappings can generalize well. We further show that the simplicity bias is usually an intrinsic property of neural network training via gradient descent. That partially explains why some models spontaneously generalize well when they are trained appropriately.

compositional mapping, kolmogorov complexity, mapping, (12 more...)

2409.09626

Country:

North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Höltgen, Benedikt, Williamson, Robert C.

Which distribution were you sampled from? Towards a more tangible conception of data

arXiv.org Artificial IntelligenceSep-12-2024

Machine Learning research, as most of Statistics, heavily relies on the concept of a data-generating probability distribution. The standard presumption is that since data points are `sampled from' such a distribution, one can learn from observed data about this distribution and, thus, predict future data points which, it is presumed, are also drawn from it. Drawing on scholarship across disciplines, we here argue that this framework is not always a good model. Not only do such true probability distributions not exist; the framework can also be misleading and obscure both the choices made and the goals pursued in machine learning practice. We suggest an alternative framework that focuses on finite populations rather than abstract distributions; while classical learning theory can be left almost unchanged, it opens new opportunities, especially to model sampling. We compile these considerations into five reasons for modelling machine learning -- in some settings -- with finite populations rather than generative distributions, both to be more faithful to practice and to provide novel theoretical insights.

artificial intelligence, machine learning, probability, (17 more...)

2407.17395

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > Wisconsin (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

arXiv.org Machine LearningSep-11-2024

A Practical Theory of Generalization in Selectivity Learning

Wu, Peizhi, Xu, Haoshu, Marcus, Ryan, Ives, Zachary G.

Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist substantial gaps between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we aim to bridge the gaps between theory and practice. First, we demonstrate that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory. More importantly, beyond the PAC learning framework (which only allows us to characterize how the model behaves when both training and test workloads are drawn from the same distribution), we establish, under mild assumptions, that selectivity predictors from this class exhibit favorable out-of-distribution (OOD) generalization error bounds. These theoretical advances provide us with a better understanding of both the in-distribution and OOD generalization capabilities of query-driven selectivity learning, and facilitate the design of two general strategies to improve OOD generalization for existing query-driven selectivity models. We empirically verify that our techniques help query-driven selectivity models generalize significantly better to OOD queries both in terms of prediction accuracy and query latency performance, while maintaining their superior in-distribution generalization performance.

generalization, neurocdf, query, (15 more...)

2409.07014

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Pennsylvania (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.46)

arXiv.org Artificial IntelligenceSep-4-2024

Backdoor defense, learnability and obfuscation

Christiano, Paul, Hilton, Jacob, Lecomte, Victor, Xu, Mark

We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the function class, in much the same way as PAC learnability. In the computationally bounded setting, we use a similar argument to show that efficient PAC learnability implies efficient defendability, but not conversely. On the other hand, we use indistinguishability obfuscation to show that the class of polynomial size circuits is not efficiently defendable. Finally, we present polynomial size decision trees as a natural example for which defense is strictly easier than learning. Thus, we identify efficient defendability as a notable intermediate concept in between efficient learnability and obfuscation.

detection strategy, probability, representation class, (14 more...)

2409.03077

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

arXiv.org Artificial IntelligenceAug-28-2024

Fairness, Accuracy, and Unreliable Data

Stangl, Kevin

This thesis investigates three areas targeted at improving the reliability of machine learning; fairness in machine learning, strategic classification, and algorithmic robustness. Each of these domains has special properties or structure that can complicate learning. A theme throughout this thesis is thinking about ways in which a'plain' empirical risk minimization algorithm will be misleading or ineffective because of a mis-match between classical learning theory assumptions and specific properties of some data distribution in the wild. The overarching research goal for these related topics is to provide a crisp mathematical model for each learning scenario that exposes different failure modes and makes trade-offs between important metrics explicit in order to provide algorithmic advice or recommendations to practitioners and expose gaps for future research. By tuning our learning algorithms to be more distribution specific in these scenarios, the resulting learned system will exhibit higher utility and avoid catastrophic failure modes. This research is grounded in the theory of machine learning and is fundamentally mathematical in nature, with empirical support when appropriate. Theory is particularly important in these sensitive domains as it is unclear which poor behavior in deployed systems is a natural or benign consequence of a learning system with the underlying distribution,contrasting with problematic but correctable behavior caused by an error in algorithm design or implementation, how to mitigate these issues, or what a successful outcome even looks like in each problem. Theoretical understanding in each domain can help guide best practices and allow for the design of effective, reliable, and robust systems.

arxiv preprint arxiv, multi-robustness guarantee, strategic classification, (15 more...)

2408.1604

Country:

Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Workflow (0.92)

Industry:

Health & Medicine (1.00)
Banking & Finance (1.00)
Law (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Rismanchian, Sina, Doroudi, Shayan

Reconciling Different Theories of Learning with an Agent-based Model of Procedural Learning

arXiv.org Artificial IntelligenceAug-23-2024

Computational models of human learning can play a significant role in enhancing our knowledge about nuances in theoretical and qualitative learning theories and frameworks. There are many existing frameworks in educational settings that have shown to be verified using empirical studies, but at times we find these theories make conflicting claims or recommendations for instruction. In this study, we propose a new computational model of human learning, Procedural ABICAP, that reconciles the ICAP, Knowledge-Learning-Instruction (KLI), and cognitive load theory (CLT) frameworks for learning procedural knowledge. ICAP assumes that constructive learning generally yields better learning outcomes, while theories such as KLI and CLT claim that this is not always true. We suppose that one reason for this may be that ICAP is primarily used for conceptual learning and is underspecified as a framework for thinking about procedural learning. We show how our computational model, both by design and through simulations, can be used to reconcile different results in the literature. More generally, we position our computational model as an executable theory of learning that can be used to simulate various educational settings.

knowledge, learner, learning, (12 more...)

2408.13364

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > Spain (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Education > Educational Setting (0.54)
Education > Educational Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)