AITopics | Computational Learning Theory

Collaborating Authors

Computational Learning Theory

In computer science, computational learning theory (or just learning theory) is a subfield of Artificial Intelligence devoted to studying the design and analysis of machine learning algorithms (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Li, Shuaipeng, Zhao, Penghao, Zhang, Hailin, Sun, Xingwu, Wu, Hao, Jiao, Dian, Wang, Weiyan, Liu, Chengjun, Fang, Zheng, Xue, Jinbao, Tao, Yangyu, Cui, Bin, Wang, Di

arXiv.org Artificial IntelligenceJun-4-2024

In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require careful tuning to enable effective convergence. Previous research has shown that the optimal learning rate increases linearly or follows similar rules with batch size for SGD style optimizers. However, this conclusion is not applicable to Adam style optimizers. In this paper, we elucidate the connection between optimal learning rates and batch sizes for Adam style optimizers through both theoretical analysis and extensive experiments. First, we raise the scaling law between batch sizes and optimal learning rates in the sign of gradient case, in which we prove that the optimal learning rate first rises and then falls as the batch size increases. Moreover, the peak value of the surge will gradually move toward the larger batch size as training progresses. Second, we conducted experiments on various CV and NLP tasks and verified the correctness of the scaling law.

batch size, optimal, optimizer, (15 more...)

arXiv.org Artificial Intelligence

2405.14578

Country: Asia > Macao (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

Replicability in High Dimensional Statistics

Hopkins, Max, Impagliazzo, Russell, Kane, Daniel, Liu, Sihan, Ye, Christopher

arXiv.org Machine LearningJun-3-2024

The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for $1$-dimensional tasks including statistical queries. In this work, we study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks, including multi-hypothesis testing and mean estimation. Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings. As a consequence, we obtain matching sample complexity upper and lower bounds for replicable mean estimation of distributions with bounded covariance, resolving an open problem of [Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar, and Sorrell, STOC2023] and for the $N$-Coin Problem, resolving a problem of [Karbasi, Velegkas, Yang, and Zhou, NeurIPS2023] up to log factors. While our equivalence is computational, allowing us to shave log factors in sample complexity from the best known efficient algorithms, efficient isoperimetric tilings are not known. To circumvent this, we introduce several relaxed paradigms that do allow for sample and computationally efficient algorithms, including allowing pre-processing, adaptivity, and approximate replicability. In these cases we give efficient algorithms matching or beating the best known sample complexity for mean estimation and the coin problem, including a generic procedure that reduces the standard quadratic overhead of replicability to linear in expectation.

algorithm, probability, sample complexity, (14 more...)

arXiv.org Machine Learning

2406.02628

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.45)

Add feedback

MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning

Nair, Inderjeet, Wang, Lu

arXiv.org Artificial IntelligenceJun-2-2024

We study the task of conducting structured reasoning as generating a reasoning graph from natural language input using large language models (LLMs). Previous approaches have explored various prompting schemes, yet they suffer from error propagation due to the autoregressive nature and single-pass-based decoding, which lack error correction capability. Additionally, relying solely on a single sample may result in the omission of true nodes and edges. To counter this, we draw inspiration from self-consistency (SC), which involves sampling a diverse set of reasoning chains and taking the majority vote as the final answer. To tackle the substantial challenge of applying SC on generated graphs, we propose MIDGARD (MInimum Description length Guided Aggregation of Reasoning in Directed acyclic graph) that leverages Minimum Description Length (MDL)-based formulation to identify consistent properties among the different graph samples generated by an LLM. This formulation helps reject properties that appear in only a few samples, which are likely to be erroneous, while enabling the inclusion of missing elements without compromising precision. Our method demonstrates superior performance than comparisons across various structured reasoning tasks, including argument structure extraction, explanation graph generation, inferring dependency relations among actions for everyday tasks, and semantic graph generation from natural texts.

description length, graph, midgard, (13 more...)

arXiv.org Artificial Intelligence

2405.05189

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.81)

Add feedback

Learning to Defer in Content Moderation: The Human-AI Interplay

Lykouris, Thodoris, Weng, Wentao

arXiv.org Artificial IntelligenceJun-2-2024

Successful content moderation in online platforms relies on a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of a post and uses fixed thresholds to decide whether to remove it and whether to send it for human review. This disregards the prediction uncertainty, the time-varying element of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts filtered by the admission algorithm). In this paper, we introduce a model to capture the human-AI interplay in content moderation. The algorithm observes contextual information for incoming posts, makes classification and admission decisions, and schedules posts for human review. Only admitted posts receive human reviews on their harmfulness. These reviews help educate the machine-learning algorithms but are delayed due to congestion in the human review system. The classical learning-theoretic way to capture this human-AI interplay is via the framework of learning to defer, where the algorithm has the option to defer a classification task to humans for a fixed cost and immediately receive feedback. Our model contributes to this literature by introducing congestion in the human review system. Moreover, unlike work on online learning with delayed feedback where the delay in the feedback is exogenous to the algorithm's decisions, the delay in our model is endogenous to both the admission and the scheduling decisions. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss of having congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems and hence our analytical framework may be of independent interest.

inequality, lemma 4, lemma 5, (17 more...)

arXiv.org Artificial Intelligence

2402.12237

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.63)

Industry:

Health & Medicine (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.48)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

Add feedback

Optimistic Rates for Learning from Label Proportions

Li, Gene, Chen, Lin, Javanmard, Adel, Mirrokni, Vahab

arXiv.org Machine LearningJun-1-2024

We consider a weakly supervised learning problem called Learning from Label Proportions (LLP), where examples are grouped into ``bags'' and only the average label within each bag is revealed to the learner. We study various learning rules for LLP that achieve PAC learning guarantees for classification loss. We establish that the classical Empirical Proportional Risk Minimization (EPRM) learning rule (Yu et al., 2014) achieves fast rates under realizability, but EPRM and similar proportion matching learning rules can fail in the agnostic setting. We also show that (1) a debiased proportional square loss, as well as (2) a recently proposed EasyLLP learning rule (Busa-Fekete et al., 2023) both achieve ``optimistic rates'' (Panchenko, 2002); in both the realizable and agnostic settings, their sample complexity is optimal (up to log factors) in terms of $\epsilon, \delta$, and VC dimension.

log 1, probability, proportion, (17 more...)

arXiv.org Machine Learning

2406.00487

Country:

North America > United States > California (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.63)

Industry:

Information Technology (0.67)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation

Munteanu, Alexander, Omlor, Simon

arXiv.org Machine LearningJun-1-2024

Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling received a lot of attention, which samples points proportional to an individual importance measure called sensitivity. This framework reduces in very general settings the size of data to roughly the VC dimension $d$ times the total sensitivity $\mathfrak S$ while providing strong $(1\pm\varepsilon)$ guarantees on the quality of approximation. The recent work of Woodruff & Yasuda (2023c) improved substantially over the general $\tilde O(\varepsilon^{-2}\mathfrak Sd)$ bound for the important problem of $\ell_p$ subspace embeddings to $\tilde O(\varepsilon^{-2}\mathfrak S^{2/p})$ for $p\in[1,2]$. Their result was subsumed by an earlier $\tilde O(\varepsilon^{-2}\mathfrak Sd^{1-p/2})$ bound which was implicitly given in the work of Chen & Derezinski (2021). We show that their result is tight when sampling according to plain $\ell_p$ sensitivities. We observe that by augmenting the $\ell_p$ sensitivities by $\ell_2$ sensitivities, we obtain better bounds improving over the aforementioned results to optimal linear $\tilde O(\varepsilon^{-2}(\mathfrak S+d)) = \tilde O(\varepsilon^{-2}d)$ sampling complexity for all $p \in [1,2]$. In particular, this resolves an open question of Woodruff & Yasuda (2023c) in the affirmative for $p \in [1,2]$ and brings sensitivity subsampling into the regime that was previously only known to be possible using Lewis weights (Cohen & Peng, 2015). As an application of our main result, we also obtain an $\tilde O(\varepsilon^{-2}\mu d)$ sensitivity sampling bound for logistic regression, where $\mu$ is a natural complexity measure for this problem. This improves over the previous $\tilde O(\varepsilon^{-2}\mu^2 d)$ bound of Mai et al. (2021) which was based on Lewis weights subsampling.

probability, sensitivity, woodruff and yasuda, (15 more...)

arXiv.org Machine Learning

2406.00328

Country: Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

Add feedback

Dual VC Dimension Obstructs Sample Compression by Embeddings

Chase, Zachary, Chornomaz, Bogdan, Hanneke, Steve, Moran, Shay, Yehudayoff, Amir

arXiv.org Artificial IntelligenceMay-27-2024

This work studies embedding of arbitrary VC classes in well-behaved VC classes, focusing particularly on extremal classes. Our main result expresses an impossibility: such embeddings necessarily require a significant increase in dimension. In particular, we prove that for every $d$ there is a class with VC dimension $d$ that cannot be embedded in any extremal class of VC dimension smaller than exponential in $d$. In addition to its independent interest, this result has an important implication in learning theory, as it reveals a fundamental limitation of one of the most extensively studied approaches to tackling the long-standing sample compression conjecture. Concretely, the approach proposed by Floyd and Warmuth entails embedding any given VC class into an extremal class of a comparable dimension, and then applying an optimal sample compression scheme for extremal classes. However, our results imply that this strategy would in some cases result in a sample compression scheme at least exponentially larger than what is predicted by the sample compression conjecture. The above implications follow from a general result we prove: any extremal class with VC dimension $d$ has dual VC dimension at most $2d+1$. This bound is exponentially smaller than the classical bound $2^{d+1}-1$ of Assouad, which applies to general concept classes (and is known to be unimprovable for some classes). We in fact prove a stronger result, establishing that $2d+1$ upper bounds the dual Radon number of extremal classes. This theorem represents an abstraction of the classical Radon theorem for convex sets, extending its applicability to a wider combinatorial framework, without relying on the specifics of Euclidean convexity. The proof utilizes the topological method and is primarily based on variants of the Topological Radon Theorem.

dimension, extremal class, vc dimension, (14 more...)

arXiv.org Artificial Intelligence

2405.1712

Country:

Asia > Middle East > Israel (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
(4 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

Language-guided Skill Learning with Temporal Variational Inference

Fu, Haotian, Sharma, Pratyusha, Stengel-Eskin, Elias, Konidaris, George, Roux, Nicolas Le, Côté, Marc-Alexandre, Yuan, Xingdi

arXiv.org Artificial IntelligenceMay-27-2024

We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes Large Language Models (LLMs) to propose an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework incorporates the LLM-generated segmentation information to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length principle that helps guide this skill discovery process. Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks in BabyAI, a grid world navigation environment, as well as ALFRED, a household simulation environment.

learning, segmentation, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2402.16354

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(17 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (0.48)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.34)

Add feedback

Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation

Yang, Haibo, Qiu, Peiwen, Khanduri, Prashant, Fang, Minghong, Liu, Jia

arXiv.org Artificial IntelligenceMay-25-2024

Existing works in federated learning (FL) often assume an ideal system with either full client or uniformly distributed client participation. However, in practice, it has been observed that some clients may never participate in FL training (aka incomplete client participation) due to a myriad of system heterogeneity factors. A popular approach to mitigate impacts of incomplete client participation is the server-assisted federated learning (SA-FL) framework, where the server is equipped with an auxiliary dataset. However, despite SA-FL has been empirically shown to be effective in addressing the incomplete client participation problem, there remains a lack of theoretical understanding for SA-FL. Meanwhile, the ramifications of incomplete client participation in conventional FL are also poorly understood. These theoretical gaps motivate us to rigorously investigate SA-FL. Toward this end, we first show that conventional FL is {\em not} PAC-learnable under incomplete client participation in the worst case. Then, we show that the PAC-learnability of FL with incomplete client participation can indeed be revived by SA-FL, which theoretically justifies the use of SA-FL for the first time. Lastly, to provide practical guidance for SA-FL training under {\em incomplete client participation}, we propose the $\mathsf{SAFARI}$ (server-assisted federated averaging) algorithm that enjoys the same linear convergence speedup guarantees as classic FL with ideal client participation assumptions, offering the first SA-FL algorithm with convergence guarantee. Extensive experiments on different datasets show $\mathsf{SAFARI}$ significantly improves the performance under incomplete client participation.

client participation, incomplete client participation, participation, (9 more...)

arXiv.org Artificial Intelligence

2405.02745

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Add feedback

Randomized heuristic repair for large-scale multidimensional knapsack problem

Martins, Jean P.

arXiv.org Artificial IntelligenceMay-24-2024

The multidimensional knapsack problem (MKP) is an NP-hard combinatorial optimization problem whose solution is determining a subset of maximum total profit items that do not violate capacity constraints. Due to its hardness, large-scale MKP instances are usually a target for metaheuristics, a context in which effective feasibility maintenance strategies are crucial. In 1998, Chu and Beasley proposed an effective heuristic repair that is still relevant for recent metaheuristics. However, due to its deterministic nature, the diversity of solutions such heuristic provides is insufficient for long runs. As a result, the search for new solutions ceases after a while. This paper proposes an efficiency-based randomization strategy for the heuristic repair that increases the variability of the repaired solutions without deteriorating quality and improves the overall results.

algorithm, heuristic repair, multidimensional knapsack problem, (12 more...)

arXiv.org Artificial Intelligence

2405.15569

Country: South America > Brazil (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.66)

Add feedback