In computer science, computational learning theory (or just learning theory) is a subfield of Artificial Intelligence devoted to studying the design and analysis of machine learning algorithms (Wikipedia)
The thirty eighth International Conference on Machine Learning (ICML) is now underway and will run for the entirety of this week (18 – 24 July), in a virtual only format. There will five invited talks to enjoy, as well as workshops, tutorials, affinity events and socials. Challenges in Deploying and monitoring Machine Learning Systems INNF: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI Tackling Climate Change with Machine Learning Theory and Foundation of Continual Learning ICML 2021 Workshop on Unsupervised Reinforcement Learning Human-AI Collaboration in Sequential Decision-Making ICML Workshop on Representation Learning for Finance and E-Commerce Applications Reinforcement Learning for Real Life Uncertainty and Robustness in Deep Learning Interpretable Machine Learning in Healthcare 8th ICML Workshop on Automated Machine Learning (AutoML 2021) Theory and Practice of Differential Privacy The Neglected Assumptions In Causal Inference Machine Learning for Data: Automated Creation, Privacy, Bias ICML Workshop on Human in the Loop Learning (HILL) ICML Workshop on Algorithmic Recourse A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21) Workshop on Socially Responsible Machine Learning ICML 2021 Workshop on Computational Biology Subset Selection in Machine Learning: From Theory to Applications Workshop on Computational Approaches to Mental Health @ ICML 2021 Workshop on Distribution-Free Uncertainty Quantification Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning (ITR3) Beyond first-order methods in machine learning systems Self-Supervised Learning for Reasoning and Perception Time Series Workshop Workshop on Reinforcement Learning Theory Over-parameterization: Pitfalls and Opportunities
We study the problem of boosting the accuracy of a weak learner in the (distribution-independent) PAC model with Massart noise. In the Massart noise model, the label of each example $x$ is independently misclassified with probability $\eta(x) \leq \eta$, where $\eta<1/2$. The Massart model lies between the random classification noise model and the agnostic model. Our main positive result is the first computationally efficient boosting algorithm in the presence of Massart noise that achieves misclassification error arbitrarily close to $\eta$. Prior to our work, no non-trivial booster was known in this setting. Moreover, we show that this error upper bound is best possible for polynomial-time black-box boosters, under standard cryptographic assumptions. Our upper and lower bounds characterize the complexity of boosting in the distribution-independent PAC model with Massart noise. As a simple application of our positive result, we give the first efficient Massart learner for unions of high-dimensional rectangles.
Modern neural networks obtain information about the problem and calculate the output solely from the input values. We argue that it is not always optimal, and the network's performance can be significantly improved by augmenting it with a query mechanism that allows the network to make several solution trials at run time and get feedback on the loss value on each trial. To demonstrate the capabilities of the query mechanism, we formulate an unsupervised (not dependant on labels) loss function for Boolean Satisfiability Problem (SAT) and theoretically show that it allows the network to extract rich information about the problem. We then propose a neural SAT solver with a query mechanism called QuerySAT and show that it outperforms the neural baseline on a wide range of SAT tasks and the classical baselines on SHA-1 preimage attack and 3-SAT task.
Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals (i.e. scalar summaries) of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is root-n for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a new double robustness property for ill posed inverse problems.
Electrification was, without a doubt, the greatest engineering marvel of the 20th century. The electric motor was invented way back in 1821, and the electrical circuit was mathematically analyzed in 1827. But factory electrification, household electrification, and railway electrification all started slowly several decades later. The field of AI was formally founded in 1956. But it's only now--more than six decades later--that AI is expected to revolutionize the way humanity will live and work in the coming decades.
Welcome to ALT Highlights, a series of blog posts spotlighting various happenings at the recent conference ALT 2021, including plenary talks, tutorials, trends in learning theory, and more! To reach a broad audience, the series will be disseminated as guest posts on different blogs in machine learning and theoretical computer science. John has been kind enough to host the first post in the series. This initiative is organized by the Learning Theory Alliance, and overseen by Gautam Kamath. All posts in ALT Highlights are indexed on the official Learning Theory Alliance blog.
We introduce novel methods for encoding acyclicity and s-t-reachability constraints for propositional formulas with underlying directed graphs. They are based on vertex elimination graphs, which makes them suitable for cases where the underlying graph is sparse. In contrast to solvers with ad hoc constraint propagators for acyclicity and reachability constraints such as GraphSAT, our methods encode these constraints as standard propositional clauses, making them directly applicable with any SAT solver. An empirical study demonstrates that our methods together with an efficient SAT solver can outperform both earlier encodings of these constraints as well as GraphSAT, particularly when underlying graphs are sparse.
Pretrained language models (LMs) perform well on many tasks even when learning from a few examples, but prior work uses many held-out examples to tune various aspects of learning, such as hyperparameters, training objectives, and natural language templates ("prompts"). Here, we evaluate the few-shot ability of LMs when such held-out examples are unavailable, a setting we call true few-shot learning. We test two model selection criteria, cross-validation and minimum description length, for choosing LM prompts and hyperparameters in the true few-shot setting. On average, both marginally outperform random selection and greatly underperform selection based on held-out examples. Moreover, selection criteria often prefer models that perform significantly worse than randomly-selected ones. We find similar results even when taking into account our uncertainty in a model's true performance during selection, as well as when varying the amount of computation and number of examples used for selection. Overall, our findings suggest that prior work significantly overestimated the true few-shot ability of LMs given the difficulty of few-shot model selection.
Boolean Satisfiability (SAT) is a well-known NP-complete problem. Despite this theoretical hardness, SAT solvers based on Conflict Driven Clause Learning (CDCL) can solve large SAT instances from many important domains. CDCL learns clauses from conflicts, a technique that allows a solver to prune its search space. The selection heuristics in CDCL prioritize variables that are involved in recent conflicts. While only a fraction of decisions generate any conflicts, many generate multiple conflicts. In this paper, we study conflict-generating decisions in CDCL in detail. We investigate the impact of single conflict (sc) decisions, which generate only one conflict, and multi-conflict (mc) decisions which generate two or more. We empirically characterize these two types of decisions based on the quality of the learned clauses produced by each type of decision. We also show an important connection between consecutive clauses learned within the same mc decision, where one learned clause triggers the learning of the next one forming a chain of clauses. This leads to the consideration of similarity between conflicts, for which we formulate the notion of conflictsproximity as a similarity measure. We show that conflicts in mc decisions are more closely related than consecutive conflicts generated from sc decisions. Finally, we develop Common Reason Variable Reduction (CRVR) as a new decision strategy that reduces the selection priority of some variables from the learned clauses of mc decisions. Our empirical evaluation of CRVR implemented in three leading solvers demonstrates performance gains in benchmarks from the main track of SAT Competition-2020.
Adversarial robustness is an increasingly critical property of classifiers in applications. The design of robust algorithms relies on surrogate losses since the optimization of the adversarial loss with most hypothesis sets is NP-hard. But which surrogate losses should be used and when do they benefit from theoretical guarantees? We present an extensive study of this question, including a detailed analysis of the H-calibration and H-consistency of adversarial surrogate losses. We show that, under some general assumptions, convex loss functions, or the supremum-based convex losses often used in applications, are not H-calibrated for important hypothesis sets such as generalized linear models or one-layer neural networks. We then give a characterization of H-calibration and prove that some surrogate losses are indeed H-calibrated for the adversarial loss, with these hypothesis sets. Next, we show that H-calibration is not sufficient to guarantee consistency and prove that, in the absence of any distributional assumption, no continuous surrogate loss is consistent in the adversarial setting. This, in particular, proves that a claim presented in a COLT 2020 publication is inaccurate. (Calibration results there are correct modulo subtle definition differences, but the consistency claim does not hold.) Next, we identify natural conditions under which some surrogate losses that we describe in detail are H-consistent for hypothesis sets such as generalized linear models and one-layer neural networks. We also report a series of empirical results with simulated data, which show that many H-calibrated surrogate losses are indeed not H-consistent, and validate our theoretical assumptions.