Computational Learning Theory
AI is helping materials scientists home in on promising new materials
AI and automation are speeding up science and chemistry by helping scientists pick which experiments to conduct and home in on promising new materials. Why it matters: There's pressure on these fields to produce new materials faster and cheaper to support and power technologies that could transform industries and economies. The big picture: New materials and molecules are needed for the batteries, drugs and semiconductors envisioned to underpin a green grid, precise medicine, and the next generation of computing and communications. What's happening: It can take decades to get a new material to market in a process that involves an almost "artisanal science," Isayev says. Zoom in: In a new study, researchers combined machine learning, theories and calculations of physical properties and experiments to identify new alloys.
Unlabelled Sample Compression Schemes for Intersection-Closed Classes and Extremal Classes
Rubinstein, J. Hyam, Rubinstein, Benjamin I. P.
The sample compressibility of concept classes plays an important role in learning theory, as a sufficient condition for PAC learnability, and more recently as an avenue for robust generalisation in adaptive data analysis. Whether compression schemes of size $O(d)$ must necessarily exist for all classes of VC dimension $d$ is unknown, but conjectured to be true by Warmuth. Recently Chalopin, Chepoi, Moran, and Warmuth (2018) gave a beautiful unlabelled sample compression scheme of size VC dimension for all maximum classes: classes that meet the Sauer-Shelah-Perles Lemma with equality. They also offered a counterexample to compression schemes based on a promising approach known as corner peeling. In this paper we simplify and extend their proof technique to deal with so-called extremal classes of VC dimension $d$ which contain maximum classes of VC dimension $d-1$. A criterion is given which would imply that all extremal classes admit unlabelled compression schemes of size $d$. We also prove that all intersection-closed classes with VC dimension $d$ admit unlabelled compression schemes of size at most $11d$.
Do you pay for Privacy in Online learning?
Sanyal, Amartya, Ramponi, Giorgia
Online learning, in the mistake bound model, is one of the most fundamental concepts in learning theory. Differential privacy, instead, is the most widely used statistical concept of privacy in the machine learning community. It is thus clear that defining learning problems that are online differentially privately learnable is of great interest. In this paper, we pose the question on if the two problems are equivalent from a learning perspective, i.e., is privacy for free in the online learning framework?
A Survey of Methods for Automated Algorithm Configuration
Schede, Elias (a:1:{s:5:"en_US";s:20:"Bielefeld University";}) | Brandt, Jasmin (Department of Computer Science, Paderborn University) | Tornede, Alexander ( Department of Computer Science, Paderborn University,) | Wever, Marcel (Institute of Informatics, LMU Munich) | Bengs, Viktor (Institute of Informatics, LMU Munich) | Hรผllermeier, Eyke (Institute of Informatics, LMU Munich) | Tierney, Kevin (Decision and Operation Technologies Group, Bielefeld University)
Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.
Estimating the hardness of SAT encodings for Logical Equivalence Checking of Boolean circuits
Semenov, Alexander, Chukharev, Konstantin, Tarasov, Egor, Chivilikhin, Daniil, Kondratiev, Viktor
In this paper we investigate how to estimate the hardness of Boolean satisfiability (SAT) encodings for the Logical Equivalence Checking problem (LEC). Meaningful estimates of hardness are important in cases when a conventional SAT solver cannot solve a SAT instance in a reasonable time. We show that the hardness of SAT encodings for LEC instances can be estimated \textit{w.r.t.} some SAT partitioning. We also demonstrate the dependence of the accuracy of the resulting estimates on the probabilistic characteristics of a specially defined random variable associated with the considered partitioning. The paper proposes several methods for constructing partitionings, which, when used in practice, allow one to estimate the hardness of SAT encodings for LEC with good accuracy. In the experimental part we propose a class of scalable LEC tests that give extremely complex instances with a relatively small input size $n$ of the considered circuits. For example, for $n = 40$, none of the state-of-the-art SAT solvers can cope with the considered tests in a reasonable time. However, these tests can be solved in parallel using the proposed partitioning methods.
Approximate Description Length, Covering Numbers, and VC Dimension
Daniely, Amit, Katzhendler, Gal
Neural Networks are a widely used tool nowadays, despite the lack of theoretical background supporting their abilities to generalize well. Classical notions of learning guarantee generalization only if there are more examples that parameters. It is clear that a stronger assumption is needed to achieve tighter bounds, and indeed, different types of assumptions were used in order to fill this empirical-theoretical gap, including assumptions on robustness to noise [2], bias of the learning algorithm [5, 10], and norm bounds on the weight's matrices [8, 9] The idea of Approximate Description Length [4] was conceived as a part of the line of research working under assumptions that bound the magnitude of the network's weight matrices.
Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem
Kumar, Raunak, Kleinberg, Robert
Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty that incorporates resource consumption constraints. In each round, the decision-maker observes an outcome consisting of a reward and a vector of nonnegative resource consumptions, and the budget of each resource is decremented by its consumption. In this paper we introduce a natural generalization of the stochastic BwK problem that allows non-monotonic resource utilization. In each round, the decision-maker observes an outcome consisting of a reward and a vector of resource drifts that can be positive, negative or zero, and the budget of each resource is incremented by its drift. Our main result is a Markov decision process (MDP) policy that has constant regret against a linear programming (LP) relaxation when the decision-maker knows the true outcome distributions. We build upon this to develop a learning algorithm that has logarithmic regret against the same LP relaxation when the decision-maker does not know the true outcome distributions. We also present a reduction from BwK to our model that shows our regret bound matches existing results.
Analyzing Robustness of Angluin's L* Algorithm in Presence of Noise
Khmelnitsky, Igor, Haddad, Serge, Ye, Lina, Barbot, Benoรฎt, Bollig, Benedikt, Leucker, Martin, Neider, Daniel, Roy, Rajarshi
Angluin's L* algorithm learns the minimal (complete) deterministic finite automaton (DFA) of a regular language using membership and equivalence queries. Its probabilistic approximatively correct (PAC) version substitutes an equivalence query by a large enough set of random membership queries to get a high level confidence to the answer. Thus it can be applied to any kind of (also non-regular) device and may be viewed as an algorithm for synthesizing an automaton abstracting the behavior of the device based on observations. Here we are interested on how Angluin's PAC learning algorithm behaves for devices which are obtained from a DFA by introducing some noise. More precisely we study whether Angluin's algorithm reduces the noise and produces a DFA closer to the original one than the noisy device. We propose several ways to introduce the noise: (1) the noisy device inverts the classification of words w.r.t. the DFA with a small probability, (2) the noisy device modifies with a small probability the letters of the word before asking its classification w.r.t. the DFA, and (3) the noisy device combines the classification of a word w.r.t. the DFA and its classification w.r.t. a counter automaton. Our experiments were performed on several hundred DFAs. Our main contributions, bluntly stated, consist in showing that: (1) Angluin's algorithm behaves well whenever the noisy device is produced by a random process, (2) but poorly with a structured noise, and, that (3) almost surely randomness yields systems with non-recursively enumerable languages.
On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data
Su, Jinyan, Xu, Jinhui, Wang, Di
In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improve the previous results. Our methods could also be used for other private PAC learning problems.
Expected Worst Case Regret via Stochastic Sequential Covering
Wu, Changlong, Heidari, Mohsen, Grama, Ananth, Szpankowski, Wojciech
We study the problem of sequential prediction and online minimax regret with stochastically generated features under a general loss function. We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets. For such minimax regrets we establish tight upper bounds via a novel concept of stochastic global sequential covering. We show that for a hypothesis class of VC-dimension $\mathsf{VC}$ and $i.i.d.$ generated features of length $T$, the cardinality of the stochastic global sequential covering can be upper bounded with high probability (whp) by $e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing a new complexity measure called the Star-Littlestone dimension, and show that classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further establish upper bounds for real valued classes with finite fat-shattering numbers. Finally, by applying information-theoretic tools of the fixed design minimax regrets, we provide lower bounds for the expected worst case minimax regret. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.