Computational Learning Theory
Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization
Montasser, Omar, Hanneke, Steve, Srebro, Nathan
We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019) and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994). Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. (2019), and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.
SATViz: Real-Time Visualization of Clausal Proofs
Holzenkamp, Tim, Kuryshev, Kevin, Oltmann, Thomas, Wรคldele, Lucas, Zuber, Johann, Heuer, Tobias, Iser, Markus
Visual representations of algorithms can improve our understanding of algorithmic concepts and the nature of the underlying problems. Visualizations of the Conflict-Driven Clause Learning algorithm (CDCL) are of high interest, as this is currently the most successful algorithm for solving the propositional satisfiability problem (SAT problem). Despite the complexity of the SAT problem, implementations of CDCL in so-called SAT solvers are successfully used in industrial practice, e.g., in software verification [4], hardware verification [12], product configuration [21], or planning [18]. For satisfiable instances, SAT solvers output a variable assignment which serves as a certificate of satisfiability. Such a certificate can be checked in polynomial time and hence SAT NP.
Active Learning of Classifiers with Label and Seed Queries
Bressan, Marco, Cesa-Bianchi, Nicolรฒ, Lattanzi, Silvio, Paudice, Andrea, Thiessen, Maximilian
We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin. In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(1+\frac{1}{\gamma}\big)^{(m-1)/2}$ queries. On the other hand, using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient. In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement the upper bounds by showing that in the worst case any algorithm needs $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.
Multitask Learning via Shared Features: Algorithms and Hardness
Bairaktari, Konstantina, Blanc, Guy, Tan, Li-Yang, Ullman, Jonathan, Zakynthinou, Lydia
We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube, that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a polynomial time multitask learning algorithm for the concept class of halfspaces with margin $\gamma$, which is based on a simultaneous boosting technique and requires only $\textrm{poly}(k/\gamma)$ samples-per-task and $\textrm{poly}(k\log(d)/\gamma)$ samples in total. In addition, we prove a computational separation, showing that assuming there exists a concept class that cannot be learned in the attribute-efficient model, we can construct another concept class such that can be learned in the attribute-efficient model, but cannot be multitask learned efficiently -- multitask learning this concept class either requires super-polynomial time complexity or a much larger total number of samples.
Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical
Csikรณs, Mรณnika, Mustafa, Nabil H.
We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $\chi:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{\chi(S)}|$, where $\chi(S)=\sum\limits_{x \in S}\chi(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size. Our method uses primal-dual reweighing with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s. The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.
Learning Automata-Based Complex Event Patterns in Answer Set Programming
Katzouris, Nikos, Paliouras, Georgios
Complex Event Recognition and Forecasting (CER/F) techniques attempt to detect, or even forecast ahead of time, event occurrences in streaming input using predefined event patterns. Such patterns are not always known in advance, or they frequently change over time, making machine learning techniques, capable of extracting such patterns from data, highly desirable in CER/F. Since many CER/F systems use symbolic automata to represent such patterns, we propose a family of such automata where the transition-enabling conditions are defined by Answer Set Programming (ASP) rules, and which, thanks to the strong connections of ASP to symbolic learning, are directly learnable from data. We present such a learning approach in ASP and an incremental version thereof that trades optimality for efficiency and is capable to scale to large datasets. We evaluate our approach on two CER datasets and compare it to state-of-the-art automata learning techniques, demonstrating empirically a superior performance, both in terms of predictive accuracy and scalability.
Computational Learning Theory: Second European Conference, EuroCOLT '95, Barcelona, Spain, March 13 - 15, 1995. Proceedings (Lecture Notes in Computer Science, 904): Vitanyi, Paul: 9783540591191: Amazon.com: Books
Computational Learning Theory: Second European Conference, EuroCOLT '95, Barcelona, Spain, March 13 - 15, 1995. Proceedings (Lecture Notes in Computer Science, 904) [Vitanyi, Paul] on Amazon.com. *FREE* shipping on qualifying offers. Computational Learning Theory: Second European Conference, EuroCOLT '95, Barcelona, Spain, March 13 - 15, 1995. Proceedings (Lecture Notes in Computer Science, 904)
PAC-learning gains of Turing machines over circuits and neural networks
Pinon, Brieuc, Jungers, Raphaรซl, Delvenne, Jean-Charles
A caveat to many applications of the current Deep Learning approach is the need for large-scale data. One improvement suggested by Kolmogorov Complexity results is to apply the minimum description length principle with computationally universal models. We study the potential gains in sample efficiency that this approach can bring in principle. We use polynomial-time Turing machines to represent computationally universal models and Boolean circuits to represent Artificial Neural Networks (ANNs) acting on finite-precision digits. Our analysis unravels direct links between our question and Computational Complexity results. We provide lower and upper bounds on the potential gains in sample efficiency between the MDL applied with Turing machines instead of ANNs. Our bounds depend on the bit-size of the input of the Boolean function to be learned. Furthermore, we highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.
The Correlated Arc Orienteering Problem
Agarwal, Saurav, Akella, Srinivas
This paper introduces the correlated arc orienteering problem (CAOP), where the task is to find routes for a team of robots to maximize the collection of rewards associated with features in the environment. These features can be one-dimensional or points in the environment, and can have spatial correlation, i.e., visiting a feature in the environment may provide a portion of the reward associated with a correlated feature. A robot incurs costs as it traverses the environment, and the total cost for its route is limited by a resource constraint such as battery life or operation time. As environments are often large, we permit multiple depots where the robots must start and end their routes. The CAOP generalizes the correlated orienteering problem (COP), where the rewards are only associated with point features, and the arc orienteering problem (AOP), where the rewards are not spatially correlated. We formulate a mixed integer quadratic program (MIQP) that formalizes the problem and gives optimal solutions. However, the problem is NP-hard, and therefore we develop an efficient greedy constructive algorithm. We illustrate the problem with two different applications: informative path planning for methane gas leak detection and coverage of road networks.
Quantum Boosting using Domain-Partitioning Hypotheses
Bera, Debajyoti, Bhatia, Rohan, Chani, Parmeet Singh, Chatterjee, Sagnik
Boosting is an ensemble learning method that converts a weak learner into a strong learner in the PAC learning framework. Freund and Schapire designed the Godel prize-winning algorithm named AdaBoost that can boost learners, which output binary hypotheses. Recently, Arunachalam and Maity presented the first quantum boosting algorithm with similar theoretical guarantees. Their algorithm, which we refer to as QAdaBoost henceforth, is a quantum adaptation of AdaBoost and only works for the binary hypothesis case. QAdaBoost is quadratically faster than AdaBoost in terms of the VC-dimension of the hypothesis class of the weak learner but polynomially worse in the bias of the weak learner. Izdebski et al. posed an open question on whether we can boost quantum weak learners that output non-binary hypothesis. In this work, we address this open question by developing the QRealBoost algorithm which was motivated by the classical RealBoost algorithm. The main technical challenge was to provide provable guarantees for convergence, generalization bounds, and quantum speedup, given that quantum subroutines are noisy and probabilistic. We prove that QRealBoost retains the quadratic speedup of QAdaBoost over AdaBoost and further achieves a polynomial speedup over QAdaBoost in terms of both the bias of the learner and the time taken by the learner to learn the target concept class. Finally, we perform empirical evaluations on QRealBoost and report encouraging observations on quantum simulators by benchmarking the convergence performance of QRealBoost against QAdaBoost, AdaBoost, and RealBoost on a subset of the MNIST dataset and Breast Cancer Wisconsin dataset.