Goto

Collaborating Authors

 Computational Learning Theory


When the bubble bursts… « Machine Learning (Theory)

#artificialintelligence

We are clearly not in a steady-state situation. Is this a bubble or a revolution? The answer surely includes a bit of revolution--the fields of vision and speech recognition have been turned over by great empirical successes created by deep neural architectures and more generally machine learning has found plentiful real-world uses. At the same time, I find it hard to believe that we aren't living in a bubble. There was an AI bubble in the 1980s (before my time), a techbubble around 2000, and we seem to have a combined AI/tech bubble going on right now.


Constrained Counting and Sampling: Bridging the Gap between Theory and Practice

arXiv.org Artificial Intelligence

Constrained counting and sampling are two fundamental problems in Computer Science with numerous applications, including network reliability, privacy, probabilistic reasoning, and constrained-random verification. In constrained counting, the task is to compute the total weight, subject to a given weighting function, of the set of solutions of the given constraints. In constrained sampling, the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. Consequently, constrained counting and sampling have been subject to intense theoretical and empirical investigations over the years. Prior work, however, offered either heuristic techniques with poor guarantees of accuracy or approaches with proven guarantees but poor performance in practice. In this thesis, we introduce a novel hashing-based algorithmic framework for constrained sampling and counting that combines the classical algorithmic technique of universal hashing with the dramatic progress made in combinatorial reasoning tools, in particular, SAT and SMT, over the past two decades. The resulting frameworks for counting (ApproxMC2) and sampling (UniGen) can handle formulas with up to million variables representing a significant boost up from the prior state of the art tools' capability to handle few hundreds of variables. If the initial set of constraints is expressed as Disjunctive Normal Form (DNF), ApproxMC is the only known Fully Polynomial Randomized Approximation Scheme (FPRAS) that does not involve Monte Carlo steps. By exploiting the connection between definability of formulas and variance of the distribution of solutions in a cell defined by 3-universal hash functions, we introduced an algorithmic technique, MIS, that reduced the size of XOR constraints employed in the underlying universal hash functions by as much as two orders of magnitude.


PAC-learning in the presence of evasion adversaries

arXiv.org Machine Learning

The existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding. These attacks can be carried out by adding imperceptible perturbations to inputs to generate adversarial examples and finding effective defenses and detectors has proven to be difficult. In this paper, we step away from the attack-defense arms race and seek to understand the limits of what can be learned in the presence of an evasion adversary. In particular, we extend the Probably Approximately Correct (PAC)-learning framework to account for the presence of an adversary. We first define corrupted hypothesis classes which arise from standard binary hypothesis classes in the presence of an evasion adversary and derive the Vapnik-Chervonenkis (VC)-dimension for these, denoted as the adversarial VC-dimension. We then show that sample complexity upper bounds from the Fundamental Theorem of Statistical learning can be extended to the case of evasion adversaries, where the sample complexity is controlled by the adversarial VC-dimension. We then explicitly derive the adversarial VC-dimension for halfspace classifiers in the presence of a sample-wise norm-constrained adversary of the type commonly studied for evasion attacks and show that it is the same as the standard VC-dimension, closing an open question. Finally, we prove that the adversarial VC-dimension can be either larger or smaller than the standard VC-dimension depending on the hypothesis class and adversary, making it an interesting object of study in its own right.


Private PAC learning implies finite Littlestone dimension

arXiv.org Machine Learning

We show that every approximately differentially private learning algorithm (possibly improper) for a class $H$ with Littlestone dimension~$d$ requires $\Omega\bigl(\log^*(d)\bigr)$ examples. As a corollary it follows that the class of thresholds over $\mathbb{N}$ can not be learned in a private manner; this resolves an open question due to [Bun et al. FOCS '15]. We leave as an open question whether every class with a finite Littlestone dimension can be learned by an approximately differentially private algorithm.


Learning convex polytopes with margin

arXiv.org Machine Learning

We present a near-optimal algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin. Our first contribution is to identify distinct generalizations of the notion of {\em margin} from hyperplanes to polytopes and to understand how they relate geometrically; this result may be of interest beyond the learning setting. Our novel learning algorithm constructs a consistent polytope as an intersection of about $t \log t$ halfspaces in time polynomial in $t$ (where $t$ is the number of halfspaces forming an optimal polytope). This is an exponential improvement over the state of the art [Arriaga and Vempala, 2006]. We also improve over the super-polynomial-in-$t$ algorithm of Klivans and Servedio [2008], while achieving a better sample complexity. Finally, we provide the first nearly matching hardness-of-approximation lower bound, whence our claim of near optimality.


Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

arXiv.org Machine Learning

We study the collaborative PAC learning problem recently proposed in Blum et al.~\cite{BHPQ17}, in which we have $k$ players and they want to learn a target function collaboratively, such that the learned function approximates the target function well on all players' distributions simultaneously. The quality of the collaborative learning algorithm is measured by the ratio between the sample complexity of the algorithm and that of the learning algorithm for a single distribution (called the overhead). We obtain a collaborative learning algorithm with overhead $O(\ln k)$, improving the one with overhead $O(\ln^2 k)$ in \cite{BHPQ17}. We also show that an $\Omega(\ln k)$ overhead is inevitable when $k$ is polynomial bounded by the VC dimension of the hypothesis class. Finally, our experimental study has demonstrated the superiority of our algorithm compared with the one in Blum et al. on real-world datasets.


Improved Algorithms for Collaborative PAC Learning

arXiv.org Machine Learning

We study a recent model of collaborative PAC learning where $k$ players with $k$ different tasks collaborate to learn a single classifier that works for all tasks. Previous work showed that when there is a classifier that has very small error on all tasks, there is a collaborative algorithm that finds a single classifier for all tasks and it uses $O(\ln^2 (k))$ times the sample complexity to learn a single task. In this work, we design new algorithms for both the realizable and the non-realizable settings using only $O(\ln (k))$ times the sample complexity to learn a single task. The sample complexity upper bounds of our algorithms match previous lower bounds and in some range of parameters are even better than previous algorithms that are allowed to output different classifiers for different tasks.


Sample Compression for Real-Valued Learners

arXiv.org Machine Learning

We give an algorithmically efficient version of the learner-to-compression scheme conversion in Moran and Yehudayoff (2016). In extending this technique to real-valued hypotheses, we also obtain an efficient regression-to-bounded sample compression converter. To our knowledge, this is the first general compressed regression result (regardless of efficiency or boundedness) guaranteeing uniform approximate reconstruction. Along the way, we develop a generic procedure for constructing weak real-valued learners out of abstract regressors; this may be of independent interest. In particular, this result sheds new light on an open question of H. Simon (1997). We show applications to two regression problems: learning Lipschitz and bounded-variation functions.


Do Outliers Ruin Collaboration?

arXiv.org Machine Learning

Consider the following real-world scenario: we would like to train a speech recognition model based on labeled examples collected from different users. For this particular application, a high average accuracy over all users is far from satisfactory: a model that is correct on 99.9% of the data may still go seriously wrong for a small yet non-negligible 0.1% fraction of the users. Instead, a more desirable objective would be finding personalized speech recognition solutions that are accurate for every single user. There are two major challenges to achieving this goal, the first being user heterogeneity: a model trained exclusively for users with a particular accent may fail miserably for users from another region. This challenge hints that a successful learning algorithm should be adaptive: more samples need to be collected from users with atypical data distributions. Equally crucial is that a small fraction of the users are malicious (e.g., they are controlled by a competing corporation); these users intend to mislead the speech recognition model into generating inaccurate or even ludicrous outputs. Motivated by these practical concerns, we propose the Robust Collaborative Learning model and study from a theoretical perspective the complexity of learning in the presence of untrusted collaborators.


How Did You Benefit from Machine Learning Today?

#artificialintelligence

In an earlier blog, we talked about how machine learning is used in social media analytics. In this post, we're going to review machine learning (ML) basics and examples, and explore some of the areas you might be unaware of where ML is having a significant impact. "Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data." The goal of ML is pretty simple -- teach computer systems to perform a task. The computer system gains experience by observing patterns from examples rather than being programmed with explicit instructions or rules.