Collaborating Authors

Computational Learning Theory

A Gentle Introduction to Computational Learning Theory


Computational learning theory, or statistical learning theory, refers to mathematical frameworks for quantifying learning tasks and algorithms. These are sub-fields of machine learning that a practitioner does not need to know in great depth in order to achieve good results on a wide range of problems. Nevertheless, a high-level understanding of some of the more prominent methods may provide insight into the broader task of learning from data. In this post, you will discover a gentle introduction to computational learning theory for machine learning.

Departmental Teaching awards announced

Oxford Comp Sci

Prof. Gavin Lowe Excellent student feedback for teaching Concurrent Programming and Concurrency (with the latter moved to remote teaching during TT 2020).

Technical Perspective: Algorithm Selection as a Learning Problem

Communications of the ACM

The following paper by Gupta and Roughgarden--"Data-Driven Algorithm Design"--addresses the issue that the best algorithm to use for many problems depends on what the input "looks like." Certain algorithms work better for certain types of inputs, whereas other algorithms work better for others. This is especially the case for NP-hard problems, where we do not expect to ever have algorithms that work well on all inputs: instead, we often have various heuristics that each work better in different settings. Moreover, heuristic strategies often have parameters or hyperparameters that must be set in some way. The authors present a theoretical formulation and analysis of algorithm selection using the well-developed framework of PAC-learning to analyze fundamental learning questions.

Data-Driven Algorithm Design

Communications of the ACM

The best algorithm for a computational problem generally depends on the "relevant inputs," a concept that depends on the application domain and often defies formal articulation. Although there is a large literature on empirical approaches to selecting the best algorithm for a given application domain, there has been surprisingly little theoretical analysis of the problem. Our framework captures several state-of-the-art empirical and theoretical approaches to the problem, and our results identify conditions under which these approaches are guaranteed to perform well. We interpret our results in the contexts of learning greedy heuristics, instance feature-based algorithm selection, and parameter tuning in machine learning. Rigorously comparing algorithms is hard. Two different algorithms for a computational problem generally have incomparable performance: one algorithm is better on some inputs but worse on others. The simplest and most common solution in the theoretical analysis of algorithms is to summarize the performance of an algorithm with a single number, such as its worst-case performance or its average-case performance with respect to an input distribution. This approach effectively advocates using the algorithm with the best summarizing value (e.g., the smallest worst-case running time). Solving a problem "in practice" generally means identifying an algorithm that works well for most or all instances of interest. When the "instances of interest" are easy to specify formally in advance--say, planar graphs--the traditional analysis approaches often give accurate performance predictions and identify useful algorithms.
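The learning formulation can be made concrete with a toy experiment. The sketch below is a hypothetical illustration, not the paper's construction: a greedy knapsack heuristic with a tunable exponent `rho` plays the role of a parameterized algorithm family, and the best parameter is chosen by empirical risk minimization over instances sampled from the application domain (the function names and the instance distribution are invented for this example).

```python
import random

def greedy_knapsack(values, weights, capacity, rho):
    """Greedy heuristic: take items in decreasing order of value / weight**rho.
    rho is the tunable parameter of the algorithm family (rho=0 ranks by value,
    rho=1 by value density)."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i] ** rho, reverse=True)
    total_value, total_weight = 0.0, 0.0
    for i in order:
        if total_weight + weights[i] <= capacity:
            total_weight += weights[i]
            total_value += values[i]
    return total_value

def select_parameter(instances, candidate_rhos):
    """Empirical risk minimization over a parameter grid: return the rho with
    the best average performance on instances drawn from the domain."""
    def avg_value(rho):
        return sum(greedy_knapsack(v, w, c, rho) for v, w, c in instances) / len(instances)
    return max(candidate_rhos, key=avg_value)

# sample training instances from a (made-up) input distribution
random.seed(0)
instances = []
for _ in range(50):
    values = [random.uniform(1, 10) for _ in range(20)]
    weights = [random.uniform(1, 10) for _ in range(20)]
    instances.append((values, weights, 25.0))

best_rho = select_parameter(instances, [0.0, 0.5, 1.0, 1.5, 2.0])
```

The PAC-style question the paper studies is exactly how many sampled instances suffice for the empirically best parameter to be near-optimal on the whole input distribution.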

Machine Learning Theory - Underfitting vs Overfitting


What is underfitting? Oversimplifying the problem: the model does not do well even on the training set, and the error is due to bias. What is overfitting? High variance: the model complicates the problem more than necessary in order to perform well on the training set.
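The bias/variance distinction can be demonstrated with polynomial regression on a noisy sine curve; a minimal sketch (the degrees, noise level, and dataset are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)   # noisy training data

x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                            # noiseless ground truth

def errors(degree):
    """Mean squared error on the training set and on held-out test points."""
    coeffs = np.polyfit(x, y, degree)
    train = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# degree 1: underfit (high bias) -- poor even on the training set
# degree 3: reasonable capacity for one period of a sine
# degree 25: overfit (high variance) -- very low training error, fits the noise
train_1, test_1 = errors(1)
train_3, test_3 = errors(3)
train_25, test_25 = errors(25)
```

The underfit line has a large error on both sets (bias), while the high-degree polynomial drives the training error near zero by fitting the noise (variance).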

Point Location and Active Learning: Learning Halfspaces Almost Optimally Machine Learning

Given a finite set $X \subset \mathbb{R}^d$ and a binary linear classifier $c: \mathbb{R}^d \to \{0,1\}$, how many queries of the form $c(x)$ are required to learn the label of every point in $X$? Known as \textit{point location}, this problem has inspired over 35 years of research in the pursuit of an optimal algorithm. Building on the prior work of Kane, Lovett, and Moran (ICALP 2018), we provide the first nearly optimal solution, a randomized linear decision tree of depth $\tilde{O}(d\log(|X|))$, improving on the previous best of $\tilde{O}(d^2\log(|X|))$ from Ezra and Sharir (Discrete and Computational Geometry, 2019). As a corollary, we also provide the first nearly optimal algorithm for actively learning halfspaces in the membership query model. En route to these results, we prove a novel characterization of Barthe's Theorem (Inventiones Mathematicae, 1998) of independent interest. In particular, we show that $X$ may be transformed into approximate isotropic position if and only if there exists no $k$-dimensional subspace with more than a $k/d$-fraction of $X$, and provide a similar characterization for exact isotropic position.
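In one dimension the flavor of the problem is easy to see: learning the labels of a halfspace (a threshold) on a finite point set with membership queries reduces to binary search, so $O(\log |X|)$ queries determine every label instead of $|X|$. The sketch below shows only this 1-D special case, not the paper's linear decision tree construction for general $d$:

```python
def learn_threshold(points, query):
    """Actively learn a 1-D halfspace c(x) = 1 iff x >= t on a finite point set
    using membership queries. Binary search uses O(log |X|) queries."""
    xs = sorted(points)
    lo, hi = 0, len(xs)              # invariant: first positive index lies in [lo, hi]
    n_queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        n_queries += 1
        if query(xs[mid]):           # label 1 => boundary is at or before xs[mid]
            hi = mid
        else:
            lo = mid + 1
    # every label is now determined: xs[i] has label 1 iff i >= lo
    labels = {x: int(i >= lo) for i, x in enumerate(xs)}
    return labels, n_queries

points = [0.1, 0.25, 0.4, 0.55, 0.7, 0.9]
labels, n_queries = learn_threshold(points, lambda x: x >= 0.5)
```

The hard part, which the paper addresses, is achieving query complexity near $d \log |X|$ when the halfspace lives in $\mathbb{R}^d$ and no single coordinate ordering suffices.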


Private Query Release Assisted by Public Data Machine Learning

We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $\alpha$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of the private and public sample complexities. First, we show that we can solve the problem for any query class $\mathcal{H}$ of finite VC-dimension using only $d/\alpha$ public samples and $\sqrt{p}d^{3/2}/\alpha^2$ private samples, where $d$ and $p$ are the VC-dimension and dual VC-dimension of $\mathcal{H}$, respectively. In comparison, with only private samples, this problem cannot be solved even for simple query classes with VC-dimension one, and without any private samples, a larger public sample of size $d/\alpha^2$ is needed. Next, we give sample complexity lower bounds that exhibit tight dependence on $p$ and $\alpha$. For the class of decision stumps, we give a lower bound of $\sqrt{p}/\alpha$ on the private sample complexity whenever the public sample size is less than $1/\alpha^2$. Given our upper bounds, this shows that the dependence on $\sqrt{p}$ is necessary in the private sample complexity. We also give a lower bound of $1/\alpha$ on the public sample complexity for a broad family of query classes, which by our upper bound, is tight in $\alpha$.
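The paper's algorithms combine public and private samples in a nontrivial way; the sketch below shows only the basic primitive such results build on, the Laplace mechanism for answering a single statistical query over the private sample (the dataset, query, and helper names are invented for this example):

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise by inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_query_mean(private_sample, predicate, epsilon, rng):
    """Answer one statistical query q(D) = mean of a 0/1 predicate over the
    private sample with epsilon-differential privacy. Changing one record moves
    the mean by at most 1/n, so the Laplace noise scale is 1/(n * epsilon)."""
    n = len(private_sample)
    true_answer = sum(predicate(x) for x in private_sample) / n
    return true_answer + laplace_noise(1.0 / (n * epsilon), rng)

rng = random.Random(0)
data = [rng.random() for _ in range(10_000)]
answer = private_query_mean(data, lambda x: x > 0.5, epsilon=1.0, rng=rng)
```

Answering a whole class $\mathcal{H}$ of such queries simultaneously with small error is what drives the VC-dimension-dependent sample complexities in the abstract; naively adding independent noise per query does not achieve them.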

Diffusion Map for Manifold Learning, Theory and Implementation - KDnuggets


The 'curse of dimensionality' is a well-known problem in data science, one that often causes poor performance, inaccurate results, and, most importantly, a breakdown of similarity measures. The primary cause is that high-dimensional datasets are typically sparse, and the data can often be embedded in a lower-dimensional structure, or 'manifold'. So there is a non-linear relationship among the variables (or features, or dimensions) that we need to learn in order to compute better similarity. Manifold learning is an approach to non-linear dimensionality reduction. The basis for algorithms in manifold learning is that the dimensionality of many data sets is only artificially high.
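A minimal diffusion-map sketch in NumPy, assuming the standard recipe (Gaussian affinities, row normalization into a Markov transition matrix, embedding by the leading non-trivial eigenvectors); the kernel bandwidth and test manifold are arbitrary choices:

```python
import numpy as np

def diffusion_map(X, eps, n_components=2, t=1):
    """Embed the rows of X with a basic diffusion map."""
    # pairwise squared Euclidean distances
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-d2 / eps)                    # Gaussian affinity matrix
    P = K / K.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)           # eigenvalue 1 comes first
    idx = order[1 : n_components + 1]        # skip the trivial constant eigenvector
    # diffusion coordinates: eigenvectors scaled by eigenvalues^t
    return (vals.real[idx] ** t) * vecs.real[:, idx]

# noisy circle: a 1-D manifold embedded in 2-D
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
X += 0.01 * np.random.default_rng(0).normal(size=X.shape)
emb = diffusion_map(X, eps=0.5)
```

Because `P` is similar to a symmetric matrix, its spectrum is real; taking `.real` simply discards numerical round-off from the non-symmetric eigensolver.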

Optimal No-regret Learning in Repeated First-price Auctions Machine Learning

We study online learning in repeated first-price auctions with censored feedback, where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a challenging dilemma: if she wins the bid--the only way to achieve positive payoffs--then she is not able to observe the highest bid of the other bidders, which we assume is iid drawn from an unknown distribution. This dilemma, despite being reminiscent of the exploration-exploitation trade-off in contextual bandits, cannot directly be addressed by the existing UCB or Thompson sampling algorithms in that literature, mainly because contrary to the standard bandits setting, when a positive reward is obtained here, nothing about the environment can be learned. In this paper, by exploiting the structural properties of first-price auctions, we develop the first learning algorithm that achieves $O(\sqrt{T}\log^2 T)$ regret bound when the bidder's private values are stochastically generated. We do so by providing an algorithm on a general class of problems, which we call monotone group contextual bandits, where the same regret bound is established under stochastically generated contexts. Further, by a novel lower bound argument, we characterize an $\Omega(T^{2/3})$ lower bound for the case where the contexts are adversarially generated, thus highlighting the impact of the contexts generation mechanism on the fundamental learning limit. Despite this, we further exploit the structure of first-price auctions and develop a learning algorithm that operates sample-efficiently (and computationally efficiently) in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^5 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for this problem.
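The censored-feedback structure is easy to simulate. The toy bidder below is NOT the paper's algorithm and has no regret guarantee; it merely illustrates the dilemma: exact observations of the opponents' highest bid arrive only in losing rounds, and the bidder greedily maximizes an estimated payoff from that (biased) sample. The value, bid grid, and opponents' distribution are made up for the example.

```python
import bisect
import random

def simulate(T=5000, value=0.8, seed=0):
    """Toy repeated first-price auction with censored feedback: the bidder sees
    the opponents' highest bid m exactly when she loses, but learns only that
    m <= b when she wins."""
    rng = random.Random(seed)
    losses = []                      # sorted exact observations of m from lost rounds
    bids = [i / 100 for i in range(101)]
    payoff = 0.0
    for _ in range(T):
        m = rng.random() ** 2        # opponents' highest bid, iid, distribution unknown
        if len(losses) < 50:
            b = rng.choice(bids)     # crude uniform exploration phase
        else:
            # greedy bid: maximize (value - b) * estimated Pr[win], with the win
            # probability estimated from the biased sample of losing rounds
            n = len(losses)
            b = max(bids,
                    key=lambda b: (value - b) * bisect.bisect_right(losses, b) / n)
        if b >= m:
            payoff += value - b      # win: m is censored, only m <= b is learned
        else:
            bisect.insort(losses, m) # loss: m is revealed as the winning bid
    return payoff / T

avg_payoff = simulate()
```

The paper's contribution is precisely to replace this heuristic with algorithms whose regret is provably $O(\sqrt{T}\,\mathrm{polylog}\,T)$ despite the one-sided information.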