Goto

Collaborating Authors

 Computational Learning Theory


Mistake Bounds for Binary Matrix Completion

Neural Information Processing Systems

We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation where each row of the matrix is a prediction task over a finite set of objects, the columns. Using this we show that the algorithm makes a number of mistakes which is comparable up to a logarithmic factor to the number of mistakes made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering and to the problem of predicting the labeling of a graph without knowing the graph in advance.


Reviews: Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models

Neural Information Processing Systems

The main effort is to improve upon the recent results provided by Bresler, showing that the complexity of identifying the structure of max degree d Ising model is polynomial in p and independent of d. Strong Points: 1) The timeliness of the topic in this paper is good, meaning that there is currently ongoing interest and work on Ising model reconstruction. Weak points: 1) The whole approach is based on the introduction of the ISO. This is the main trick in the proposed approach. Other than that, the rest of the method and its analysis are usual and well studied (l_1-penalization and connection with the tutorial by Negahban et.



On the Recursive Teaching Dimension of VC Classes

Neural Information Processing Systems

In this paper, we study the quantitative relation between RTD and the well-known learning complexity measure VC dimension (VCD), and improve the best known upper and (worst-case) lower bounds on the recursive teaching dimension with respect to the VC dimension.


Multi-step learning and underlying structure in statistical models

Neural Information Processing Systems

In multi-step learning, where a final learning task is accomplished via a sequence of intermediate learning tasks, the intuition is that successive steps or levels transform the initial data into representations more and more "suited" to the final learning task. A related principle arises in transfer-learning where Baxter (2000) proposed a theoretical framework to study how learning multiple tasks transforms the inductive bias of a learner. The most widespread multi-step learning approach is semisupervised learning with two steps: unsupervised, then supervised. Several authors (Castelli-Cover, 1996; Balcan-Blum, 2005; Niyogi, 2008; Ben-David et al, 2008; Urner et al, 2011) have analyzed SSL, with Balcan-Blum (2005) proposing a version of the PAC learning framework augmented by a "compatibility function" to link concept class and unlabeled data distribution. We propose to analyze SSL and other multi-step learning approaches, much in the spirit of Baxter's framework, by defining a learning problem generatively as a joint statistical model on X Y.


Optimal Learners for Realizable Regression: PAC Learning and Online Learning

Neural Information Processing Systems

In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon 1997 (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context.


Is Out-of-Distribution Detection Learnable?

Neural Information Processing Systems

Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which is proposed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection.


Towards Practical Few-shot Query Sets: Transductive Minimum Description Length Inference

Neural Information Processing Systems

Standard few-shot benchmarks are often built upon simplifying assumptions on the query sets, which may not always hold in practice. In particular, for each task at testing time, the classes effectively present in the unlabeled query set are known a priori, and correspond exactly to the set of classes represented in the labeled support set. We relax these assumptions and extend current benchmarks, so that the query-set classes of a given task are unknown, but just belong to a much larger set of possible classes. Our setting could be viewed as an instance of the challenging yet practical problem of extremely imbalanced K -way classification, K being much larger than the values typically used in standard benchmarks, and with potentially irrelevant supervision from the support set. Motivated by these observations, we introduce a \textbf{P}rim\textbf{A}l \textbf{D}ual Minimum \textbf{D}escription \textbf{LE}ngth (\textbf{PADDLE}) formulation, which balances data-fitting accuracy and model complexity for a given few-shot task, under supervision constraints from the support set.


Multiclass versus Binary Differentially Private PAC Learning

Neural Information Processing Systems

We show a generic reduction from multiclass differentially private PAC learning to binary private PAC learning. We apply this transformation to a recently proposed binary private PAC learner to obtain a private multiclass learner with sample complexity that has a polynomial dependence on the multiclass Littlestone dimension and a poly-logarithmic dependence on the number of classes. This yields a doubly exponential improvement in the dependence on both parameters over learners from previous work. Our proof extends the notion of \Psi -dimension defined in work of Ben-David et al. [JCSS, 1995] to the online setting and explores its general properties.


On Optimal Learning Under Targeted Data Poisoning

Neural Information Processing Systems

Consider the task of learning a hypothesis class \mathcal{H} in the presence of an adversary that can replace up to an \eta fraction of the examples in the training set with arbitrary adversarial examples. The adversary aims to fail the learner on a particular target test point x which is \emph{known} to the adversary but not to the learner. In this work we aim to characterize the smallest achievable error \epsilon \epsilon(\eta) by the learner in the presence of such an adversary in both realizable and agnostic settings. Remarkably, we show that the upper bound can be attained by a deterministic learner. In the agnostic setting we reveal a more elaborate landscape: we devise a deterministic learner with a multiplicative regret guarantee of \epsilon \leq C\cdot\mathtt{OPT} O(\mathtt{VC}(\mathcal{H})\cdot \eta), where C 1 is a universal numerical constant.