 Computational Learning Theory


The Power of Comparisons for Actively Learning Linear Classifiers

Neural Information Processing Systems

In the world of big data, large but costly-to-label datasets dominate many fields. Active learning, a semi-supervised alternative to the standard PAC-learning model, was introduced to explore whether adaptive labeling could learn concepts with exponentially fewer labeled samples. While previous results show that active learning performs no better than its supervised alternative for important concept classes such as linear separators, we show that by adding weak distributional assumptions and allowing comparison queries, active learning requires exponentially fewer samples. Further, we show that these results also hold for a stronger model of learning called Reliable and Probably Useful (RPU) learning. In this model, our learner is not allowed to make mistakes, but may instead answer "I don't know." While previous negative results showed this model to have intractably large sample complexity for label queries, we show that comparison queries make RPU-learning at worst logarithmically more expensive in both the passive and active regimes.
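As a minimal sketch of the query model the abstract refers to (not the paper's learning algorithm): for a linear separator w, a label query reports which side of the hyperplane a point lies on, while a comparison query reports which of two points the separator scores higher. The oracle below assumes a homogeneous (bias-free) separator for simplicity; all names are illustrative.

```python
import numpy as np

def label_query(w, x):
    """Standard label oracle: which side of the hyperplane is x on?"""
    return np.sign(w @ x)

def comparison_query(w, x1, x2):
    """Comparison oracle: which of x1, x2 the (unknown) separator w scores
    higher, i.e. the sign of <w, x1> - <w, x2>."""
    return np.sign(w @ x1 - w @ x2)

# Toy usage: the learner never sees w, only the oracles' answers.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
x1, x2 = rng.normal(size=5), rng.normal(size=5)
print(label_query(w_true, x1), comparison_query(w_true, x1, x2))
```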


Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization

Neural Information Processing Systems

What gives a classifier the ability to generalize? There have been many important attempts to address this question, but a clear answer remains elusive. Proponents of complexity theory find that the complexity of the classifier's function space is key to determining generalization, whereas other recent work reveals that classifiers which extract invariant feature representations are likely to generalize better. Recent theoretical and empirical studies, however, have shown that even within a classifier's function space, there can be significant differences in the ability to generalize. Specifically, empirical studies have shown that among functions which fit the training data well, functions with lower Kolmogorov complexity (KC) are likely to generalize better, while the opposite is true for functions of higher KC.


Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models

Neural Information Processing Systems

Algorithmic machine teaching studies the interaction between a teacher and a learner, where the teacher selects labeled examples with the aim of teaching a target hypothesis.



Reviews: Covariate-Powered Empirical Bayes Estimation

Neural Information Processing Systems

This theory paper provides a number of novel results, including theoretical analysis of minimax bounds and an empirical analysis, for combinations of relatively simple statistical estimators and machine learning models of covariate information. The paper shows that these combinations improve on both the simple estimator alone and the machine learning model alone. The main concern raised by the reviewers is that the paper provides limited empirical validation. I disagree with this assessment, as the paper should be seen as a machine learning theory paper. As the proposed framework includes a number of advanced machine learning models, including XGBoost, it should be very relevant for the NeurIPS community.
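A minimal sketch of the general idea behind such combinations, under simplifying assumptions: each noisy unbiased estimate is shrunk toward a covariate-based prediction, with more shrinkage when the estimate is noisier. Here m_pred stands in for the output of a fitted ML model such as XGBoost, and the residual variance A is treated as known; the paper's actual estimator and its minimax analysis are more involved.

```python
import numpy as np

def shrink_toward_covariate_model(z, sigma2, m_pred, A):
    """Shrink each noisy estimate z_i ~ N(mu_i, sigma2_i) toward the
    covariate-based prediction m_pred_i, with weight A / (A + sigma2_i)
    on the raw estimate, where A is the variance of mu_i around m_pred_i.
    (Illustrative only; the paper estimates A from data.)"""
    lam = A / (A + sigma2)
    return m_pred + lam * (z - m_pred)

# Toy usage with hypothetical ML-model predictions m_pred.
rng = np.random.default_rng(1)
mu = rng.normal(size=8)                        # unobserved true effects
sigma2 = np.full(8, 0.5)                       # known sampling variances
z = mu + rng.normal(scale=np.sqrt(sigma2))     # noisy unbiased estimates
m_pred = mu + rng.normal(scale=0.3, size=8)    # stand-in for XGBoost output
print(shrink_toward_covariate_model(z, sigma2, m_pred, A=0.3**2))
```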


A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling

Neural Information Processing Systems

Empirical Revenue Maximization (ERM) is one of the most important price-learning algorithms in auction design: as the literature shows, it can learn approximately optimal reserve prices for revenue-maximizing auctioneers in both repeated auctions and uniform-price auctions. However, in these applications the agents who provide inputs to ERM have incentives to manipulate the inputs to lower the output price. We generalize the definition of an incentive-awareness measure proposed by Lavi et al. (2019) to quantify the reduction of ERM's output price due to a change of m ≥ 1 out of N input samples, and provide specific convergence rates of this measure to zero as N goes to infinity for different types of input distributions. By adopting this measure, we construct an efficient, approximately incentive-compatible, and revenue-optimal learning algorithm using ERM in repeated auctions against non-myopic bidders, and show approximate group incentive-compatibility in uniform-price auctions.
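For concreteness, a minimal sketch of the classical ERM rule the abstract builds on: pick the sample value that maximizes the empirical posted-price revenue. This is only the baseline rule, not the paper's mechanism; the paper's contribution concerns how manipulating a few of the N input samples can shift this output and how to use ERM inside (approximately) incentive-compatible mechanisms.

```python
import numpy as np

def erm_reserve_price(samples):
    """Empirical Revenue Maximization: return the sample value p that
    maximizes the empirical revenue p * Pr_hat[v >= p]."""
    v = np.sort(np.asarray(samples, dtype=float))
    n = len(v)
    # For price v[i], exactly the values v[i], ..., v[n-1] are at or above it.
    revenues = v * (n - np.arange(n)) / n
    return v[np.argmax(revenues)]

# Toy usage on five reported values.
print(erm_reserve_price([0.2, 0.9, 0.4, 1.3, 0.7]))
```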


Lazy and Fast Greedy MAP Inference for Determinantal Point Process

Neural Information Processing Systems

The maximum a posteriori (MAP) inference for determinantal point processes (DPPs) is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. One classical and practical method is the lazy greedy algorithm, which is applicable to general submodular function maximization, while a recent fast greedy algorithm based on the Cholesky factorization is more efficient for DPP MAP inference. This paper shows how to combine the ideas of "lazy" and "fast", which have been considered incompatible in the literature. Our lazy and fast greedy algorithm achieves almost the same time complexity as the current best one and runs faster in practice. The idea of "lazy + fast" is extendable to other greedy-type algorithms. We also give a fast version of the double greedy algorithm for unconstrained DPP MAP inference.
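A minimal sketch of the baseline greedy MAP routine being accelerated: repeatedly add the item with the largest gain in log det of the selected submatrix, recomputing every gain naively. The paper's lazy+fast algorithm instead reuses Cholesky factors and stale upper bounds on the gains; this sketch is only the naive baseline.

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Naive greedy MAP inference for a DPP with PSD kernel L:
    at each step, add the item maximizing log det(L_S) of the new set S.
    O(k * n) determinant evaluations; illustrative, not the paper's method."""
    n = L.shape[0]
    selected, remaining = [], list(range(n))
    for _ in range(k):
        gains = []
        for j in remaining:
            S = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            gains.append(logdet if sign > 0 else -np.inf)
        best = remaining[int(np.argmax(gains))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage on a random PSD kernel.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
L = X @ X.T + 1e-3 * np.eye(6)
print(greedy_dpp_map(L, k=3))
```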


Reviews: Distribution-Independent PAC Learning of Halfspaces with Massart Noise

Neural Information Processing Systems

The paper gives a PAC learning algorithm for the basic problem of learning halfspaces in a model of learning with noise. The algorithm builds on previous results for the simpler model of random classification noise and adds important new ideas. Learning with noise is a basic topic in learning theory. It can be argued that the most studied models are either unrealistically benign (random classification noise, even though the related SQ model is very important) or unrealistically adversarial (malicious noise), and there is a great need for the study of more realistic models. The Massart noise model is a candidate for such a model.
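As a minimal illustration of the Massart model itself (an assumed data generator, not part of the paper's algorithm): the clean halfspace label is flipped independently per point with probability eta(x), where eta(x) may vary adversarially across points but is bounded by some eta < 1/2.

```python
import numpy as np

def massart_labels(X, w, eta_fn, rng):
    """Massart (bounded) noise: flip the clean label sign(<w, x>) with
    probability eta(x) <= eta < 1/2, where eta(x) can depend on the point.
    Illustrative generator only."""
    clean = np.sign(X @ w)
    flip = rng.random(len(X)) < np.clip(eta_fn(X), 0.0, 0.49)
    return np.where(flip, -clean, clean)

# Toy usage with a point-dependent flip probability bounded by 0.2.
rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))
w = rng.normal(size=4)
y = massart_labels(X, w, lambda X: 0.2 * np.abs(np.tanh(X[:, 0])), rng)
```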



Reviews: Distribution-Independent PAC Learning of Halfspaces with Massart Noise

Neural Information Processing Systems

This is a very strong paper that makes impressive progress on the long-standing open problem of efficiently PAC learning halfspaces under the Massart noise model. While fully resolving the problem would require getting within epsilon of the optimal error, achieving error eta + epsilon is a breakthrough and will likely fuel future results in learning theory.