Computational Learning Theory
Five Most Popular Open Source Frameworks Used in Machine Learning Analytics Insight
Machine language a branch of artificial intelligence which enables system the ability to learn from data without being programmed. Machine learning got evolved from pattern recognition and computational learning theory in artificial intelligence. It has revolutionized the conventional way through developing algorithms that can learn and make predictions on data. There are innumerable factors that have improved the contribution of machine learning. Open source frameworks are one of the major reasons for the boost in machine learning.
A Direct Sum Result for the Information Complexity of Learning
Nachum, Ido, Shafer, Jonathan, Yehudayoff, Amir
How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al. (2018), where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis outputted by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega\left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of information complexity of a given class of functions $\mathcal{H}$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{H}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.
Privacy-preserving Prediction
Dwork, Cynthia, Feldman, Vitaly
Ensuring differential privacy of models learned from sensitive user data is an important goal that has been studied extensively in recent years. It is now known that for some basic learning problems, especially those involving high-dimensional data, producing an accurate private model requires much more data than learning without privacy. At the same time, in many applications it is not necessary to expose the model itself. Instead users may be allowed to query the prediction model on their inputs only through an appropriate interface. Here we formulate the problem of ensuring privacy of individual predictions and investigate the overheads required to achieve it in several standard models of classification and regression. We first describe a simple baseline approach based on training several models on disjoint subsets of data and using standard private aggregation techniques to predict. We show that this approach has nearly optimal sample complexity for (realizable) PAC learning of any class of Boolean functions. At the same time, without strong assumptions on the data distribution, the aggregation step introduces a substantial overhead. We demonstrate that this overhead can be avoided for the well-studied class of thresholds on a line and for a number of standard settings of convex regression. The analysis of our algorithm for learning thresholds relies crucially on strong generalization guarantees that we establish for all differentially private prediction algorithms.
Learning Mixtures of Product Distributions via Higher Multilinear Moments
Learning mixtures of $k$ binary product distributions is a central problem in computational learning theory, but one where there are wide gaps between the best known algorithms and lower bounds (even for restricted families of algorithms). We narrow many of these gaps by developing novel insights about how to reason about higher order multilinear moments. Our results include: 1) An $n^{O(k^2)}$ time algorithm for learning mixtures of binary product distributions, giving the first improvement on the $n^{O(k^3)}$ time algorithm of Feldman, O'Donnell and Servedio 2) An $n^{\Omega(\sqrt{k})}$ statistical query lower bound, improving on the $n^{\Omega(\log k)}$ lower bound that is based on connections to sparse parity with noise 3) An $n^{O(\log k)}$ time algorithm for learning mixtures of $k$ subcubes. This special case can still simulate many other hard learning problems, but is much richer than any of them alone. As a corollary, we obtain more flexible algorithms for learning decision trees under the uniform distribution, that work with stochastic transitions, when we are only given positive examples and with a polylogarithmic number of samples for any fixed $k$. Our algorithms are based on a win-win analysis where we either build a basis for the moments or locate a degeneracy that can be used to simplify the problem, which we believe will have applications to other learning problems over discrete domains.
MAOS-QKP: Project Portal – Xiao-Feng Xie, Ph.D.
MAOS-QKP is a multiagent optimization system (MAOS) for solving the Quadratic Knapsack Problem (QKP). Related Information: MAOS-QKP shares the MAOS kernel with other MAOS applications (e.g. Please find other related code and software in our Source Code Library. License information: MAOS-QKP is free software; you can redistribute it and/or modify it under the Creative Commons Non-Commercial License 3.0. System Requirements: MAOS-QKP is a platform-independent software developed by JAVA version 1.5 or above.
Algorithmic learning of probability distributions from random data in the limit
Barmpalias, George, Stephan, Frank
We study the problem of identifying a probability distribution for some given randomly sampled data in the limit, in the context of algorithmic learning theory as proposed recently by Vinanyi and Chater. We show that there exists a computable partial learner for the computable probability measures, while by Bienvenu, Monin and Shen it is known that there is no computable learner for the computable probability measures. Our main result is the characterization of the oracles that compute explanatory learners for the computable (continuous) probability measures as the high oracles. This provides an analogue of a well-known result of Adleman and Blum in the context of learning computable probability distributions. We also discuss related learning notions such as behaviorally correct learning and orther variations of explanatory learning, in the context of learning probability distributions from data.
Efficient Algorithms for Outlier-Robust Regression
Klivans, Adam, Kothari, Pravesh K., Meka, Raghu
We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions such as Gaussian, strongly log-concave distributions and, uniform distribution on the hypercube among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.
Teacher Improves Learning by Selecting a Training Subset
Ma, Yuzhe, Nowak, Robert, Rigollet, Philippe, Zhang, Xuezhou, Zhu, Xiaojin
We call a learner super-teachable if a teacher can trim down an iid training set while making the learner learn even better. We provide sharp super-teaching guarantees on two learners: the maximum likelihood estimator for the mean of a Gaussian, and the large margin classifier in 1D. For general learners, we provide a mixed-integer nonlinear programming-based algorithm to find a super teaching set. Empirical experiments show that our algorithm is able to find good super-teaching sets for both regression and classification problems.
The Many Faces of Exponential Weights in Online Learning
van der Hoeven, Dirk, van Erven, Tim, Kotłowski, Wojciech
A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.
Generalization in Machine Learning via Analytical Learning Theory
Kawaguchi, Kenji, Bengio, Yoshua
This paper introduces a novel measure-theoretic learning theory to analyze generalization behaviors of practical interest. The proposed learning theory has the following abilities: 1) to utilize the qualities of each learned representation on the path from raw inputs to outputs in representation learning, 2) to guarantee good generalization errors possibly with arbitrarily rich hypothesis spaces (e.g., arbitrarily large capacity and Rademacher complexity) and non-stable/non-robust learning algorithms, and 3) to clearly distinguish each individual problem instance from each other. Our generalization bounds are relative to a representation of the data, and hold true even if the representation is learned. We discuss several consequences of our results on deep learning, one-shot learning and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. Because of the differences in the assumptions and the objectives, the proposed learning theory is meant to be complementary to previous learning theory and is not designed to compete with it.