AITopics

We study Nyström-type subsampling approaches to large-scale kernel methods and prove learning bounds in the statistical learning setting, where random sampling and high-probability estimates are considered. In particular, we prove that these approaches can achieve optimal learning bounds, provided the subsampling level is suitably chosen. These results suggest a simple incremental variant of Nyström Kernel Regularized Least Squares, where the subsampling level implements a form of computational regularization, in the sense that it controls regularization and computations at the same time. Extensive experimental analysis shows that the considered approach achieves state-of-the-art performance on benchmark large-scale datasets.
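
To make the role of the subsampling level concrete, here is a minimal sketch of Nyström KRLS, assuming a Gaussian kernel and uniform sampling of m centers without replacement. The function names, the kernel choice, and the solver are illustrative assumptions, not the authors' code; the estimator solves the m x m system (Knm^T Knm + lam * n * Kmm) alpha = Knm^T y instead of the full n x n one.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def nystrom_krls(X, y, m, lam, sigma=1.0, seed=0):
    """Nystrom KRLS with m centers drawn uniformly without replacement (assumed scheme)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=m, replace=False)]
    Knm = gaussian_kernel(X, centers, sigma)        # n x m
    Kmm = gaussian_kernel(centers, centers, sigma)  # m x m
    # Subsampled normal equations; solved in the least-squares sense because
    # Kmm is often numerically singular. Cost is O(n m^2) rather than O(n^3).
    A = Knm.T @ Knm + lam * n * Kmm
    alpha = np.linalg.lstsq(A, Knm.T @ y, rcond=None)[0]
    return lambda Xt: gaussian_kernel(Xt, centers, sigma) @ alpha
```

An incremental variant in the spirit of the abstract would refit for growing m and stop once validation error stops improving, so that m acts as the regularization knob; that outer loop is omitted here.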

artificial intelligence, leverage score, machine learning

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Hensman, James, Matthews, Alexander G., Filippone, Maurizio, Ghahramani, Zoubin

MCMC for Variationally Sparse Gaussian Processes

Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has gone into three issues with GP models: how to compute efficiently when the number of data points is large; how to approximate the posterior when the likelihood is not Gaussian; and how to estimate posteriors over covariance function parameters. This paper addresses all three simultaneously, using a variational approximation to the posterior that is sparse in support of the function but otherwise free-form. The result is a Hybrid Monte Carlo sampling scheme that allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper is available at github.com/sparseMCMC.
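
The efficiency of inducing-point sparse GPs comes from replacing an n x n solve with an m x m one. As a stand-in illustration, the sketch below uses the standard subset-of-regressors (SoR) predictive mean with an RBF kernel; it is not the paper's variational or Hybrid Monte Carlo scheme, and all names and hyperparameters are assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)

def sor_predictive_mean(X, y, Z, Xs, noise_std=0.1):
    """SoR mean K_sm (noise^2 K_mm + K_mn K_nm)^{-1} K_mn y with inducing inputs Z.

    Only an m x m system is solved, so the cost is O(n m^2) instead of O(n^3)."""
    Kmn = rbf(Z, X)
    Kmm = rbf(Z, Z)
    Ksm = rbf(Xs, Z)
    S = noise_std ** 2 * Kmm + Kmn @ Kmn.T
    return Ksm @ np.linalg.solve(S, Kmn @ y)

def exact_gp_mean(X, y, Xs, noise_std=0.1):
    """Exact GP predictive mean, O(n^3), for comparison."""
    K = rbf(X, X) + noise_std ** 2 * np.eye(X.shape[0])
    return rbf(Xs, X) @ np.linalg.solve(K, y)
```

When the inducing inputs equal the training inputs (Z = X) and the kernel matrix is invertible, the SoR mean reduces algebraically to the exact GP mean, which is a quick sanity check for the implementation.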

approximation, artificial intelligence, machine learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Max-Margin Majority Voting for Learning from Crowds

Tian, Tian, Zhu, Jun

Learning from crowds aims to design proper aggregation strategies to infer the unknown true labels from the noisy labels provided by ordinary web workers. This paper presents max-margin majority voting (M³V) to improve the discriminative ability of majority voting, and further presents a Bayesian generalization that incorporates the flexibility of generative methods in modeling noisy observations with worker confusion matrices. We formulate the joint learning as a regularized Bayesian inference problem, where the posterior regularization is derived by maximizing the margin between the aggregated score of a potential true label and that of any alternative label. Our Bayesian model naturally covers the Dawid-Skene estimator and M³V. Empirical results demonstrate that our methods are competitive, often achieving better results than state-of-the-art estimators.
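
The aggregated score and margin described above can be sketched as follows, assuming a dense label matrix with -1 marking missing answers and a given vector of worker weights; all names and the array layout are hypothetical. With all weights equal to 1 this is plain majority voting. The max-margin learning of the weights themselves is not shown.

```python
import numpy as np

def aggregated_scores(labels, weights, num_classes):
    """Weighted vote counts; labels[i, j] is worker j's label for item i, or -1 if missing."""
    n_items, n_workers = labels.shape
    scores = np.zeros((n_items, num_classes))
    for j in range(n_workers):
        rows = np.nonzero(labels[:, j] >= 0)[0]
        scores[rows, labels[rows, j]] += weights[j]  # one weighted vote per (item, worker)
    return scores

def label_margins(scores, y):
    """Score of label y[i] minus the best score among all other labels."""
    idx = np.arange(scores.shape[0])
    others = scores.copy()
    others[idx, y] = -np.inf
    return scores[idx, y] - others.max(axis=1)

labels = np.array([[0, 0, 1], [1, 1, 1], [0, -1, 1]])
weights = np.array([1.0, 1.0, 2.0])
scores = aggregated_scores(labels, weights, num_classes=2)  # [[2, 2], [0, 4], [1, 2]]
margins = label_margins(scores, np.array([0, 1, 1]))        # [0, 4, 1]
```

Here the third worker's larger weight turns item 2 from a 1-1 tie under plain majority voting into a win for label 1.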

artificial intelligence, machine learning, majority voting

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Ma, Tengyu, Wigderson, Avi

Sum-of-Squares Lower Bounds for Sparse PCA

This paper establishes a statistical versus computational trade-off for solving a basic high-dimensional machine learning problem via a basic convex relaxation method. Specifically, we consider the Sparse Principal Component Analysis (Sparse PCA) problem and the family of Sum-of-Squares (SoS, aka Lasserre/Parrilo) convex relaxations. It was well known that in large dimension $p$, a planted $k$-sparse unit vector can in principle be detected using only $n \approx k\log p$ (Gaussian or Bernoulli) samples, but all known efficient (polynomial-time) algorithms require $n \approx k^2$ samples. It was also known that this quadratic gap cannot be improved by the most basic semidefinite (SDP, aka spectral) relaxation, which is equivalent to a degree-2 SoS algorithm. Here we prove that degree-4 SoS algorithms also cannot improve this quadratic gap. This average-case lower bound adds to the small collection of hardness results in machine learning for this powerful family of convex relaxation algorithms. Moreover, our design of moments (or "pseudo-expectations") for this lower bound is quite different from previous lower bounds. Establishing lower bounds for higher-degree SoS algorithms remains a challenging problem.

algorithm, artificial intelligence, machine learning

Country: North America > United States (1.00)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Hazan, Elad, Levy, Kfir, Shalev-Shwartz, Shai

Beyond Convexity: Stochastic Quasi-Convex Optimization

Stochastic convex optimization is a basic and well-studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients rather than the gradients themselves. In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally Lipschitz. Quasi-convexity extends the concept of unimodality to multiple dimensions and allows for certain types of saddle points, which are a known hurdle for first-order optimization methods such as gradient descent. Locally Lipschitz functions are only required to be Lipschitz in a small region around the optimum. This assumption circumvents gradient explosion, which is another known hurdle for gradient descent variants. Interestingly, unlike the vanilla SGD algorithm, the stochastic normalized gradient descent algorithm provably requires a minimal minibatch size.
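
The normalized update x <- x - lr * g / ||g|| can be sketched as below. This is a minimal illustration with a fixed step size that returns the last iterate; the test function, its noise model, the minimizer C, and all names are assumptions, not the paper's setup. The loss log(1 + ||x - C||^2) is quasi-convex (a monotone transform of a convex function) but not convex, and its gradient flattens far from C, so plain gradient steps shrink there while normalized steps keep a fixed length.

```python
import numpy as np

def sngd(grad_fn, x0, lr, batch_size, steps, seed=0):
    """Stochastic normalized gradient descent: step a fixed length along -g/||g||."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad_fn(x, rng, batch_size)  # minibatch gradient estimate
        norm = np.linalg.norm(g)
        if norm > 0:
            x -= lr * g / norm           # only the direction of g is used
    return x

C = np.array([3.0, -2.0])  # hypothetical minimizer for this demo

def log_quadratic_grad(x, rng, batch_size, noise=0.1):
    """Minibatch gradient of the mean of log(1 + ||x - C - xi||^2), xi ~ N(0, noise^2 I)."""
    xi = noise * rng.standard_normal((batch_size, x.size))
    diffs = x - C - xi
    denom = 1.0 + (diffs ** 2).sum(axis=1)
    return (2.0 * diffs / denom[:, None]).mean(axis=0)

x_hat = sngd(log_quadratic_grad, x0=[20.0, 20.0], lr=0.05, batch_size=64, steps=1000)
```

With a fixed step size the iterate ends up oscillating within roughly one step length of the minimizer; the minibatch size matters because a single noisy sample can point the normalized step in a poor direction.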

artificial intelligence, gradient, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Adaptive Stochastic Optimization: From Sets to Paths

Lim, Zhan Wei, Hsu, David, Lee, Wee Sun

adaptive stochastic optimization problem, artificial intelligence, machine learning

Adaptive stochastic optimization optimizes an objective function adaptively under uncertainty. It plays a crucial role in planning and learning under uncertainty but is, unfortunately, computationally intractable in general. This paper introduces two conditions on the objective function, the marginal likelihood rate bound and the marginal likelihood bound, which enable efficient approximate solution of adaptive stochastic optimization. Several interesting classes of functions satisfy these conditions naturally, e.g., the version space reduction function for hypothesis learning. We describe Recursive Adaptive Coverage (RAC), a new adaptive stochastic optimization algorithm that exploits these conditions, and apply it to two planning tasks under uncertainty. In contrast to the earlier submodular optimization approach, our algorithm applies to adaptive stochastic optimization over both sets and paths.

Country: Asia (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Yi, Xinyang, Caramanis, Constantine

Regularized EM Algorithms: A Unified Framework and Statistical Guarantees

Latent models are a fundamental modeling tool in machine learning applications, but they present significant computational and analytical challenges. The popular EM algorithm and its variants are widely used algorithmic tools, yet our rigorous understanding of their performance is highly incomplete. Recently, work in [1] has demonstrated that for an important class of problems, EM exhibits linear local convergence. In the high-dimensional setting, however, the M-step may not be well defined. We address precisely this setting through a unified treatment using regularization. While regularization for high-dimensional problems is by now well understood, the iterative EM algorithm requires a careful balance between making progress towards the solution and identifying the right structure (e.g., sparsity or low rank). In particular, regularizing the M-step using the state-of-the-art high-dimensional prescriptions (e.g., à la [19]) is not guaranteed to provide this balance. Our algorithm and analysis are linked in a way that reveals the balance between optimization and statistical errors. We specialize our general framework to sparse Gaussian mixture models, high-dimensional mixed regression, and regression with missing variables, obtaining statistical guarantees for each of these examples.
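
For the sparse Gaussian mixture case, a regularized EM loop can be sketched as follows. The model assumed here is the symmetric mixture y = z * beta + noise with z = +1 or -1 equally likely and isotropic noise. For this model the E-step weight satisfies 2w - 1 = tanh(<beta, y> / sigma^2), and the l1-regularized M-step is a soft-thresholding of the unregularized update. The decreasing schedule for lam, its constants, and the function names are assumptions for illustration, not the paper's exact prescription.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def regularized_em(Y, beta0, sigma=1.0, lam0=1.0, lam_final=0.1, kappa=0.7, iters=50):
    """Regularized EM for the symmetric mixture y = z * beta + N(0, sigma^2 I)."""
    beta = np.asarray(beta0, dtype=float).copy()
    for t in range(1, iters + 1):
        # E-step: posterior sign weight 2 w_i - 1 for each sample.
        s = np.tanh(Y @ beta / sigma ** 2)
        # Unregularized M-step: (1/n) * sum_i (2 w_i - 1) y_i.
        m = (s[:, None] * Y).mean(axis=0)
        # Assumed schedule: lam decays geometrically from lam0 towards lam_final.
        lam = lam_final + kappa ** t * (lam0 - lam_final)
        beta = soft_threshold(m, lam)  # regularized M-step
    return beta
```

A large early lam keeps spurious coordinates out while the estimate is still poor; shrinking it over the iterations reduces the bias on the true support, which is the kind of optimization versus statistical balance the abstract refers to.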

algorithm, artificial intelligence, machine learning

Country: North America > United States > Texas (0.14)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Mroueh, Youssef, Voinea, Stephen, Poggio, Tomaso A.

Learning with Group Invariant Features: A Kernel Perspective

In this paper we analyze a random feature map based on a theory of invariance (I-theory) introduced in [1]. More specifically, a group-invariant signal signature is obtained through cumulative distributions of group-transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar-integration kernel that is invariant to the specified group action. We show how this nonlinear random feature map approximates this group-invariant kernel uniformly on a set of N points. Moreover, we show that it defines a function space that is dense in the equivalent Invariant Reproducing Kernel Hilbert Space. Finally, we quantify error rates of the convergence of the empirical risk minimization, as well as the reduction in the sample complexity of a learning algorithm using such an invariant representation for signal classification, in a classical supervised learning setting.
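
A minimal sketch of such a signature, assuming the group is the finite group of cyclic shifts of a length-d vector, with Gaussian random templates and a fixed grid of thresholds; all names and choices are hypothetical. For each template, the feature is the empirical CDF, evaluated on the grid, of the projections of every shifted copy of the input onto that template. Because shifting the input only permutes its orbit, the features are unchanged.

```python
import numpy as np

def invariant_signature(x, templates, thresholds):
    """Empirical CDFs of group-transformed random projections (cyclic shift group)."""
    x = x / np.linalg.norm(x)
    T = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    d = x.shape[0]
    # The orbit of x: all d cyclic shifts, one per row.
    orbit = np.stack([np.roll(x, k) for k in range(d)])
    proj = orbit @ T.T  # (d, n_templates), values in [-1, 1]
    # For each template, the fraction of orbit projections <= each threshold.
    cdf = (proj[:, :, None] <= thresholds[None, None, :]).mean(axis=0)
    return cdf.ravel()  # length n_templates * n_thresholds

rng = np.random.default_rng(0)
templates = rng.standard_normal((8, 16))
thresholds = np.linspace(-1.0, 1.0, 5)
x = rng.standard_normal(16)
f_x = invariant_signature(x, templates, thresholds)
f_shifted = invariant_signature(np.roll(x, 5), templates, thresholds)
```

Averaging over the whole orbit is a discrete stand-in for the Haar integral over the group; for a continuous group one would sample group elements instead.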

artificial intelligence, kernel, machine learning

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Pentina, Anastasia, Lampert, Christoph H.

Lifelong Learning with Non-i.i.d. Tasks

In this work we aim to extend the theoretical foundations of lifelong learning. Previous work analyzing this scenario is based on the assumption that the tasks are sampled i.i.d. from a task environment, or is limited to strongly constrained data distributions. Instead, we study two scenarios in which lifelong learning is possible even though the observed tasks do not form an i.i.d. sample: first, when they are sampled from the same environment but possibly with dependencies, and second, when the task environment is allowed to change over time. In the first case we prove a PAC-Bayesian theorem, which can be seen as a direct generalization of the analogous previous result for the i.i.d. case. For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalization bound and show on a toy example how it can be used to identify a beneficial transfer algorithm.

artificial intelligence, inductive learning, machine learning

Country: Europe (0.14)

Genre: Instructional Material (0.83)

Industry: Education > Educational Setting > Continuing Education (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Fast Rates for Exp-concave Empirical Risk Minimization

Koren, Tomer, Levy, Kfir

We consider Empirical Risk Minimization (ERM) in the context of stochastic optimization with exp-concave and smooth losses, a general optimization framework that captures several important learning problems including linear and logistic regression, learning SVMs with the squared hinge loss, portfolio selection and more. In this setting, we establish the first evidence that ERM is able to attain fast generalization rates, and show that the expected loss of the ERM solution in $d$ dimensions converges to the optimal expected loss at a rate of $d/n$. This rate matches existing lower bounds up to constants and improves by a $\log{n}$ factor upon the state of the art, which is only known to be attained by an online-to-batch conversion of computationally expensive online algorithms.
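
The $d/n$ scaling can be seen numerically in the simplest instance, least-squares linear regression (the squared loss is exp-concave on bounded domains). The sketch below is a Monte Carlo estimate under an assumed isotropic Gaussian design with Gaussian noise, where classical theory gives an expected excess risk of exactly $\sigma^2 d/(n-d-1)$. It only illustrates the rate and is not the paper's analysis; the function name and parameters are hypothetical.

```python
import numpy as np

def ols_excess_risk(n, d, sigma=1.0, trials=500, seed=0):
    """Monte Carlo estimate of the expected excess risk of least-squares ERM."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))  # isotropic Gaussian design (assumption)
        y = X @ w_star + sigma * rng.standard_normal(n)
        w_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # ERM for the squared loss
        # With isotropic inputs, E_x[(x^T (w_hat - w_star))^2] = ||w_hat - w_star||^2.
        risks.append(np.sum((w_hat - w_star) ** 2))
    return float(np.mean(risks))

r200 = ols_excess_risk(n=200, d=5)
r400 = ols_excess_risk(n=400, d=5)
```

For d = 5 the theoretical values are 5/194 (about 0.026) at n = 200 and 5/394 (about 0.013) at n = 400, so doubling n roughly halves the excess risk, as a $d/n$ rate predicts.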

artificial intelligence, convergence rate, machine learning

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.49)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)