Linear Submodular Bandits and their Application to Diversified Retrieval

Neural Information Processing Systems

Diversified retrieval and online learning are two core research areas in the design of modern information retrieval systems. In this paper, we propose the linear submodular bandits problem, an online learning setting for optimizing a general class of feature-rich submodular utility models for diversified retrieval. We present an algorithm, called LSBGREEDY, and prove that it efficiently converges to a near-optimal model. As a case study, we apply our approach to personalized news recommendation, where the system must recommend small sets of news articles selected from tens of thousands of available articles each day. In a live user study, we found that LSBGREEDY significantly outperforms existing online learning approaches.
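
To make the setting concrete, here is a minimal sketch of one greedy selection round under a probabilistic topic-coverage utility with ridge-regression confidence widths; the names (P, w_hat, M_inv, alpha) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def lsb_greedy_round(P, w_hat, M_inv, alpha, k):
    """One greedy recommendation round with optimistic utility estimates.

    P      : (n_articles, n_topics) topic probabilities, entries in [0, 1]
    w_hat  : current estimate of the user's topic-preference weights
    M_inv  : inverse covariance of the estimator (for confidence widths)
    alpha  : exploration coefficient
    k      : number of articles to recommend
    """
    n, d = P.shape
    coverage = np.zeros(d)            # probabilistic topic coverage so far
    chosen = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            gain = (1.0 - coverage) * P[i]        # submodular marginal gain per topic
            width = np.sqrt(gain @ M_inv @ gain)  # uncertainty of w_hat @ gain
            score = w_hat @ gain + alpha * width  # optimistic (UCB-style) utility
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
        coverage = 1.0 - (1.0 - coverage) * (1.0 - P[best])
    return chosen
```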


Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Neural Information Processing Systems

This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. To quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor that exponentially down-weights observations according to their age. The tracking behaviour is further enhanced by an adaptive forgetting factor, which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed.
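
For reference, the fixed-forgetting-factor RLS recursion that the paper builds on looks roughly as follows; the adaptive update of the forgetting factor and the Lyapunov-based learning-rate bounds are the paper's contribution and are not reproduced here.

```python
import numpy as np

class ForgettingRLS:
    """Exponentially weighted RLS (illustrative sketch, fixed forgetting factor).

    Past observations are discounted by lam per step: lam near 1 gives long
    memory; smaller lam tracks changes in the model faster.
    """
    def __init__(self, dim, lam=0.99, delta=100.0):
        self.w = np.zeros(dim)         # coefficient estimate
        self.P = delta * np.eye(dim)   # inverse of the weighted covariance
        self.lam = lam

    def update(self, x, y):
        e = y - self.w @ x                       # a priori error
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)             # gain vector
        self.w = self.w + k * e
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return e
```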


Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Neural Information Processing Systems

We consider the estimation of an i.i.d. vector $\mathbf{x} \in \mathbb{R}^n$ from measurements $\mathbf{y} \in \mathbb{R}^m$ obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector $\mathbf{x}$. The proposed algorithm is a generalization of a recently-developed method by Vila and Schniter that uses expectation-maximization (EM) iterations where the posteriors in the E-steps are computed via approximate message passing. The techniques can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations.
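
Adaptive GAMP itself handles general priors and measurement channels; as a much simpler relative, the sketch below runs plain AMP with a fixed soft threshold for sparse recovery from Gaussian measurements, just to illustrate the message-passing recursion and the Onsager correction (the threshold theta and iteration count are arbitrary choices).

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding (componentwise prox of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_sparse(A, y, theta=0.1, iters=30):
    """Plain AMP for sparse recovery (a simplified relative of Adaptive GAMP)."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        r = x + A.T @ z                  # pseudo-data: signal plus effective noise
        x_new = soft(r, theta)
        # Onsager correction keeps the effective noise approximately Gaussian
        z = y - A @ x_new + (n / m) * np.mean(np.abs(x_new) > 0) * z
        x = x_new
    return x
```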


Recovery of Sparse Probability Measures via Convex Programming

Neural Information Processing Systems

We consider the problem of cardinality-penalized optimization of a convex function over the probability simplex with additional convex constraints. It is well known that the classical L1 regularizer fails to promote sparsity on the probability simplex, since the L1 norm is trivially constant there. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application, we consider recovering a sparse probability measure given moment constraints, in which case our formulation reduces to linear programming and can therefore be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints.
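
The abstract does not spell out the relaxation, so the snippet below is only a loosely related heuristic: it searches for a sparse probability vector consistent with moment constraints A p = b by solving one LP per coordinate (maximizing that coordinate over the feasible set) and keeping the sparsest feasible solution found; A, b, and the tolerance are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_measure_heuristic(A, b, tol=1e-8):
    """Search for a sparse probability vector p with A p = b (heuristic sketch)."""
    n = A.shape[1]
    A_eq = np.vstack([A, np.ones((1, n))])      # moment constraints + sum(p) = 1
    b_eq = np.append(b, 1.0)
    best = None
    for i in range(n):
        c = np.zeros(n)
        c[i] = -1.0                             # maximize p_i over the feasible set
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
        if res.success:
            p = res.x
            if best is None or np.sum(p > tol) < np.sum(best > tol):
                best = p
    return best
```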


Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Neural Information Processing Systems

Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high-dimensional data, because the size of the metric grows quadratically with the dimensionality; (ii) they assume a fixed metric for the entire input space and are therefore unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a non-parametric approach similar to support vector machines. The proposed scheme avoids the assumption of a fixed metric because its local distance metric is implicitly derived from the Hessian matrix of the convex function used to generate the Bregman distance function.
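
As a reminder of the basic object being learned, the Bregman distance generated by a strictly convex function phi is D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>; a short sketch follows, with phi chosen purely for illustration.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Example: phi(x) = ||x||^2 recovers the squared Euclidean distance.
phi = lambda x: x @ x
grad_phi = lambda x: 2.0 * x
x, y = np.array([1.0, 2.0]), np.array([0.5, 1.0])
print(bregman_divergence(phi, grad_phi, x, y))   # equals ||x - y||^2 here
```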


Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization

Neural Information Processing Systems

Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized "robust principal component analysis" problem of recovering a low-rank matrix A from corrupted observations D = A + E. Here, the entries of the error matrix E can be arbitrarily large (modeling the grossly corrupted observations common in visual and bioinformatic data), but E is assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns by solving a simple convex program. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of nonzero entries of E grows in proportion to the total number of entries in the matrix.
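
The convex program in question is principal component pursuit, min ||A||_* + lambda ||E||_1 subject to A + E = D; a common way to solve it is an augmented-Lagrangian loop alternating singular value thresholding with entrywise shrinkage. The sketch below follows that recipe, with the default lambda = 1/sqrt(max(m, n)) taken from the RPCA literature and the penalty mu and iteration count chosen arbitrarily.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Entrywise soft thresholding: prox of tau * L1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_pcp(D, lam=None, mu=1.0, iters=200):
    """Principal component pursuit via a basic augmented-Lagrangian loop."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    A = np.zeros_like(D); E = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(iters):
        A = svt(D - E + Y / mu, 1.0 / mu)       # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)    # sparse-error update
        Y = Y + mu * (D - A - E)                # dual ascent on A + E = D
    return A, E
```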


Online L1-Dictionary Learning with Application to Novel Document Detection

Neural Information Processing Systems

Given their pervasive use, social media such as Twitter have become a leading source of breaking news. A key task in the automated identification of such news is detecting novel documents in a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online L1-dictionary learning, where, unlike traditional dictionary learning based on squared loss, the L1 penalty is used to measure the reconstruction error. We present an efficient online algorithm for this problem based on the alternating directions method of multipliers (ADMM), and establish a sublinear regret bound for it. Empirical results on news-stream and Twitter data show that this online L1-dictionary learning algorithm for novel document detection yields more than an order-of-magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results.
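
The core L1 subproblem can itself be handled with ADMM; as an illustration only (not the paper's full online dictionary update), the sketch below solves min_x ||A x - b||_1 by splitting the residual into an auxiliary variable, where A stands for the current dictionary, b for a document vector, and rho is an arbitrary penalty parameter.

```python
import numpy as np

def l1_residual_admm(A, b, rho=1.0, iters=100):
    """ADMM for min_x ||A x - b||_1, the L1 reconstruction step (sketch)."""
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(m); u = np.zeros(m)
    # Precompute the least-squares operator (A^T A)^{-1} A^T, lightly regularized.
    AtA_inv_At = np.linalg.solve(A.T @ A + 1e-8 * np.eye(n), A.T)
    for _ in range(iters):
        x = AtA_inv_At @ (b + z - u)            # least-squares x-update
        r = A @ x - b
        z = np.sign(r + u) * np.maximum(np.abs(r + u) - 1.0 / rho, 0.0)  # shrinkage
        u = u + r - z                           # dual update
    return x
```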


Valeo.AI-sponsored PhD student/researcher: Applied research in self-driving cars

#artificialintelligence

We offer a PhD position in a university environment with partial supervision by a top-tier researcher from the Valeo.AI research group. Show that your research is meaningful by running your code on a real self-driving car to improve human safety. If interested, please send your application by email to the principal investigator, Karel Zimmermann (zimmerk@fel.cvut.cz). The application should be a single PDF file including the applicant's CV and a short research statement.


Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

Neural Information Processing Systems

Multiple Kernel Learning (MKL) generalizes SVMs to the setting where one simultaneously trains a linear classifier and chooses an optimal combination of given base kernels. Model complexity is typically controlled using various norm regularizations on the vector of base kernel mixing coefficients. Existing methods, however, neither regularize nor exploit potentially useful information pertaining to how kernels in the input set 'interact'; that is, higher-order kernel-pair relationships that can be easily obtained via unsupervised (similarity, geodesics), supervised (correlation in errors), or domain-knowledge-driven mechanisms (e.g., which features were used to construct the kernel). We show that by substituting the norm penalty with an arbitrary quadratic function $Q \succeq 0$, one can impose a desired covariance structure on mixing coefficient selection and use this as an inductive bias when learning the concept. This formulation significantly generalizes the widely used 1- and 2-norm MKL objectives.
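
To make the regularizer concrete: with base Gram matrices K_1..K_p and mixing coefficients beta, the combined kernel is sum_i beta_i K_i, and the quadratic penalty beta^T Q beta replaces the usual norm penalty. A minimal sketch of those two ingredients follows; how Q is built (e.g., from kernel alignment or domain knowledge) is left abstract.

```python
import numpy as np

def combined_kernel(kernels, beta):
    """Weighted combination sum_i beta_i * K_i of base Gram matrices."""
    return sum(b * K for b, K in zip(beta, kernels))

def q_penalty(beta, Q):
    """Matrix-induced regularizer beta^T Q beta (Q = I recovers 2-norm MKL)."""
    return beta @ Q @ beta
```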


Survival of the fastest: balancing the business complexity seesaw

#artificialintelligence

At a time of constant innovation and evolving customer expectations, business agility has never been more important. The enterprises most adaptable to change stand the best chance of thriving. By contrast, those caged by complex and disjointed back-office systems are running out of time. In particular, established companies that have been on the market for more than a decade face real challenges, from silos to legacy systems, that put the brakes on their ability to innovate and remain relevant to customers. A recent study by Oracle and the Confederation of British Industry highlights just how unrelenting this pace of change can be: half of the largest FTSE 100 companies have disappeared from the index since 2009.