AITopics

McWilliams, Brian, Balduzzi, David, Buhmann, Joachim M.

Correlated random features for fast semi-supervised learning

This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It has been shown that, if the views contain accurate estimators, CCA regression can substantially reduce variance with a minimal increase in bias. Random views are justified by recent theoretical and empirical work showing that regression with random features closely approximates kernel regression, implying that random views can be expected to contain accurate estimators. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude.

artificial intelligence, inductive learning, machine learning, (16 more...)

1306.5554

Country: Europe > Switzerland (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.92)

Oymak, Samet, Thrampoulidis, Christos, Hassibi, Babak

The Squared-Error of Generalized LASSO: A Precise Analysis

We consider the problem of estimating an unknown signal $x_0$ from noisy linear observations $y = Ax_0 + z\in \mathbb{R}^m$. In many practical instances, $x_0$ has a certain structure that can be captured by a structure-inducing convex function $f(\cdot)$. For example, the $\ell_1$ norm can be used to encourage a sparse solution. To estimate $x_0$ with the aid of $f(\cdot)$, we consider the well-known LASSO method and provide a sharp characterization of its performance. We assume the entries of the measurement matrix $A$ and the noise vector $z$ have zero-mean normal distributions with variances $1$ and $\sigma^2$ respectively. For the LASSO estimator $x^*$, we attempt to calculate the Normalized Square Error (NSE) defined as $\frac{\|x^*-x_0\|_2^2}{\sigma^2}$ as a function of the noise level $\sigma$, the number of observations $m$ and the structure of the signal. We show that the structure of the signal $x_0$ and the choice of the function $f(\cdot)$ enter the error formulae through the summary parameters $D(cone)$ and $D(\lambda)$, which are defined as the Gaussian squared-distances to the subdifferential cone and to the $\lambda$-scaled subdifferential, respectively. The first LASSO estimator assumes a priori knowledge of $f(x_0)$ and is given by $\arg\min_{x}\{{\|y-Ax\|_2}~\text{subject to}~f(x)\leq f(x_0)\}$. We prove that its worst case NSE is achieved when $\sigma\rightarrow 0$ and concentrates around $\frac{D(cone)}{m-D(cone)}$. Secondly, we consider $\arg\min_{x}\{\|y-Ax\|_2+\lambda f(x)\}$, for some $\lambda\geq 0$. This time the NSE formula depends on the choice of $\lambda$ and is given by $\frac{D(\lambda)}{m-D(\lambda)}$. We then establish a mapping between this and the third estimator $\arg\min_{x}\{\frac{1}{2}\|y-Ax\|_2^2+ \lambda f(x)\}$. Finally, for a number of important structured signal classes, we translate our abstract formulae to closed-form upper bounds on the NSE.
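
To make the third estimator concrete for the case $f = \ell_1$, the following is a minimal sketch only. It uses iterative soft-thresholding (ISTA), a standard solver that the abstract does not mention, and all function names are illustrative assumptions. `predicted_nse` merely evaluates the $\frac{D}{m-D}$ expression quoted above for a given value of the summary parameter; computing $D(cone)$ or $D(\lambda)$ is outside the scope of this sketch.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_lasso(A, y, lam, step, iters=500):
    """Approximately minimise 0.5 * ||y - A x||_2^2 + lam * ||x||_1.

    A is a list of rows. ISTA is an assumed solver choice; `step`
    should be at most 1 / ||A||_2^2 for the iteration to converge.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the quadratic term: g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-thresholding.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


def normalized_square_error(x_hat, x0, sigma):
    """NSE as defined in the abstract: ||x_hat - x0||_2^2 / sigma^2."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x0)) / sigma ** 2


def predicted_nse(D, m):
    """Evaluate D / (m - D) for a summary parameter D and m observations."""
    if D >= m:
        raise ValueError("the expression requires D < m")
    return D / (m - D)
```

When `A` is the identity matrix and `step=1.0`, the iteration reduces to a single soft-thresholding of `y`, which gives a quick sanity check of the solver.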

artificial intelligence, exp, machine learning, (18 more...)

1311.083

Country: North America > United States (0.27)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning (0.92)

Audiffren, Julien, Kadri, Hachem

Online Learning with Multiple Operator-valued Kernels

We consider the problem of learning a vector-valued function f in an online learning setting. The function f is assumed to lie in a reproducing kernel Hilbert space of operator-valued kernels. We describe two online algorithms for learning f while taking into account the output structure. A first contribution is an algorithm, ONORMA, that extends the standard kernel-based online learning algorithm NORMA from the scalar-valued to the operator-valued setting. We report a cumulative error bound that holds both for classification and regression. We then define a second algorithm, MONORMA, which addresses the limitation of pre-defining the output structure in ONORMA by learning sequentially a linear combination of operator-valued kernels. Our experiments show that the proposed algorithms achieve good performance with low computational cost.
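
As background for the scalar-valued starting point, here is a minimal sketch of NORMA-style online kernel regression. It is not ONORMA or MONORMA: it assumes scalar inputs and outputs, a Gaussian kernel, and the squared loss, and the class and parameter names are illustrative. Each round shrinks the existing expansion coefficients by $(1 - \eta\lambda)$ and adds one new kernel term weighted by the negative loss gradient.

```python
import math


def gaussian_kernel(x, z, gamma=1.0):
    """Scalar Gaussian kernel k(x, z) = exp(-gamma * (x - z)^2)."""
    return math.exp(-gamma * (x - z) ** 2)


class Norma:
    """Scalar-valued NORMA sketch with squared loss (assumed setting)."""

    def __init__(self, eta=0.1, lam=0.01, kernel=gaussian_kernel):
        self.eta = eta        # learning rate
        self.lam = lam        # regularisation strength
        self.kernel = kernel
        self.centers = []     # inputs seen so far
        self.alphas = []      # expansion coefficients, one per input

    def predict(self, x):
        """Evaluate f(x) = sum_i alpha_i * k(x_i, x)."""
        return sum(a * self.kernel(c, x)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x, y):
        """Process one example and return the prediction made before updating."""
        pred = self.predict(x)
        # Regularisation shrinks every existing coefficient.
        decay = 1.0 - self.eta * self.lam
        self.alphas = [decay * a for a in self.alphas]
        # The derivative of 0.5 * (pred - y)^2 with respect to pred is (pred - y).
        self.centers.append(x)
        self.alphas.append(-self.eta * (pred - y))
        return pred
```

Because every example adds a kernel term, the expansion grows linearly with the number of rounds; practical implementations usually truncate old terms, which this sketch omits.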

algorithm, artificial intelligence, machine learning, (14 more...)

1311.0222

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.85)

Monnig, Nathan D., Fornberg, Bengt, Meyer, Francois G.

Inverting Nonlinear Dimensionality Reduction with Scale-Free Radial Basis Function Interpolation

Nonlinear dimensionality reduction embeddings computed from datasets do not provide a mechanism to compute the inverse map. In this paper, we address the problem of computing a stable inverse map to such a general bi-Lipschitz map. Our approach relies on radial basis functions (RBFs) to interpolate the inverse map everywhere on the low-dimensional image of the forward map. We demonstrate that the scale-free cubic RBF kernel performs better than the Gaussian kernel: it does not suffer from ill-conditioning, and does not require the choice of a scale. The proposed construction is shown to be similar to the Nyström extension of the eigenvectors of the symmetric normalized graph Laplacian matrix. Based on this observation, we provide a new interpretation of the Nyström extension with suggestions for improvement.
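
To illustrate what "scale-free" means here, the sketch below interpolates one-dimensional data with the cubic kernel $\phi(r) = r^3$, which has no width parameter to tune. It is a simplified illustration under stated assumptions, not the paper's construction: the inputs are scalars, a linear polynomial term is appended (a common choice for the cubic kernel, assumed here), and the small dense solver and all function names are illustrative.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def fit_cubic_rbf(xs, ys):
    """Fit s(x) = sum_i w_i |x - x_i|^3 + p0 + p1 * x through (xs, ys).

    Requires at least two distinct nodes. The two extra equations
    force the weights to be orthogonal to linear polynomials.
    """
    n = len(xs)
    M = [[abs(xi - xj) ** 3 for xj in xs] + [1.0, xi] for xi in xs]
    M.append([1.0] * n + [0.0, 0.0])
    M.append(list(xs) + [0.0, 0.0])
    coeffs = solve(M, list(ys) + [0.0, 0.0])
    return coeffs[:n], coeffs[n:]


def evaluate(xs, weights, poly, x):
    """Evaluate the fitted interpolant at a new point x."""
    rbf_part = sum(w * abs(x - xi) ** 3 for w, xi in zip(weights, xs))
    return rbf_part + poly[0] + poly[1] * x
```

Rescaling the inputs only rescales the kernel values by a constant factor, so no scale has to be chosen, in contrast to a Gaussian kernel whose width must be selected.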

artificial intelligence, extension, machine learning, (16 more...)

1305.0258

Country: North America > United States > Colorado (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Georgiev, Stoyan, Mukherjee, Sayan

Randomized Dimension Reduction on Massive Data

Scalability of statistical estimators is of increasing importance in modern applications and dimension reduction is often used to extract relevant information from data. A variety of popular dimension reduction approaches can be framed as symmetric generalized eigendecomposition problems. In this paper we outline how taking into account the low rank structure assumption implicit in these dimension reduction approaches provides both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide efficient solutions to three dimension reduction methods: Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and Localized Sliced Inverse Regression (LSIR). A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance. This point is highlighted in our experiments on real and simulated data.

algorithm, artificial intelligence, machine learning, (15 more...)

1211.1642

Country: North America > United States > California > Santa Clara County (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)

Amini, Arash A., Chen, Aiyou, Bickel, Peter J., Levina, Elizaveta

Pseudo-likelihood methods for community detection in large sparse networks

Many algorithms have been proposed for fitting network models with communities, but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudo-likelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate on the example of a network of political blogs. We also propose spectral clustering with perturbations, a method of independent interest, which works well on sparse networks where regular spectral clustering fails, and use it to provide an initial value for pseudo-likelihood. We prove that pseudo-likelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two communities.

block model, data mining, machine learning, (20 more...)

doi: 10.1214/13-AOS1138

1207.234

Country:

North America > United States > California (0.46)
North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Industry: Government (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
(2 more...)

Titsias, Michalis K., Yau, Christopher, Holmes, Christopher C.

Statistical Inference in Hidden Markov Models using $k$-segment Constraints

Fundamentally, the HMM is a mixture model whose mixing distribution is a finite state Markov chain (Rabiner, 1989; Cappé et al., 2005). Whilst the Markov assumptions rarely correspond to the true physical generative process, the HMM often adequately captures first-order properties that make it a useful approximating model for sequence data in many instances, whilst remaining tractable even for very large datasets. As a consequence, HMM-based algorithms can give highly competitive performance in many applications. Central to the tractability of HMMs is the availability of recursive algorithms that allow fundamental quantities to be computed efficiently (Baum and Petrie, 1966; Viterbi, 1967). These include the Viterbi algorithm, which computes the most probable hidden state sequence, and the forward-backward algorithm, which computes the marginal probability of a given state at a point in the sequence. Computation for the HMM has been well-summarized in the comprehensive and widely read tutorial by Rabiner (1989), with a Bayesian treatment given more recently by Scott (2002). It is a testament to the completeness of these recursive methods that there have been few generic additions to the HMM toolbox since these were first described in the 1960s. However, as HMM approaches continue to be applied in increasingly diverse scientific domains and ever larger data sets, there is interest in expanding the generic toolbox available for HMM inference to encompass unmet needs. The motivation for our work is to develop mechanisms to allow the exploration of the posterior sequence space.
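
The two classical recursions mentioned above can be sketched as follows for a discrete HMM. This is a minimal illustration of the standard Viterbi and forward-backward algorithms, not of the $k$-segment methods the paper goes on to develop. The argument layout (`init[s]`, `trans[r][s]`, `emit[s][o]`) is an assumption of this sketch, and probabilities are multiplied directly, so long sequences would need log-space or rescaling.

```python
def viterbi(init, trans, emit, obs):
    """Return the most probable hidden state sequence for `obs`."""
    n_states = len(init)
    # delta[s]: probability of the best path that ends in state s.
    delta = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev = delta
        ptr, delta = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] * trans[r][s])
            ptr.append(best)
            delta.append(prev[best] * trans[best][s] * emit[s][o])
        back.append(ptr)
    # Trace the back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def forward_backward(init, trans, emit, obs):
    """Return posterior marginals P(state at t | all observations)."""
    n, T = len(init), len(obs)
    # Forward pass: alpha[t][s] = P(obs[0..t], state_t = s).
    alpha = [[init[s] * emit[s][obs[0]] for s in range(n)]]
    for t in range(1, T):
        alpha.append([
            emit[s][obs[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
            for s in range(n)
        ])
    # Backward pass: beta[t][s] = P(obs[t+1..T-1] | state_t = s).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in range(n))
            for s in range(n)
        ]
    evidence = sum(alpha[-1])
    return [[alpha[t][s] * beta[t][s] / evidence for s in range(n)]
            for t in range(T)]
```

Viterbi returns a single jointly most probable sequence, whereas forward-backward returns per-position marginals; the two need not agree position by position, which is one reason to explore the posterior sequence space further.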

algorithm, artificial intelligence, machine learning, (18 more...)

doi: 10.1080/01621459.2014.998762

1311.1189

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Jain, Ashesh, Wojcik, Brian, Joachims, Thorsten, Saxena, Ashutosh

Learning Trajectory Preferences for Manipulators via Iterative Improvement

arXiv.org Artificial Intelligence, Nov-5-2013

We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degrees of freedom manipulators. Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalizability of our algorithm on a variety of grocery checkout tasks, for which the preferences were influenced not only by the object being manipulated but also by the surrounding environment. (For more details and a demonstration video, visit http://pr.cs.cornell.edu/coactive.)
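
The feedback loop described above can be sketched with a preference-perceptron style update, which is a common choice in the co-active learning literature and is assumed here rather than taken from the abstract. Trajectories are represented by hypothetical feature vectors, the system proposes the candidate with the highest linear score, and the weights move toward the features of the user's improved trajectory and away from those of the proposal. All names are illustrative.

```python
def score(w, features):
    """Linear score of a trajectory described by its feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))


def propose(w, candidates):
    """Pick the candidate trajectory the current weights rate highest."""
    return max(candidates, key=lambda feats: score(w, feats))


def coactive_update(w, proposed, improved):
    """Shift weights toward the improved trajectory's features."""
    return [wi + fb - fp for wi, fb, fp in zip(w, improved, proposed)]


def run_coactive(candidates, user_feedback, rounds):
    """Alternate proposals and user improvements for a number of rounds.

    `user_feedback(proposed)` is a hypothetical callback returning the
    feature vector of a trajectory the user prefers to the proposal.
    """
    w = [0.0] * len(candidates[0])
    history = []
    for _ in range(rounds):
        proposed = propose(w, candidates)
        improved = user_feedback(proposed)
        history.append(proposed)
        w = coactive_update(w, proposed, improved)
    return w, history
```

The callback only has to return something the user likes better than the proposal, not the optimal trajectory, which mirrors the weaker form of feedback the abstract emphasises.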

artificial intelligence, machine learning, trajectory, (17 more...)

1306.6294

Genre: Research Report > New Finding (0.68)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Cesa-Bianchi, Nicolò, Gentile, Claudio, Zappella, Giovanni

A Gang of Bandits

arXiv.org Machine Learning, Nov-4-2013

Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global strategy which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a marked increase in prediction performance obtained by exploiting the network structure.

artificial intelligence, big data, data mining, (19 more...)

1306.0811

Country: Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)