AITopics | Statistical Learning

Statistical Learning

Communication Efficient Distributed Optimization using an Approximate Newton-type Method

Shamir, Ohad, Srebro, Nathan, Zhang, Tong

arXiv.org Machine Learning, May 13, 2014

We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. For quadratic objectives, the method enjoys a linear rate of convergence which provably improves with the data size, requiring an essentially constant number of iterations under reasonable assumptions. We provide theoretical and empirical evidence of the advantages of our method compared to other approaches, such as one-shot parameter averaging and ADMM.
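For context, the one-shot parameter averaging baseline named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's Newton-type method: the least-squares objective, the synthetic data, and all names (`one_shot_average`, `X_parts`, `y_parts`) are assumptions made for the example.

```python
import numpy as np


def one_shot_average(X_parts, y_parts):
    """Solve least squares on each machine's shard, then average the solutions.

    Only one round of communication is needed: each machine sends its
    local solution once.
    """
    local_solutions = [
        np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_parts, y_parts)
    ]
    return np.mean(local_solutions, axis=0)


# Hypothetical setup: m machines, n samples each, d features.
rng = np.random.default_rng(0)
d, m, n = 5, 4, 500
w_true = rng.normal(size=d)
X_parts = [rng.normal(size=(n, d)) for _ in range(m)]
y_parts = [X @ w_true + 0.1 * rng.normal(size=n) for X in X_parts]

w_avg = one_shot_average(X_parts, y_parts)
print("distance to true weights:", np.linalg.norm(w_avg - w_true))
```

Iterative schemes such as the one the abstract proposes exchange information over several rounds instead of once.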

artificial intelligence, iteration, machine learning, (17 more...)

arXiv:1312.7853

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Dimensionality reduction for click-through rate prediction: Dense versus sparse representation

Fruergaard, Bjarne Ørum, Hansen, Toke Jansen, Hansen, Lars Kai

arXiv.org Machine Learning, May 13, 2014

In online advertising, display ads are increasingly being placed based on real-time auctions where the advertiser who wins gets to serve the ad. This is called real-time bidding (RTB). In RTB, auctions have very tight time constraints, on the order of 100 ms. Therefore, mechanisms for bidding intelligently, such as click-through rate prediction, need to be sufficiently fast. In this work, we propose to use dimensionality reduction of the user-website interaction graph in order to produce simplified features of users and websites that can be used as predictors of click-through rate. We demonstrate that the Infinite Relational Model (IRM) as a dimensionality reduction offers comparable predictive performance to conventional dimensionality reduction schemes, while achieving the most economical usage of features and fastest computations at run-time. For applications such as real-time bidding, where fast database I/O and few computations are key to success, we thus recommend using IRM-based features as predictors to exploit the recommender effects from bipartite graphs.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv:1311.6976

Country: Europe > Denmark (0.28)

Genre: Research Report (0.86)

Industry:

Marketing (1.00)
Information Technology > Services (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)

Stochastic Gradient Hamiltonian Monte Carlo

Chen, Tianqi, Fox, Emily B., Guestrin, Carlos

arXiv.org Machine Learning, May 12, 2014

Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system; such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem, we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.
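The friction idea can be illustrated with a small sketch. The abstract does not spell out the update rule, so the discretized form below (identity mass matrix, gradient evaluated after the position step) is one common way of writing stochastic gradient HMC with friction and should be read as an assumption. The standard normal target, the artificial gradient noise, and the assumption that its variance is known exactly are also choices made for the example.

```python
import numpy as np


def sghmc_standard_normal(n_steps=50_000, eps=0.1, friction=1.0,
                          grad_noise_std=1.0, seed=0):
    """Sample from N(0, 1) using noisy gradients of U(theta) = theta**2 / 2."""
    rng = np.random.default_rng(seed)
    # Assumption: the gradient-noise variance V is known, so the noise
    # estimate is B_hat = 0.5 * eps * V.
    b_hat = 0.5 * eps * grad_noise_std**2
    # Injected noise makes the total noise on the momentum 2 * friction * eps.
    inject_std = np.sqrt(2.0 * (friction - b_hat) * eps)

    theta, r = 0.0, 0.0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        theta = theta + eps * r
        # True gradient of U is theta; add noise to mimic a minibatch estimate.
        noisy_grad = theta + grad_noise_std * rng.normal()
        # The friction term (-eps * friction * r) damps the momentum and
        # offsets the energy added by the noisy gradient.
        r = r - eps * noisy_grad - eps * friction * r + inject_std * rng.normal()
        samples[i] = theta
    return samples


samples = sghmc_standard_normal()
kept = samples[1000:]  # discard a short burn-in
print("mean:", kept.mean(), "variance:", kept.var())
```

With these settings the sample mean and variance should land near 0 and 1, the moments of the target.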

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv:1402.4102

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Nonparametric Estimation of Renyi Divergence and Friends

Krishnamurthy, Akshay, Kandasamy, Kirthevasan, Poczos, Barnabas, Wasserman, Larry

arXiv.org Machine Learning, May 12, 2014

We consider nonparametric estimation of $L_2$, Renyi-$\alpha$ and Tsallis-$\alpha$ divergences between continuous distributions. Our approach is to construct estimators for particular integral functionals of two densities and translate them into divergence estimators. For the integral functionals, our estimators are based on corrections of a preliminary plug-in estimator. We show that these estimators achieve the parametric convergence rate of $n^{-1/2}$ when the smoothness $s$ of both densities is at least $d/4$, where $d$ is the dimension. We also derive minimax lower bounds for this problem which confirm that $s > d/4$ is necessary to achieve the $n^{-1/2}$ rate of convergence. We validate our theoretical guarantees with a number of simulations.
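The abstract does not state the definitions. For reference, the standard forms of these divergences for densities $p$ and $q$ (with $\alpha \neq 1$) are given below; the notation is an assumption and may differ from the paper's.

```latex
\begin{align*}
D_\alpha(p \,\|\, q) &= \frac{1}{\alpha - 1} \log \int p^{\alpha}(x)\, q^{1-\alpha}(x)\, dx
  && \text{(Renyi-$\alpha$)} \\
T_\alpha(p \,\|\, q) &= \frac{1}{\alpha - 1} \left( \int p^{\alpha}(x)\, q^{1-\alpha}(x)\, dx - 1 \right)
  && \text{(Tsallis-$\alpha$)} \\
L_2^2(p, q) &= \int \bigl( p(x) - q(x) \bigr)^2 dx
  = \int p^2(x)\, dx - 2 \int p(x)\, q(x)\, dx + \int q^2(x)\, dx
  && \text{(squared $L_2$)}
\end{align*}
```

Each is a simple function of integral functionals of the two densities, such as $\int p^{\alpha} q^{1-\alpha}$ or $\int p\, q$, which is why an estimator for the functional translates directly into a divergence estimator.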

artificial intelligence, estimator, machine learning, (17 more...)

arXiv:1402.2966

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Learning modular structures from network data and node variables

Azizi, Elham, Galagan, James E., Airoldi, Edoardo M.

arXiv.org Machine Learning, May 11, 2014

A standard technique for understanding underlying dependency structures among a set of variables posits a shared conditional probability distribution for the variables measured on individuals within a group. This approach is often referred to as module networks, where individuals are represented by nodes in a network, groups are termed modules, and the focus is on estimating the network structure among modules. However, estimation solely from node-specific variables can lead to spurious dependencies, and unverifiable structural assumptions are often used for regularization. Here, we propose an extended model that leverages direct observations about the network in addition to node-specific variables. By integrating complementary data types, we avoid the need for structural assumptions. We illustrate the theoretical and practical significance of the model and develop a reversible-jump MCMC procedure for learning modules and model parameters. We demonstrate the method's accuracy in predicting modular structures from synthetic data and its capability to learn influence structures in Twitter data and regulatory modules in the Mycobacterium tuberculosis gene regulatory network.

artificial intelligence, machine learning, module, (21 more...)

arXiv:1405.2566

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.87)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
(2 more...)

Functional Bandits

Tran-Thanh, Long, Yu, Jia Yuan

arXiv.org Machine Learning, May 10, 2014

The stochastic multi-armed bandit (MAB) model consists of a slot machine with K arms (or actions), each of which delivers rewards that are independently and randomly drawn from an unknown distribution when pulled. In the optimal arm identification problem, the aim is to find an arm with the highest expected reward value. To do so, we can pull the arms and learn (i.e., estimate) their mean rewards. That is, our goal is to distribute a finite budget of T pulls among the arms, such that at the end of the process, we can identify the optimal arm as accurately as possible. This stochastic optimisation problem models many practical applications, ranging from keyword bidding strategy optimisation in sponsored search [Amin et al., 2012], to identifying the best medicines in medical trials [Robbins, 1952], and efficient transmission channel detection in wireless communication networks [Avner, Mannor, and Shamir, 2012]. Although this MAB optimisation model is well studied in the online learning community, the focus is on finding the arm with the highest expected reward value [Maron and Moore, 1993, Mnih, Szepesvári, and Audibert, 2008, Audibert, Bubeck, and Munos, 2010b, Karnin, Koren, and Somekh, 2013].
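The fixed-budget problem described above can be made concrete with the simplest possible strategy: split the T pulls evenly across the K arms and pick the arm with the highest empirical mean. This is a naive baseline sketched for illustration, not a method from the paper; the Bernoulli arms, their means, and the names `uniform_best_arm` and `pull` are all hypothetical.

```python
import numpy as np


def uniform_best_arm(pull, n_arms, budget):
    """Spend the budget evenly over all arms and return the empirically best one."""
    pulls_per_arm = budget // n_arms
    means = np.array([
        np.mean([pull(arm) for _ in range(pulls_per_arm)])
        for arm in range(n_arms)
    ])
    return int(np.argmax(means)), means


# Hypothetical Bernoulli arms; the learner does not see true_means.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]


def pull(arm):
    """Draw one 0/1 reward from the chosen arm."""
    return float(rng.random() < true_means[arm])


best, estimates = uniform_best_arm(pull, n_arms=3, budget=3000)
print("identified arm:", best, "estimated means:", estimates)
```

Adaptive allocation strategies aim to do better by spending fewer pulls on arms that are clearly suboptimal.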

artificial intelligence, data mining, machine learning, (20 more...)

arXiv:1405.2432

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Data Science > Data Mining > Big Data (0.35)

A PAC-Bayesian bound for Lifelong Learning

Pentina, Anastasia, Lampert, Christoph H.

arXiv.org Machine Learning, May 10, 2014

Transfer learning has received a lot of attention in the machine learning community in recent years, and several effective algorithms have been developed. However, relatively little is known about their theoretical properties, especially in the setting of lifelong learning, where the goal is to transfer information to tasks for which no data have been observed so far. In this work, we study lifelong learning from a theoretical perspective. Our main result is a PAC-Bayesian generalization bound that offers a unified view on existing paradigms for transfer learning, such as the transfer of parameters or the transfer of low-dimensional representations. We also use the bound to derive two principled lifelong learning algorithms, and we show that these yield results comparable with existing methods.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv:1311.2838

Country: Europe (0.14)

Genre:

Instructional Material (1.00)
Research Report (0.82)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.69)

Kaggle LSHTC4 Winning Solution

Puurula, Antti, Read, Jesse, Bifet, Albert

arXiv.org Artificial Intelligence, May 9, 2014

Our winning submission to the 2014 Kaggle competition for Large Scale Hierarchical Text Classification (LSHTC) consists mostly of an ensemble of sparse generative models extending Multinomial Naive Bayes. The base-classifiers consist of hierarchically smoothed models combining document, label, and hierarchy level Multinomials, with feature pre-processing using variants of TF-IDF and BM25. Additional diversification is introduced by different types of folds and random search optimization for different measures. The ensemble algorithm optimizes macro F-score by predicting the documents for each label, instead of the usual prediction of labels per document. Scores for documents are predicted by weighted voting of base-classifier outputs with a variant of Feature-Weighted Linear Stacking. The number of documents per label is chosen using label priors and thresholding of vote scores. This document describes the models and software used to build our solution. The results for our solution can be reproduced by running the scripts included in the Kaggle package. A package omitting precomputed result files is also distributed. All code is open source, released under GNU GPL 2.0, and GPL 3.0 for Weka and Meka dependencies.

classification, machine learning, natural language, (20 more...)

arXiv:1405.0546

Country:

Oceania > New Zealand (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

A consistent deterministic regression tree for non-parametric prediction of time series

Gaillard, Pierre, Baudin, Paul

arXiv.org Machine Learning, May 8, 2014

We study online prediction of bounded stationary ergodic processes. To do so, we consider the setting of prediction of individual sequences and build a deterministic regression tree that performs asymptotically as well as the best L-Lipschitz constant predictors. Then, we show why the obtained regret bound entails asymptotic optimality with respect to the class of bounded stationary ergodic processes.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv:1405.1533

Country: Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)

Geodesic Distance Function Learning via Heat Flow on Vector Fields

Lin, Binbin, Yang, Ji, He, Xiaofei, Ye, Jieping

arXiv.org Machine Learning, May 8, 2014

Learning a distance function or metric on a given data manifold is of great importance in machine learning and pattern recognition. Many previous works first embed the manifold into Euclidean space and then learn the distance function. However, such a scheme might not faithfully preserve the distance function if the original manifold is not Euclidean. Note that the distance function on a manifold can always be well-defined. In this paper, we propose to learn the distance function directly on the manifold without embedding. We first provide a theoretical characterization of the distance function by its gradient field. Based on our theoretical analysis, we propose to first learn the gradient field of the distance function and then learn the distance function itself. Specifically, we set the gradient field of a local distance function as an initial vector field. Then we transport it to the whole manifold via heat flow on vector fields. Finally, the geodesic distance function can be obtained by requiring its gradient field to be close to the normalized vector field. Experimental results on both synthetic and real data demonstrate the effectiveness of our proposed algorithm.

artificial intelligence, distance function, machine learning, (18 more...)

arXiv:1405.0133

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
