AITopics

1407.7969

Genre: Research Report (0.40)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Sun, Yuekai, Ioannidis, Stratis, Montanari, Andrea

Learning Mixtures of Linear Classifiers

arXiv.org Machine Learning, Jul-30-2014

We consider a discriminative learning (regression) problem, whereby the regression function is a convex combination of k linear classifiers. Existing approaches are based on the EM algorithm, or similar techniques, without provable guarantees. We develop a simple method, based on spectral techniques and a "mirroring" trick, that discovers the subspace spanned by the classifiers' parameter vectors. Under a probabilistic assumption on the feature vector distribution, we prove that this approach has nearly optimal statistical efficiency.

artificial intelligence, classifier, machine learning

1311.2547

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Severinski, Cody, Salakhutdinov, Ruslan

Bayesian Probabilistic Matrix Factorization: A User Frequency Analysis

arXiv.org Machine Learning, Jul-29-2014

Matrix factorization (MF) has become a common approach to collaborative filtering, due to ease of implementation and scalability to large data sets. Two existing drawbacks of the basic model are that it does not incorporate side information on either users or items, and that it assumes a common variance for all users. We extend the work of constrained probabilistic matrix factorization by deriving the Gibbs updates for the side feature vectors for items (Salakhutdinov and Mnih, 2008). We show that this Bayesian treatment of the constrained PMF model outperforms simple MAP estimation. We also consider extensions to heteroskedastic precision introduced in the literature (Lakshminarayanan, Bouchard, and Archambeau, 2011). We show that this tends to result in overfitting for deterministic approximation algorithms (e.g., variational inference) when the observed entries in the user/item matrix are distributed in a non-uniform manner. In light of this, we propose a truncated precision model. Our experimental results suggest that this model tends to delay overfitting.

artificial intelligence, extension, machine learning

1407.784

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.49)

arXiv.org Machine Learning, Jul-29-2014

Sure Screening for Gaussian Graphical Models

Luo, Shikai, Song, Rui, Witten, Daniela

We propose graphical sure screening, or GRASS, a very simple and computationally efficient screening procedure for recovering the structure of a Gaussian graphical model in the high-dimensional setting. The GRASS estimate of the conditional dependence graph is obtained by thresholding the elements of the sample covariance matrix. The proposed approach possesses the sure screening property: with very high probability, the GRASS estimated edge set contains the true edge set. Furthermore, with high probability, the size of the estimated edge set is controlled. We provide a choice of threshold for GRASS that can control the expected false positive rate. We illustrate the performance of GRASS in a simulation study and on a gene expression data set, and show that in practice it performs quite competitively with more complex and computationally demanding techniques for graph estimation.
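The thresholding step described in this abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names `sample_covariance` and `screen_edges` are hypothetical, and `threshold` is left as a free parameter because the abstract does not state the paper's recommended choice.

```python
from itertools import combinations


def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            s = sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows)
            cov[j][k] = cov[k][j] = s / (n - 1)
    return cov


def screen_edges(rows, threshold):
    """Keep the pair (j, k) as a candidate edge when |cov[j][k]| > threshold.

    `threshold` is an assumed, user-supplied value, not the paper's choice.
    """
    cov = sample_covariance(rows)
    p = len(cov)
    return {(j, k) for j, k in combinations(range(p), 2)
            if abs(cov[j][k]) > threshold}


# Toy data: column 1 is exactly twice column 0; column 2 is weakly related.
rows = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1]]
edges = screen_edges(rows, threshold=1.0)
```

On the toy data, only the strongly covarying pair of columns survives the screen; lowering `threshold` admits more candidate edges, which is the trade-off the abstract refers to when it mentions controlling the false positive rate.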

artificial intelligence, graphical lasso, machine learning

1407.7819

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.85)

Gramacy, Robert B., Ludkovski, Mike

Sequential Design for Optimal Stopping Problems

arXiv.org Machine Learning, Jul-29-2014

We propose a new approach to solve optimal stopping problems via simulation. Working within the backward dynamic programming/Snell envelope framework, we augment the methodology of Longstaff-Schwartz that focuses on approximating the stopping strategy. Namely, we introduce adaptive generation of the stochastic grids anchoring the simulated sample paths of the underlying state process. This allows for active learning of the classifiers partitioning the state space into the continuation and stopping regions. To this end, we examine sequential design schemes that adaptively place new design points close to the stopping boundaries. We then discuss dynamic regression algorithms that can implement such recursive estimation and local refinement of the classifiers. The new algorithm is illustrated with a variety of numerical experiments, showing that an order of magnitude savings in terms of design size can be achieved. We also compare with existing benchmarks in the context of pricing multi-dimensional Bermudan options.

artificial intelligence, gramacy and ludkovski sequential design, machine learning

doi: 10.1137/140980089

1309.3832

Genre: Research Report (0.81)

Industry: Banking & Finance (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Efficient Regularized Regression for Variable Selection with L0 Penalty

Liu, Zhenqiu, Li, Gang

Variable selection (we use variable, feature, gene, and model interchangeably) for regression with high-dimensional big data has found many applications in bioinformatics, computational biology, image processing, and engineering. One appealing approach is L0 regularized regression, which directly penalizes the number of nonzero features in the model. L0 is known as the most essential sparsity measure and has nice theoretical properties, while the popular L1 regularization is only the best convex relaxation of L0. Therefore, it is natural to expect that L0 regularized regression performs better than LASSO. However, it is well known that L0 optimization is NP-hard and computationally challenging. Instead of solving the L0 problem directly, most publications so far have tried to solve an approximation problem that closely resembles L0 regularization. In this paper, we propose an efficient EM algorithm (L0EM) that directly solves the L0 optimization problem. L0EM is efficient with high-dimensional data. It also provides a natural solution to all Lp problems with p in [0,2]. The regularization parameter can be determined either through cross-validation or through AIC and BIC. Theoretical properties of the L0-regularized estimator are given under mild conditions that permit the number of variables to be much larger than the sample size. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than LASSO, and that L0 with AIC or BIC has performance similar to that of computationally intensive cross-validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and in selecting biologically important genes and pathways with high-dimensional big data.

artificial intelligence, machine learning, regularized regression

1407.7508

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Salimans, Tim, Knowles, David A.

Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

In Bayesian analysis the form of the posterior distribution is often not analytically tractable. To obtain quantities of interest under such a distribution, such as moments or marginal distributions, we typically need to use Monte Carlo methods or approximate the posterior with a more convenient distribution. A popular method of obtaining such an approximation is structured or fixed-form Variational Bayes, which works by numerically minimizing the Kullback-Leibler divergence of an approximating distribution in the exponential family to the intractable target distribution (Attias, 2000; Beal and Ghahramani, 2006; Jordan et al., 1999; Wainwright and Jordan, 2008). For certain problems, algorithms exist that can solve this optimization problem in much less time than it would take to approximate the posterior using Monte Carlo methods (see e.g.

approximation, artificial intelligence, machine learning

doi: 10.1214/13-BA858

1206.6679

Country:

Asia > Middle East > Jordan (0.44)
North America > United States > California (0.28)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Dynamic Feature Scaling for Online Learning of Binary Classifiers

Bollegala, Danushka

Scaling feature values is an important step in numerous machine learning tasks. Different features can have different value ranges, and some form of feature scaling is often required in order to learn an accurate classifier. However, feature scaling is conducted as a preprocessing task prior to learning. This is problematic in an online setting for two reasons. First, it might not be possible to accurately determine the value range of a feature at the initial stages of learning, when we have observed only a few training instances. Second, the distribution of data can change over time, which renders obsolete any feature scaling that we perform in a preprocessing step. We propose a simple but effective method to dynamically scale features at train time, thereby quickly adapting to any changes in the data stream. We compare the proposed dynamic feature scaling method against more complex methods for estimating scaling parameters using several benchmark datasets for binary classification. Our proposed feature scaling method consistently outperforms more complex methods on all of the benchmark datasets and improves the classification accuracy of a state-of-the-art online binary classifier algorithm.
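To make the online-scaling problem concrete, here is a minimal sketch of scaling features as instances arrive rather than in a preprocessing pass. The abstract does not specify the proposed method, so this example swaps in a standard technique, running standardization with Welford's update; the class name `RunningScaler` and its method are hypothetical and should not be read as the paper's algorithm.

```python
import math


class RunningScaler:
    """Standardizes each feature with running statistics (Welford's update).

    Assumed stand-in for illustration; not the method proposed in the paper.
    """

    def __init__(self, n_features):
        self.count = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations

    def update_and_scale(self, x):
        """Fold x into the running statistics, then return x standardized."""
        self.count += 1
        scaled = []
        for j, value in enumerate(x):
            delta = value - self.mean[j]
            self.mean[j] += delta / self.count
            self.m2[j] += delta * (value - self.mean[j])
            # Population variance of the instances seen so far.
            var = self.m2[j] / self.count if self.count > 1 else 0.0
            std = math.sqrt(var)
            # With no spread observed yet, fall back to 0.0.
            scaled.append((value - self.mean[j]) / std if std > 0 else 0.0)
        return scaled


scaler = RunningScaler(n_features=1)
first = scaler.update_and_scale([1.0])
second = scaler.update_and_scale([3.0])
```

Because the statistics are updated before each instance is scaled, the scaler needs no pass over the data in advance. Plain running statistics weight all past instances equally, so adapting to the distribution drift mentioned in the abstract would need an additional mechanism such as exponential forgetting, which this sketch omits.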

artificial intelligence, inductive learning, machine learning

1407.7584

Genre: Research Report > New Finding (0.69)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Strobl, Eric V., Visweswaran, Shyam

Dependence versus Conditional Dependence in Local Causal Discovery from Gene Expression Data

Motivation: Algorithms that discover variables which are causally related to a target may inform the design of experiments. With observational gene expression data, many methods discover causal variables by measuring each variable's degree of statistical dependence with the target using dependence measures (DMs). However, other methods measure each variable's ability to explain the statistical dependence between the target and the remaining variables in the data using conditional dependence measures (CDMs), since this strategy is guaranteed to find the target's direct causes, direct effects, and direct causes of the direct effects in the infinite sample limit. In this paper, we design a new algorithm in order to systematically compare the relative abilities of DMs and CDMs in discovering causal variables from gene expression data. Results: The proposed algorithm using a CDM is sample efficient, since it consistently outperforms other state-of-the-art local causal discovery algorithms when sample sizes are small. However, the proposed algorithm using a CDM outperforms the proposed algorithm using a DM only when sample sizes are above several hundred. These results suggest that accurate causal discovery from gene expression data using current CDM-based algorithms requires datasets with at least several hundred samples. Availability: The proposed algorithm is freely available at https://github.com/ericstrobl/DvCD.

algorithm, artificial intelligence, machine learning

1407.7566

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Agarwal, Alekh, Anandkumar, Animashree, Jain, Prateek, Netrapalli, Praneeth

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization

We consider the problem of sparse coding, where each sample consists of a sparse linear combination of a set of dictionary atoms, and the task is to learn both the dictionary elements and the mixing coefficients. Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed. Typically, the coefficients are estimated via $\ell_1$ minimization, keeping the dictionary fixed, and the dictionary is estimated through least squares, keeping the coefficients fixed. In this paper, we establish local linear convergence for this variant of alternating minimization and establish that the basin of attraction for the global optimum (corresponding to the true dictionary and the coefficients) is $O(1/s^2)$, where $s$ is the sparsity level in each sample and the dictionary satisfies RIP. Combined with the recent results of approximate dictionary estimation, this yields provable guarantees for exact recovery of both the dictionary elements and the coefficients, when the dictionary elements are incoherent.

artificial intelligence, machine learning, optimization problem

1310.7991

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)