AITopics

1508.01819

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning, Aug-7-2015

Dimension reduction for model-based clustering

Scrucca, Luca

We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation on group means and, depending on the estimated mixture model, on the variation on group covariances. The proposed method aims at reducing the dimensionality by identifying a set of linear combinations, ordered by importance as quantified by the associated eigenvalues, of the original features which capture most of the cluster structure contained in the data. Observations may then be projected onto such a reduced subspace, thus providing summary plots which help to visualize the clustering structure. These plots can be particularly appealing in the case of high-dimensional data and noisy structure. The newly constructed variables capture most of the clustering information available in the data, and they can be further reduced to improve clustering performance. We illustrate the approach on both simulated and real data sets.

artificial intelligence, machine learning, matrix, (17 more...)

doi: 10.1007/s11222-009-9138-7

1508.01713

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Artificial Intelligence, Aug-7-2015

Decomposition and Identification of Linear Structural Equation Models

Chen, Bryant

In this paper, we address the problem of identifying linear structural equation models. We first extend the edge set half-trek criterion to cover a broader class of models. We then show that any semi-Markovian linear model can be recursively decomposed into simpler sub-models, resulting in improved identification power. Finally, we show that, unlike the existing methods developed for linear models, the resulting method subsumes the identification algorithm of non-parametric models.

artificial intelligence, coefficient, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1508.01834

Country: North America > United States > California > Los Angeles County > Los Angeles (0.29)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Rastogi, Pushpendre, Van Durme, Benjamin

Sublinear Partition Estimation

The output scores of a neural network classifier are converted to probabilities by normalizing over the scores of all competing categories. Computing this partition function, $Z$, is then linear in the number of categories, which is problematic as real-world problem sets continue to grow in categorical types, such as in visual object recognition or discriminative language modeling. We propose three approaches for sublinear estimation of the partition function, based on approximate nearest neighbor search and kernel feature maps, and compare the performance of the proposed approaches empirically.
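To make the cost described above concrete, the following is a minimal sketch of exact normalization, assuming the usual softmax form in which $Z$ is the sum of exponentiated scores. The function names are hypothetical, and the sketch shows only the linear-time baseline, not any of the sublinear estimators proposed in the paper.

```python
import math

def log_partition(scores):
    """Return log Z, where Z = sum(exp(s)) over all category scores.

    The loop touches every category once, so the cost is linear in the
    number of categories. Subtracting the maximum keeps exp() from
    overflowing (the standard log-sum-exp trick).
    """
    shift = max(scores)
    return shift + math.log(sum(math.exp(s - shift) for s in scores))

def softmax(scores):
    """Convert raw scores to probabilities by dividing by Z."""
    log_z = log_partition(scores)
    return [math.exp(s - log_z) for s in scores]

if __name__ == "__main__":
    probabilities = softmax([1.0, 2.0, 3.0])
    print(probabilities, sum(probabilities))
```

With millions of categories, the single pass inside `log_partition` is the step a sublinear estimator would aim to avoid.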

artificial intelligence, machine learning, natural language, (17 more...)

1508.01596

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
(2 more...)

Li, Yan, Reyes, Kristofer G., Vazquez-Anderson, Jorge, Wang, Yingfei, Contreras, Lydia M., Powell, Warren B.

A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model

We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We use a biophysical model which shows that the fluorescence ratio under the log scale has a sparse linear relationship with the coefficients describing the accessibility of each nucleotide, since not all sites are accessible (due to the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist $\ell_1$ regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before the experimental budget is exhausted. We also develop two other new algorithms: a batch SpKG algorithm, which generates multiple suggestions sequentially so that experiments can be run in parallel; and batch SpKG with a procedure which we call length mutagenesis, which dynamically adds new alternatives in the form of new types of probes, created by inserting, deleting or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Group I intron (a mid-size RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.
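The $\ell_1$ regularized regression (Lasso) named above can be illustrated with a minimal sketch. The cyclic coordinate-descent solver, the function names, and the toy data are assumptions chosen for illustration; this is not the SpKG algorithm, only the frequentist building block it refers to.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of targets, lam the penalty weight.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - X w while w is all zeros
    for _ in range(sweeps):
        for j in range(p):
            column = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in column)
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(c * (r + c * w[j]) for c, r in zip(column, residual))
            new_wj = soft_threshold(rho, lam) / norm_sq
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(column, residual)]
                w[j] = new_wj
    return w

if __name__ == "__main__":
    # Orthonormal toy design: the small coefficient is set exactly to zero.
    print(lasso_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))
```

The exact zeros produced by the soft-thresholding step are what make the estimated sparsity pattern readable directly from the fitted coefficients.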

artificial intelligence, bayesian inference, machine learning, (21 more...)

1508.01551

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Universal Approximation of Edge Density in Large Graphs

Boullé, Marc

With the recent availability of much network data, such as the World Wide Web, social networks, phone call networks, and science collaboration graphs [1], [2], there is a renewed interest in the graph partitioning problem, especially for the automatic discovery of community structures in large networks [3], [4], [5]. Beyond clustering approaches, coclustering approaches aim at summarizing the relation between two entities in a many-to-many relationship. Such a relation can be represented as a graph, where the source and target vertices represent entities and the edges stand for relations between entities. A coclustering model provides a summary of a graph by grouping source vertices and target vertices. For example, in market analysis, the source vertices of the graph represent customers, the target vertices represent products and there is one edge each time a customer has purchased a product. A coclustering model summarizes the dataset by grouping customers that have purchased approximately the same products and grouping products that have been purchased by approximately the same customers. Coclustering models have been applied to many other domains, such as information retrieval (the entities are documents and their words in a text corpus), web log analysis (cookies and their visited web pages), web structure analysis (web pages with hyperlinks between them) or telecommunication networks (the call detail records stand for the edges in a call graph between a caller and a called party). All these real-world graphs are directed multigraphs, meaning that two entities may be linked by multi-edges. We aim to summarize and discover insightful patterns in such graphs, using a method with the following desired properties: 1) Robustness, to avoid detecting spurious patterns in case of noisy data.
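The market-analysis example above can be sketched as a small data structure: given a directed multigraph of purchases and a grouping of customers and products, the coclustering summary is the edge count in each (customer group, product group) block. The groupings, names, and data below are hypothetical and supplied by hand; choosing the groupings automatically is the problem the paper addresses and is not shown here.

```python
from collections import Counter

def block_edge_counts(edges, source_group, target_group):
    """Count edges falling in each (source group, target group) block.

    edges is a list of (source, target) pairs; repeated pairs are kept,
    since the graph is a multigraph (one edge per purchase).
    """
    counts = Counter()
    for source, target in edges:
        counts[(source_group[source], target_group[target])] += 1
    return counts

if __name__ == "__main__":
    purchases = [
        ("ann", "novel"), ("ann", "novel"), ("bob", "comic"),
        ("cat", "drill"), ("dan", "saw"), ("dan", "novel"),
    ]
    customer_group = {"ann": "readers", "bob": "readers",
                      "cat": "builders", "dan": "builders"}
    product_group = {"novel": "books", "comic": "books",
                     "drill": "tools", "saw": "tools"}
    summary = block_edge_counts(purchases, customer_group, product_group)
    for block, count in sorted(summary.items()):
        print(block, count)
```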

data mining, machine learning, natural language, (20 more...)

1508.0134

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.45)

Industry:

Telecommunications (0.88)
Information Technology (0.68)
Transportation > Air (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(4 more...)

Furmston, Thomas, Lever, Guy

A Gauss-Newton Method for Markov Decision Processes

Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton Methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space, and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.
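As background for the analogy drawn above, here is a minimal sketch of the classical Gauss-Newton method for non-linear least squares. The model $y = a e^{bx}$, the function name, and the data are assumptions made for illustration; this is the textbook method the abstract compares against, not the MDP algorithms of the paper. The Hessian of the squared-error objective is $J^\top J$ plus terms involving second derivatives of the model, and Gauss-Newton keeps only $J^\top J$.

```python
import math

def gauss_newton_exp_fit(xs, ys, a, b, iterations=20):
    """Fit y = a * exp(b * x) by Gauss-Newton, starting from (a, b)."""
    for _ in range(iterations):
        jtj_aa = jtj_ab = jtj_bb = 0.0  # entries of the 2x2 matrix J^T J
        jtr_a = jtr_b = 0.0             # entries of the vector J^T r
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e        # residual
            da = e               # derivative of the model w.r.t. a
            db = a * x * e       # derivative of the model w.r.t. b
            jtj_aa += da * da
            jtj_ab += da * db
            jtj_bb += db * db
            jtr_a += da * r
            jtr_b += db * r
        det = jtj_aa * jtj_bb - jtj_ab * jtj_ab
        if abs(det) < 1e-15:
            break  # J^T J is numerically singular; stop rather than divide
        # Solve (J^T J) step = J^T r with the explicit 2x2 inverse.
        a += (jtj_bb * jtr_a - jtj_ab * jtr_b) / det
        b += (jtj_aa * jtr_b - jtj_ab * jtr_a) / det
    return a, b

if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free synthetic data
    print(gauss_newton_exp_fit(xs, ys, a=1.8, b=0.45))
```

Because $J^\top J$ is positive semi-definite by construction, the step is a descent direction for the least-squares objective whenever the matrix is invertible, which mirrors the definiteness property the abstract highlights for the approximate Hessians.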

artificial intelligence, gauss-newton method, machine learning, (17 more...)

1507.08271

Country: North America > United States > California (0.27)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment > Games (1.00)
Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Carvajal, Rodrigo, Agüero, Juan C., Godoy, Boris I., Katselis, Dimitrios

A MAP approach for $\ell_q$-norm regularized sparse parameter estimation using the EM algorithm

arXiv.org Machine Learning, Aug-5-2015

In this paper, Bayesian parameter estimation through the consideration of the Maximum A Posteriori (MAP) criterion is revisited under the prism of the Expectation-Maximization (EM) algorithm. By incorporating a sparsity-promoting penalty term in the cost function of the estimation problem through the use of an appropriate prior distribution, we show how the EM algorithm can be used to efficiently solve the corresponding optimization problem. To this end, we rely on variance-mean Gaussian mixtures (VMGM) to describe the prior distribution, while we incorporate many nice features of these mixtures into our estimation problem. The corresponding MAP estimation problem is completely expressed in terms of the EM algorithm, which allows for handling nonlinearities and hidden variables that cannot be easily handled with traditional methods. For comparison purposes, we also develop a Coordinate Descent algorithm for the $\ell_q$-norm penalized problem and present the performance results via simulations.
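The link between a sparsity-promoting prior and a penalty term in the cost function can be written out as a short worked equation. The symbols $y$ (data), $\theta$ (parameters), and $\tau$ (penalty weight) are notation assumed here for illustration, and the prior shown is the generic form that yields an $\ell_q$ penalty, not necessarily the exact parameterization used in the paper.

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\,\bigl[\log p(y \mid \theta) + \log p(\theta)\bigr],
\qquad
p(\theta) \propto \exp\!\bigl(-\tau \|\theta\|_q^q\bigr)
\;\Longrightarrow\;
\hat{\theta}_{\mathrm{MAP}}
  = \arg\min_{\theta}\,\bigl[-\log p(y \mid \theta) + \tau \|\theta\|_q^q\bigr],
\qquad
\|\theta\|_q^q = \sum_i |\theta_i|^q .
```

Under this reading, maximizing the posterior and minimizing the penalized negative log-likelihood are the same problem, which is why the penalty can be introduced through the prior.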

artificial intelligence, bayesian inference, machine learning, (19 more...)

1508.01071

Country:

Europe (0.46)
North America > United States > Illinois (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Farmanesh, Babak, Pourhabib, Arash

Sparse Pseudo-input Local Kriging for Large Non-stationary Spatial Datasets with Exogenous Variables

arXiv.org Machine Learning, Aug-5-2015

Gaussian process (GP) regression is a powerful tool for building predictive models for spatial systems. However, it does not scale efficiently for large datasets. In particular, for high-dimensional spatial datasets, i.e., spatial datasets that contain exogenous variables, the performance of GP regression further deteriorates. This paper presents the Sparse Pseudo-input Local Kriging (SPLK) which approximates the full GP for spatial datasets with exogenous variables. SPLK employs orthogonal cuts which decompose the domain into smaller subdomains and then applies a sparse approximation of the full GP in each subdomain. We obtain the continuity of the global predictor by imposing continuity constraints on the boundaries of the neighboring subdomains. The domain decomposition scheme applies independent covariance structures in each region, and as a result, SPLK captures heterogeneous covariance structures. SPLK achieves computational efficiency by utilizing sparse approximation in each subdomain which enables SPLK to accommodate large subdomains that contain many data points and possess a homogeneous covariance structure. We apply the proposed method to real and simulated datasets. We conclude that the combination of orthogonal cuts and sparse approximation makes the proposed method an efficient algorithm for high-dimensional large spatial datasets.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

1508.01248

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Tangkaratt, Voot, Sasaki, Hiroaki, Sugiyama, Masashi

Direct Estimation of the Derivative of Quadratic Mutual Information with Application in Supervised Dimension Reduction

arXiv.org Machine Learning, Aug-5-2015

A typical goal of supervised dimension reduction is to find a low-dimensional subspace of the input space such that the projected input variables preserve maximal information about the output variables. The dependence maximization approach solves the supervised dimension reduction problem through maximizing a statistical dependence between projected input variables and output variables. A well-known statistical dependence measure is mutual information (MI) which is based on the Kullback-Leibler (KL) divergence. However, it is known that the KL divergence is sensitive to outliers. On the other hand, quadratic MI (QMI) is a variant of MI based on the $L_2$ distance which is more robust against outliers than the KL divergence, and a computationally efficient method to estimate QMI from data, called least-squares QMI (LSQMI), has been proposed recently. For these reasons, developing a supervised dimension reduction method based on LSQMI seems promising. However, not QMI itself, but the derivative of QMI is needed for subspace search in supervised dimension reduction, and the derivative of an accurate QMI estimator is not necessarily a good estimator of the derivative of QMI. In this paper, we propose to directly estimate the derivative of QMI without estimating QMI itself. We show that the direct estimation of the derivative of QMI is more accurate than the derivative of the estimated QMI. Finally, we develop a supervised dimension reduction algorithm which efficiently uses the proposed derivative estimator, and demonstrate through experiments that the proposed method is more robust against outliers than existing methods.
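The two dependence measures contrasted above can be placed side by side. The definitions below are the commonly used forms, written in notation assumed here ($p(x, y)$ for the joint density, $p(x)$ and $p(y)$ for the marginals); some references scale QMI by a constant factor.

```latex
\mathrm{MI}(X, Y)
  = \iint p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y ,
\qquad
\mathrm{QMI}(X, Y)
  = \iint \bigl(p(x, y) - p(x)\,p(y)\bigr)^{2}\,\mathrm{d}x\,\mathrm{d}y .
```

Both quantities are zero exactly when $p(x, y) = p(x)\,p(y)$, that is, when $X$ and $Y$ are independent. MI measures the discrepancy through a log ratio of densities, whereas QMI uses a squared difference, which is the $L_2$ distance the abstract refers to.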

artificial intelligence, machine learning, qmi, (15 more...)

1508.01019

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)