AITopics | Statistical Learning

Statistical Learning

Approximate Counting of Graphical Models Via MCMC Revisited

arXiv.org Artificial Intelligence, Jul-2-2013

In Peña (2007), MCMC sampling is applied to approximately calculate the ratio of essential graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, we extend that work from 20 to 31 nodes. We also extend that work by computing the approximate ratio of connected EGs to connected DAGs, of connected EGs to EGs, and of connected DAGs to DAGs. Furthermore, we prove that the latter ratio is asymptotically 1. We also discuss the implications of these results for learning DAGs from data.
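The abstract's MCMC approach is needed because no closed-form count of essential graphs is used there, but the DAG side of the ratios can be counted exactly. The sketch below is not taken from the paper: it uses Robinson's well-known recurrence for labeled DAGs and a standard component decomposition for connected labeled DAGs. The function names `count_dags` and `count_connected_dags` are illustrative.

```python
from math import comb


def count_dags(n_max):
    """Number of labeled DAGs on 0..n_max nodes (Robinson's recurrence).

    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1
    """
    a = [1]
    for n in range(1, n_max + 1):
        total = 0
        for k in range(1, n + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(n, k) * 2 ** (k * (n - k)) * a[n - k]
        a.append(total)
    return a


def count_connected_dags(a):
    """Number of weakly connected labeled DAGs, given the totals a[0..n_max].

    Splitting off the component that contains node 1 gives
    a(n) = sum_{k=1..n} C(n-1, k-1) * c(k) * a(n-k),
    which is solved here for c(n). c[0] is left at 0 as a placeholder.
    """
    n_max = len(a) - 1
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        rest = sum(comb(n - 1, k - 1) * c[k] * a[n - k] for k in range(1, n))
        c[n] = a[n] - rest
    return c


if __name__ == "__main__":
    totals = count_dags(31)
    connected = count_connected_dags(totals)
    for n in (5, 10, 20, 31):
        # Python integers are arbitrary precision, so the division is safe.
        print(n, connected[n] / totals[n])
```

For n = 1..5 the DAG counts are 1, 3, 25, 543, 29281 and the connected counts are 1, 2, 18, 446, 26430. Printing the ratio for larger n gives a numerical view of the asymptotic statement in the abstract.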

artificial intelligence, dag, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1301.7189

Country: Europe > Sweden (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)

Algorithms of the LDA model [REPORT]

Špeh, Jaka, Muhič, Andrej, Rupnik, Jan

arXiv.org Machine Learning, Jul-1-2013

ABSTRACT: We review three algorithms for Latent Dirichlet Allocation (LDA). Two of them are variational inference algorithms, Variational Bayesian inference and Online Variational Bayesian inference, and one is a Markov Chain Monte Carlo (MCMC) algorithm, Collapsed Gibbs sampling. We compare their time complexity and performance. We find that online variational Bayesian inference is the fastest algorithm and still returns reasonably good results. 1 INTRODUCTION: Nowadays big corpora are used daily. People often search through huge numbers of documents either in libraries or online, using web search engines.
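Of the three algorithms named, collapsed Gibbs sampling is the shortest to write down. The following is a minimal sketch of the standard sampler, not the report's own code: documents are lists of integer word ids, and the hyperparameter defaults `alpha` and `beta`, the iteration count, and the function name are illustrative assumptions.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on docs given as lists of word ids."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


if __name__ == "__main__":
    toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 2], [3, 5, 4, 4]]
    doc_topic, topic_word = collapsed_gibbs_lda(toy_docs, n_topics=2, vocab_size=6)
    print(doc_topic)
```

Each sweep touches every token once, so the cost per iteration grows with the corpus size times the number of topics, which is one reason an online variational method can be faster on large corpora.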

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

1307.0317

Country:

Europe > Slovenia (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Dimensionality Detection and Integration of Multiple Data Sources via the GP-LVM

Barrett, James, Coolen, Anthony C. C.

arXiv.org Machine Learning, Jul-1-2013

The Gaussian Process Latent Variable Model (GP-LVM) is a non-linear probabilistic method of embedding a high dimensional dataset in terms of low dimensional 'latent' variables. In this paper we illustrate that maximum a posteriori (MAP) estimation of the latent variables and hyperparameters can be used for model selection, and hence we can determine the optimal number of latent variables and the most appropriate model. This is an alternative to the variational approaches developed recently and may be useful when we want to use a non-Gaussian prior or kernel functions that don't have automatic relevance determination (ARD) parameters. Using a second order expansion of the latent variable posterior we can marginalise the latent variables and obtain an estimate for the hyperparameter posterior. Secondly, we use the GP-LVM to integrate multiple data sources by simultaneously embedding them in terms of common latent variables. We present results from synthetic data to illustrate the successful detection and retrieval of low dimensional structure from high dimensional data. We demonstrate that the integration of multiple data sources leads to more robust performance. Finally, we show that when the data are used for binary classification tasks we can attain a significant gain in prediction accuracy when the low dimensional representation is used.

artificial intelligence, latent variable, machine learning, (14 more...)

arXiv.org Machine Learning

1307.0323

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Semi-supervised clustering methods

Bair, Eric

arXiv.org Machine Learning, Jun-30-2013

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable, nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
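One way the "some cluster labels are known" case can modify k-means is sketched below: labeled points seed the initial centroids and are never reassigned, while unlabeled points follow the usual nearest-centroid rule. Whether this matches any specific algorithm in the review is an assumption; the function name and the requirement of at least one labeled point per cluster are choices made for this illustration.

```python
def seeded_kmeans(points, labels, k, n_iters=20):
    """k-means where points with a known label stay in their cluster.

    points: list of equal-length numeric tuples
    labels: list with a cluster index (0..k-1) or None for each point
    """
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def centroid(members):
        return [sum(coord) / len(members) for coord in zip(*members)]

    # Initialise each centroid from the labeled points of that cluster.
    centroids = []
    for c in range(k):
        seeds = [p for p, lab in zip(points, labels) if lab == c]
        if not seeds:
            raise ValueError(f"cluster {c} has no labeled point")
        centroids.append(centroid(seeds))

    assign = list(labels)
    for _ in range(n_iters):
        # Labeled points keep their label; the rest go to the nearest centroid.
        assign = [
            lab if lab is not None
            else min(range(k), key=lambda c: sqdist(p, centroids[c]))
            for p, lab in zip(points, labels)
        ]
        # No cluster can be empty because its labeled seeds never leave.
        new_centroids = [
            centroid([p for p, a in zip(points, assign) if a == c])
            for c in range(k)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    known = [0, None, None, 1, None, None]
    print(seeded_kmeans(pts, known, k=2)[0])
```

Fixing the labeled points also removes the dependence on random initialisation that plain k-means has, at the cost of requiring a seed for every cluster.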

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1002/wics.1270

1307.0252

Country: North America > United States (0.67)

Genre:

Workflow (0.68)
Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Sparse Principal Component Analysis for High Dimensional Vector Autoregressive Models

Wang, Zhaoran, Han, Fang, Liu, Han

arXiv.org Machine Learning, Jun-29-2013

We study sparse principal component analysis for high dimensional vector autoregressive time series under a doubly asymptotic framework, which allows the dimension $d$ to scale with the series length $T$. We treat the transition matrix of time series as a nuisance parameter and directly apply sparse principal component analysis on multivariate time series as if the data are independent. We provide explicit non-asymptotic rates of convergence for leading eigenvector estimation and extend this result to principal subspace estimation. Our analysis illustrates that the spectral norm of the transition matrix plays an essential role in determining the final rates. We also characterize sufficient conditions under which sparse principal component analysis attains the optimal parametric rate. Our theoretical results are backed up by thorough numerical studies.

artificial intelligence, machine learning, principal component analysis, (17 more...)

arXiv.org Machine Learning

1307.0164

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.47)
Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (1.00)

Memory Limited, Streaming PCA

Mitliagkas, Ioannis, Caramanis, Constantine, Jain, Prateek

arXiv.org Machine Learning, Jun-28-2013

We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p^2)$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well understood that the spike can be recovered when the number of samples, $n$, scales proportionally with the dimension, $p$. Yet all algorithms that provably achieve this have memory complexity $O(p^2)$. Meanwhile, algorithms with memory complexity $O(kp)$ do not have provable bounds on sample complexity comparable to $p$. We present an algorithm that achieves both: it uses $O(kp)$ memory (meaning storage of any kind) and is able to compute the $k$-dimensional spike with $O(p \log p)$ sample complexity, the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data.
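To make the memory argument concrete, here is a sketch of a block-wise power iteration for the $k = 1$ case that never forms the $p \times p$ covariance matrix. It is an illustration in the spirit of the abstract, not a claim about the authors' exact algorithm; the block size, the helper `spiked_stream`, and all parameter values are assumptions. The general $k$ case would additionally need an orthonormalisation step per block.

```python
import math
import random


def streaming_top_component(stream, p, block_size, seed=0):
    """One-pass estimate of the leading principal direction in O(p) memory.

    For each block, accumulate sum_x x * (x . q) and renormalise, which is
    one power-iteration step on that block's empirical covariance.
    A trailing partial block is ignored.
    """
    rng = random.Random(seed)
    q = [rng.gauss(0, 1) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in q))
    q = [v / norm for v in q]

    acc = [0.0] * p
    count = 0
    for x in stream:
        proj = sum(xi * qi for xi, qi in zip(x, q))
        for j in range(p):
            acc[j] += proj * x[j]
        count += 1
        if count == block_size:
            norm = math.sqrt(sum(v * v for v in acc))
            if norm > 0:
                q = [v / norm for v in acc]
            acc = [0.0] * p
            count = 0
    return q


def spiked_stream(n, p, strength, rng):
    """Hypothetical test data: white noise plus a spike along coordinate 0."""
    for _ in range(n):
        g = rng.gauss(0, 1)
        x = [rng.gauss(0, 1) for _ in range(p)]
        x[0] += strength * g
        yield x


if __name__ == "__main__":
    data = spiked_stream(5000, 10, 3.0, random.Random(1))
    direction = streaming_top_component(data, p=10, block_size=250)
    print(abs(direction[0]))  # overlap with the planted spike direction
```

Only the current estimate `q` and the accumulator `acc` are stored, so memory is linear in $p$ regardless of how many samples pass through.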

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1307.0032

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Optimal Feature Selection in High-Dimensional Discriminant Analysis

Kolar, Mladen, Liu, Han

arXiv.org Machine Learning, Jun-27-2013

We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical analysis of the variable selection performance of these procedures has not been established, even though model interpretation is of fundamental importance in scientific data analysis. This paper bridges the gap by providing sharp sufficient conditions for consistent variable selection using the sparse discriminant analysis (Mai et al., 2012). Through careful analysis, we establish rates of convergence that are significantly faster than the best known results and admit an optimal scaling of the sample size n, dimensionality p, and sparsity level s in the high-dimensional setting. Sufficient conditions are complemented by the necessary information theoretic limits on the variable selection problem in the context of high-dimensional discriminant analysis. Exploiting a numerical equivalence result, our method also establishes the optimal results for the ROAD estimator (Fan et al., 2012) and the sparse optimal scaling estimator (Clemmensen et al., 2011). Furthermore, we analyze an exhaustive search procedure, whose performance serves as a benchmark, and show that it is variable selection consistent under weaker conditions. Extensive simulations demonstrating the sharpness of the bounds are also provided.

artificial intelligence, machine learning, tt sign, (18 more...)

arXiv.org Machine Learning

1306.6557

Country: North America > United States (0.67)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

A Data Mining Approach to Solve the Goal Scoring Problem

Oliveira, Renato, Adeodato, Paulo, Carvalho, Arthur, Viegas, Icamaan, Diego, Christian, Ing-Ren, Tsang

arXiv.org Artificial Intelligence, Jun-26-2013

In soccer, scoring goals is a fundamental objective which depends on many conditions and constraints. Considering the RoboCup soccer 2D-simulator, this paper presents a data mining-based decision system to identify the best time and direction to kick the ball towards the goal to maximize the overall chances of scoring during a simulated soccer match. Following the CRISP-DM methodology, data for modeling were extracted from matches of major international tournaments (10691 kicks), knowledge about soccer was embedded via transformation of variables, and a Multilayer Perceptron was used to estimate the scoring chance. An experimental performance assessment comparing this approach against a previous LDA-based approach was conducted over 100 matches. Several statistical metrics were used to analyze the performance of the system, and the results showed an increase of 7.7% in the number of kicks, producing an overall increase of 78% in the number of goals scored.

artificial intelligence, machine learning, opponent, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IJCNN.2009.5178616

1305.4955

Country: Europe (0.28)

Genre: Research Report > New Finding (0.54)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Soccer Robots (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
(2 more...)

Metaheuristics in Flood Disaster Management and Risk Assessment

Bongolan, Vena Pearl, Ballesteros, Florencio C. Jr., Banting, Joyce Anne M., Olaes, Aina Marie Q., Aquino, Charlymagne R.

arXiv.org Artificial Intelligence, Jun-26-2013

A risk assessment method is then used to identify the flood risk in each community using the following risk factors: the area's urbanized area ratio, literacy rate, mortality rate, poverty incidence, radio/TV penetration, and state of structural and nonstructural measures. Vulnerability is defined as a weighted sum of these components. A 'penalty' was imposed for reduced vulnerability. Optimization comparison was done with MATLAB's Genetic Algorithms and Simulated Annealing; results showed 'extreme' solutions for simulated annealing and realistic designs for the genetic algorithm. INTRODUCTION: Disaster Risk Management (DRM) at the local, regional, and global scale continues to generate great research interest of a complex, multidisciplinary nature, involving the interplay of scientific, social, economic, and political dimensions. Driven by a series of disasters of increasing frequency and magnitude, the meaning and context of DRM have evolved into an internationally accepted definition: a systemic approach to identifying, assessing and reducing risk of all kinds associated with hazards and human activities, with identified operational and practical disaster risk reduction initiatives. These initiatives have been clarified by the international community through the UN's 2005 World Conference on Disaster Reduction in Kobe, Japan and accepted as the DRR framework, known as the Hyogo Framework of Action [1]. The ultimate objective of all DRM initiatives remains simple: reduce the loss of lives and property, and improve the capacity of communities to cope with disasters. The 2005 Hyogo Framework of Action (HFA) has been used to review UN member states' respective DRM initiatives.

evolutionary algorithm, machine learning, vulnerability, (17 more...)

arXiv.org Artificial Intelligence

1306.6375

Country:

North America > United States (0.94)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.25)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)

Genre: Research Report > New Finding (0.55)

Industry:

Government (1.00)
Information Technology > Security & Privacy (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.56)

Recovering Block-structured Activations Using Compressive Measurements

Balakrishnan, Sivaraman, Kolar, Mladen, Rinaldo, Alessandro, Singh, Aarti

arXiv.org Machine Learning, Jun-24-2013

We consider the problems of detection and localization of a contiguous block of weak activation in a large matrix, from a small number of noisy, possibly adaptive, compressive (linear) measurements. This is closely related to the problem of compressed sensing, where the task is to estimate a sparse vector using a small number of linear measurements. Contrary to results in compressed sensing, where it has been shown that neither adaptivity nor contiguous structure helps much, we show that for reliable localization the magnitude of the weakest signals is strongly influenced by both structure and the ability to choose measurements adaptively, while for detection neither adaptivity nor structure reduces the requirement on the magnitude of the signal. We characterize the precise tradeoffs between the various problem parameters, the signal strength and the number of measurements required to reliably detect and localize the block of activation. The sufficient conditions are complemented with information theoretic lower bounds.

artificial intelligence, localization, machine learning, (19 more...)

arXiv.org Machine Learning

1209.3431

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
