Non-Normal Mixtures of Experts

arXiv.org Machine Learning

Mixture of Experts (MoE) is a popular framework for modeling heterogeneity in data for regression, classification and clustering. For the continuous data we consider here, in the context of regression and cluster analysis, MoE models usually use normal experts, that is, expert components following the Gaussian distribution. However, for data containing a group or groups of observations with asymmetric behavior, heavy tails or atypical observations, the use of normal experts may be unsuitable and can unduly affect the fit of the MoE model. In this paper, we introduce new non-normal mixture of experts (NNMoE) models which can deal with these issues of possibly skewed, heavy-tailed data and outliers. The proposed models are the skew-normal MoE and the robust $t$ MoE and skew-$t$ MoE, respectively named SNMoE, TMoE and STMoE. We develop dedicated expectation-maximization (EM) and expectation conditional maximization (ECM) algorithms to estimate the parameters of the proposed models by monotonically maximizing the observed-data log-likelihood. We describe how the presented models can be used for prediction and for model-based clustering of regression data. Numerical experiments carried out on simulated data show the effectiveness and robustness of the proposed models in modeling non-linear regression functions as well as in model-based clustering. Then, to show their usefulness for practical applications, the proposed models are applied to real-world tone perception data for musical data analysis and to temperature anomaly data for the analysis of climate change.
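
To make the model class concrete, the following minimal sketch (not the authors' code) evaluates the density of a $t$ mixture of experts with softmax gating at a single point; the parameter arrays W, betas, sigmas and nus are hypothetical stand-ins for the gating and expert parameters that the EM/ECM algorithms would estimate.

    import numpy as np
    from scipy import stats

    def tmoe_density(y, x, W, betas, sigmas, nus):
        """Density of a t mixture of experts (TMoE) at response y, input x.
        Hypothetical parameterization: W (K x d) gating weights, betas (K x d)
        expert regression coefficients, sigmas (K,) scales, nus (K,) degrees
        of freedom controlling robustness to outliers."""
        logits = W @ x
        gates = np.exp(logits - logits.max())
        gates /= gates.sum()                      # softmax gating probabilities
        means = betas @ x                         # expert means, linear in x
        # each expert is a Student-t density centered on its regression line
        return float(gates @ stats.t.pdf(y, df=nus, loc=means, scale=sigmas))

    # toy usage: two experts, one covariate plus an intercept
    x = np.array([1.0, 0.5])
    W = np.array([[0.0, 0.0], [1.0, -2.0]])
    betas = np.array([[0.0, 1.0], [2.0, -1.0]])
    print(tmoe_density(0.7, x, W, betas,
                       sigmas=np.array([0.3, 0.5]), nus=np.array([4.0, 4.0])))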


Factorized Asymptotic Bayesian Inference for Factorial Hidden Markov Models

arXiv.org Machine Learning

Factorial hidden Markov models (FHMMs) are powerful tools for modeling sequential data. Learning FHMMs poses a challenging simultaneous model selection problem: selecting both the number of Markov chains and the dimensionality of each chain. Our main contribution is to address this model selection issue by extending factorized asymptotic Bayesian (FAB) inference to FHMMs. First, we offer a better approximation of the marginal log-likelihood than previous FAB inference. Our key idea is to integrate out the transition probabilities while still applying the Laplace approximation to the emission probabilities. Second, we prove that if an FHMM contains two very similar hidden states, i.e., one is redundant, then FAB will almost surely shrink and eliminate one of them, making the model parsimonious. Experimental results show that FAB for FHMMs significantly outperforms the state-of-the-art nonparametric Bayesian iFHMM and variational FHMM in model selection accuracy, with competitive held-out perplexity.
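
The "integrate out the transition probabilities" step rests on the standard Dirichlet-multinomial marginal for each row of a transition matrix. Below is a minimal sketch of that closed form, assuming a symmetric Dirichlet prior with a hypothetical hyperparameter alpha (the paper's exact prior and notation may differ).

    import numpy as np
    from scipy.special import gammaln

    def log_marginal_transitions(z, K, alpha=1.0):
        """Log-probability of a state sequence z with the K x K transition
        matrix integrated out under independent symmetric Dirichlet(alpha)
        priors on its rows (the Dirichlet-multinomial marginal)."""
        counts = np.zeros((K, K))
        for a, b in zip(z[:-1], z[1:]):
            counts[a, b] += 1.0                   # transition counts n_kj
        row_totals = counts.sum(axis=1)           # N_k = transitions out of k
        return float(np.sum(gammaln(K * alpha) - gammaln(K * alpha + row_totals))
                     + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

    print(log_marginal_transitions([0, 1, 1, 0, 1], K=2))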


Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest

arXiv.org Machine Learning

The New Yorker publishes a weekly captionless cartoon. More than 5,000 readers submit captions for it. The editors select three of them and ask the readers to pick the funniest one. We describe an experiment that compares a dozen automatic methods for selecting the funniest caption. We show that negative sentiment, human-centeredness, and lexical centrality most strongly match the funniest captions, followed by positive sentiment. These results are useful for understanding humor and also for designing more engaging conversational agents in text and multimodal (vision+text) systems. As part of this work, a large set of cartoons and captions is being made available to the community.
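
As a rough illustration of one of the compared signals, the sketch below computes a LexRank-style lexical centrality score over a set of captions. The pipeline (TF-IDF cosine similarity plus a damped power iteration) is a common construction for this kind of score and is not claimed to be the authors' exact implementation.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def lexical_centrality(captions, damping=0.85, iters=100):
        """LexRank-style centrality: a caption scores high when it is similar
        to many other submissions, i.e., it sits near the lexical center of
        what the crowd wrote."""
        n = len(captions)
        S = cosine_similarity(TfidfVectorizer().fit_transform(captions))
        np.fill_diagonal(S, 0.0)
        rs = S.sum(axis=1, keepdims=True)
        # row-stochastic walk; captions similar to nothing jump uniformly
        P = np.where(rs > 0, S / np.where(rs > 0, rs, 1.0), 1.0 / n)
        c = np.full(n, 1.0 / n)
        for _ in range(iters):                    # power iteration with damping
            c = (1.0 - damping) / n + damping * (c @ P)
        return c

    caps = ["Says he is here about the mouse problem.",
            "He claims he can fix our mouse problem.",
            "I preferred the old office chairs."]
    print(lexical_centrality(caps))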


Modelling of directional data using Kent distributions

arXiv.org Machine Learning

The modelling of data on a spherical surface requires directional probability distributions. To model asymmetrically distributed data on a three-dimensional sphere, Kent distributions are often used. Moment estimates of the parameters are typically used in modelling tasks involving Kent distributions; however, these lack a rigorous statistical treatment. The focus of this paper is to introduce a Bayesian estimation of the parameters of the Kent distribution, which has not previously been carried out in the literature, partly because of the distribution's complex mathematical form. We employ the Bayesian information-theoretic paradigm of Minimum Message Length (MML) to bridge this gap and derive reliable estimators. The inferred parameters are subsequently used in mixture modelling of Kent distributions. The problem of inferring a suitable number of mixture components is also addressed using the MML criterion. We demonstrate the superior performance of the derived MML-based parameter estimates against the traditional estimators. We apply the MML principle to infer mixtures of Kent distributions that model empirical data corresponding to protein conformations. We demonstrate the effectiveness of Kent models as improved descriptors of protein structural data compared to the commonly used von Mises-Fisher distributions.
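
For reference, the Kent (FB5) density on the unit sphere is $f(x) \propto \exp\{\kappa\,\gamma_1^\top x + \beta[(\gamma_2^\top x)^2 - (\gamma_3^\top x)^2]\}$. The sketch below evaluates the unnormalized log-density; the normalizing constant, whose complex form is part of what makes the Bayesian treatment difficult, is deliberately omitted.

    import numpy as np

    def kent_log_density_unnorm(x, gamma1, gamma2, gamma3, kappa, beta):
        """Unnormalized log-density of the Kent (FB5) distribution at a unit
        vector x. gamma1 is the mean direction, gamma2/gamma3 the major/minor
        axes (an orthonormal frame); kappa is the concentration and beta the
        ovalness, with 2*beta < kappa for a unimodal density."""
        return (kappa * (gamma1 @ x)
                + beta * ((gamma2 @ x) ** 2 - (gamma3 @ x) ** 2))

    g1, g2, g3 = np.eye(3)[2], np.eye(3)[0], np.eye(3)[1]  # orthonormal frame
    print(kent_log_density_unnorm(np.array([0.0, 0.0, 1.0]),
                                  g1, g2, g3, kappa=10.0, beta=2.0))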


Safe Feature Pruning for Sparse High-Order Interaction Models

arXiv.org Machine Learning

Taking into account high-order interactions among covariates is valuable in many practical regression problems. This is, however, a computationally challenging task because the number of high-order interaction features to be considered would be extremely large unless the number of covariates is sufficiently small. In this paper, we propose a novel efficient algorithm for LASSO-based sparse learning of such high-order interaction models. Our basic strategy for reducing the number of features is to employ the idea of the recently proposed safe feature screening (SFS) rule. An SFS rule has the property that, if a feature satisfies the rule, then the feature is guaranteed to be non-active in the LASSO solution, meaning that it can be safely screened out prior to the LASSO training process. If a large number of features can be screened out before training the LASSO, the computational cost and the memory requirement can be dramatically reduced. However, applying such an SFS rule to each of the extremely large number of high-order interaction features would be computationally infeasible. Our key idea for solving this computational issue is to exploit the underlying tree structure among high-order interaction features. Specifically, we introduce a pruning condition called the safe feature pruning (SFP) rule, which has the property that, if the rule is satisfied at a certain node of the tree, then all the high-order interaction features corresponding to its descendant nodes are guaranteed to be non-active at the optimal solution. Our algorithm is extremely efficient, making it possible to work, e.g., with 3rd-order interactions of 10,000 original covariates, where the number of possible high-order interaction features is greater than 10^{12}.
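
A sketch of the pruning idea, assuming binary 0/1 features so that every descendant interaction is elementwise dominated by its parent; the name radius and the threshold 1.0 are simplified, hypothetical stand-ins for the paper's safe-region bound and dual-feasibility condition.

    import numpy as np

    def subtree_bound(node_feature, theta):
        """Upper bound on |z^T theta| over every descendant interaction z of
        this node. For binary 0/1 features an interaction is the elementwise
        product of its factors, so each descendant satisfies z <= node_feature
        elementwise, and the bound follows by keeping only one sign of theta."""
        pos = node_feature @ np.maximum(theta, 0.0)
        neg = -(node_feature @ np.minimum(theta, 0.0))
        return max(pos, neg)

    def traverse(node_feature, children, theta, radius):
        """If even the optimistic bound, inflated by the safe-region radius,
        stays below the dual-feasibility threshold 1, every descendant feature
        is guaranteed inactive and the whole subtree is skipped."""
        if subtree_bound(node_feature, theta) + radius < 1.0:
            return []                             # subtree safely pruned
        return children                           # otherwise keep descending

    parent = np.array([1.0, 0.0, 1.0, 1.0])       # a binary interaction feature
    theta = np.array([0.1, -0.4, 0.2, -0.1])      # a dual/residual-type vector
    print(subtree_bound(parent, theta))           # 0.3: prunable when radius < 0.7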


An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

arXiv.org Machine Learning

Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we investigate whether the understanding gleaned from LDA applies to the setting of sparse latent factor models, specifically beta process factor analysis (BPFA). We demonstrate that the big picture is consistent: using Gibbs sampling within SVI to maintain certain posterior dependencies is extremely effective. However, we find that different posterior dependencies are important in BPFA than in LDA. In particular, approximations able to model intra-local-variable dependence perform best.
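
For orientation, the generic SVI update pattern that such algorithms build on looks roughly as follows; local_expected_stats is a hypothetical callback that, in a Gibbs-within-SVI variant, would return expected sufficient statistics estimated from a few Gibbs sweeps over the local variables. This is a sketch of the general scheme, not the paper's BPFA-specific derivation.

    import numpy as np

    def svi_step(lam, minibatch, N, t, prior, local_expected_stats,
                 kappa=0.7, tau=1.0):
        """One SVI update of a global natural parameter lam: reweight the
        minibatch as if it were the full dataset, form the intermediate
        parameter, and take a natural-gradient step of size rho_t."""
        rho = (t + tau) ** (-kappa)               # Robbins-Monro step size
        scale = N / len(minibatch)                # minibatch-to-dataset reweighting
        stats = sum(local_expected_stats(x) for x in minibatch)
        lam_hat = prior + scale * stats           # intermediate global parameter
        return (1.0 - rho) * lam + rho * lam_hat

    # toy usage: Beta-Bernoulli (conjugate, so local stats are exact)
    rng = np.random.default_rng(0)
    data = (rng.random(1000) < 0.3).astype(float)
    lam = np.array([1.0, 1.0])
    for t in range(200):
        batch = rng.choice(data, size=10)
        lam = svi_step(lam, batch, N=len(data), t=t, prior=np.array([1.0, 1.0]),
                       local_expected_stats=lambda x: np.array([x, 1.0 - x]))
    print(lam / lam.sum())                        # approx posterior mean, near 0.3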


Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis

arXiv.org Machine Learning

Canonical Correlation Analysis (CCA) is a widely used spectral technique for finding correlation structure in multi-view datasets. In this paper, we tackle the problem of large-scale CCA, where classical algorithms, which usually require computing the product of two huge matrices and a huge matrix decomposition, are expensive in both computation and storage. We recast CCA from a novel perspective and propose a scalable and memory-efficient Augmented Approximate Gradient (AppGrad) scheme for finding the top-$k$-dimensional canonical subspace, which only involves multiplying a large matrix by a thin matrix of width $k$ and a small matrix decomposition of dimension $k \times k$. Further, AppGrad achieves optimal storage complexity $O(k(p_1+p_2))$, compared with classical algorithms, which usually require $O(p_1^2+p_2^2)$ space to store two dense whitening matrices. The proposed scheme naturally generalizes to the stochastic optimization regime and is especially efficient for huge datasets where batch algorithms are prohibitive. The online nature of stochastic AppGrad also makes it well suited to the streaming scenario, where data arrive sequentially. To the best of our knowledge, it is the first stochastic algorithm for CCA. Experiments on four real datasets are provided to show the effectiveness of the proposed methods.
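
A minimal sketch of an AppGrad-style iteration (illustrative, not the paper's exact updates): each pass costs only tall-thin matrix products plus a $k \times k$ eigendecomposition, so no $p_1 \times p_1$ or $p_2 \times p_2$ whitening matrix is ever formed.

    import numpy as np

    def appgrad_cca(X, Y, k, eta=0.1, iters=500, seed=0):
        """AppGrad-style iteration for top-k CCA (illustrative, not the exact
        published updates). Heavy work is limited to tall-thin products; the
        only decompositions are k x k, so no p x p whitening matrix is formed."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        Phi = rng.standard_normal((X.shape[1], k)) / np.sqrt(X.shape[1])
        Psi = rng.standard_normal((Y.shape[1], k)) / np.sqrt(Y.shape[1])
        for _ in range(iters):
            XP, YP = X @ Phi, Y @ Psi             # thin n x k projections
            Phi = Phi - eta * (X.T @ (XP - YP)) / n   # gradient step, view 1
            Psi = Psi - eta * (Y.T @ (YP - XP)) / n   # gradient step, view 2
            for M, Z in ((Phi, X), (Psi, Y)):     # k x k re-whitening per view
                C = (Z @ M).T @ (Z @ M) / n
                w, V = np.linalg.eigh(C)
                M[:] = M @ V @ np.diag(np.maximum(w, 1e-8) ** -0.5) @ V.T
        return Phi, Psi

    rng = np.random.default_rng(1)
    Z = rng.standard_normal((500, 2))             # shared signal across views
    X = np.hstack([Z, rng.standard_normal((500, 8))])
    Y = np.hstack([Z, rng.standard_normal((500, 6))])
    Phi, Psi = appgrad_cca(X, Y, k=2)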


Role of normalization in spectral clustering for stochastic blockmodels

arXiv.org Machine Learning

Spectral clustering is a technique that clusters elements using the top few eigenvectors of their (possibly normalized) similarity matrix. The quality of spectral clustering is closely tied to how quickly these principal eigenvectors converge, and this rate of convergence has been shown to be identical for both the normalized and unnormalized variants in the recent random matrix theory literature. However, normalization for spectral clustering is commonly believed to be beneficial [Stat. Comput. 17 (2007) 395-416]. Indeed, our experiments show that normalization improves prediction accuracy. In this paper, for the popular stochastic blockmodel, we theoretically show that normalization shrinks the spread of points in a class by a constant fraction under a broad parameter regime. As a byproduct of our work, we also obtain sharp deviation bounds for the empirical principal eigenvalues of graphs generated from a stochastic blockmodel.
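
The setting is easy to reproduce numerically. The sketch below samples a two-block stochastic blockmodel and embeds the nodes with the top eigenvectors of either the raw or the degree-normalized ($D^{-1/2} A D^{-1/2}$) adjacency matrix; comparing the within-class spread of the two embeddings gives a numeric illustration of the shrinkage effect, not a proof of the theorem.

    import numpy as np

    def sbm_adjacency(sizes, P, seed=0):
        """Sample a symmetric adjacency matrix from a stochastic blockmodel
        with the given block sizes and connection-probability matrix P."""
        rng = np.random.default_rng(seed)
        z = np.repeat(np.arange(len(sizes)), sizes)
        A = (rng.random((len(z), len(z))) < P[z][:, z]).astype(float)
        A = np.triu(A, 1)
        return A + A.T, z

    def spectral_embed(A, k, normalize=True):
        """Embed nodes with the top-k eigenvectors of A, optionally using the
        degree-normalized matrix D^{-1/2} A D^{-1/2}."""
        if normalize:
            d = np.maximum(A.sum(axis=1), 1e-12)
            A = A / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
        w, V = np.linalg.eigh(A)
        return V[:, np.argsort(-np.abs(w))[:k]]

    A, z = sbm_adjacency([100, 100], np.array([[0.30, 0.05], [0.05, 0.30]]))
    for norm in (False, True):
        U = spectral_embed(A, k=2, normalize=norm)
        spread = sum(U[z == c].std(axis=0).sum() for c in (0, 1))
        print("normalized" if norm else "unnormalized", round(spread, 4))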


Analyzing statistical and computational tradeoffs of estimation procedures

arXiv.org Machine Learning

The recent explosion in the amount and dimensionality of data has exacerbated the need to trade off computational and statistical efficiency carefully, so that inference is both tractable and meaningful. We propose a framework that gives practitioners an explicit opportunity to specify how much statistical risk they are willing to accept for a given computational cost, and that leads to a theoretical risk-computation frontier for any given inference problem. We illustrate the tradeoff between risk and computation, and the resulting frontier, in three distinct settings. First, we derive analytic forms for the risk of estimating parameters in the classical setting of estimating the mean and variance of normally distributed data, and in the more general setting of the parameters of an exponential family. The second example concentrates on computationally constrained Hodges-Lehmann estimators. We conclude with an evaluation of the risk associated with early termination of iterative matrix inversion algorithms in the context of linear regression.
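
The third setting admits a compact illustration: solve least squares iteratively and treat the iteration count as the computational budget, so that stopping early is cheaper but incurs higher statistical risk. The sketch below uses plain gradient descent as a simplified stand-in for the iterative matrix inversion algorithms analyzed in the paper.

    import numpy as np

    def early_stopped_ols(X, y, n_iters):
        """Least squares solved by gradient descent; n_iters is the explicit
        computational budget, so stopping early is cheaper but riskier."""
        eta = 1.0 / np.linalg.norm(X, 2) ** 2     # safe step size (1 / Lipschitz)
        beta = np.zeros(X.shape[1])
        for _ in range(n_iters):
            beta -= eta * X.T @ (X @ beta - y)
        return beta

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    beta_true = np.arange(1.0, 6.0)
    y = X @ beta_true + rng.standard_normal(200)
    for t in (1, 5, 50):                          # budget vs. estimation error
        print(t, round(np.linalg.norm(early_stopped_ols(X, y, t) - beta_true), 3))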


Diffusion Fingerprints

arXiv.org Machine Learning

We introduce, test and discuss a method for classifying and clustering data modeled as directed graphs. The idea is to start diffusion processes from any subset of a data collection, generating corresponding distributions over reaching points in the network. These distributions take the form of high-dimensional numerical vectors and capture essential topological properties of the original dataset. We show how these diffusion vectors can be successfully applied to obtain state-of-the-art accuracy in the problem of extracting pathways from metabolic networks. We also provide a guideline illustrating how to use our method for classification problems, and discuss important details of its implementation. In particular, we present a simple dimensionality reduction technique that lowers the computational cost of classifying diffusion vectors while leaving the predictive power of the classification process substantially unaltered. Although the method has very few parameters, the results we obtain show its flexibility and power, which should make it helpful in many other contexts.
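
One concrete way to realize such a diffusion vector (a sketch, not necessarily the authors' exact process) is a random walk with restart on the directed graph, seeded at the chosen subset; the stationary vector of reach probabilities then serves as the fingerprint.

    import numpy as np

    def diffusion_fingerprint(A, seeds, restart=0.15, iters=200):
        """Reach-probability vector of a random walk on a directed graph that
        restarts at the seed subset; this high-dimensional stationary vector
        serves as the fingerprint of the seeds."""
        n = A.shape[0]
        out = A.sum(axis=1, keepdims=True)
        # row-stochastic transition matrix; dangling nodes jump uniformly
        P = np.where(out > 0, A / np.where(out > 0, out, 1.0), 1.0 / n)
        r = np.zeros(n)
        r[list(seeds)] = 1.0 / len(seeds)         # restart distribution
        v = r.copy()
        for _ in range(iters):
            v = (1.0 - restart) * (v @ P) + restart * r
        return v

    A = np.array([[0, 1, 0, 0],
                  [0, 0, 1, 1],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    print(diffusion_fingerprint(A, seeds=[0]))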