AITopics

PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a starting point for discussions that will increase our understanding, lead to improvements to PAQ8, and facilitate a transfer of knowledge from PAQ8 to other machine learning methods, such a recurrent neural networks and stochastic memoizers. Finally, the report presents a broad range of new applications of PAQ to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and compression using features from the field of deep learning.

algorithm, prediction, sequence, (17 more...)

1108.3298

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Law > Litigation (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Lázaro-Gredilla, Miguel, Van Vaerenbergh, Steven, Lawrence, Neil

Overlapping Mixtures of Gaussian Processes for the Data Association Problem

In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a nonstandard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings. Keywords: Gaussian Processes, Marginalized Variational Inference, Bayesian Models 1. Introduction The data association problem arises in multi-target tracking scenarios.

artificial intelligence, machine learning, trajectory, (18 more...)

1108.3372

Country:

North America > United States (0.68)
Europe (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Zhang, Zhilin, Rao, Bhaskar D.

Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning

We address the sparse signal recovery problem in the context of multiple measurement vectors (MMV) when elements in each nonzero row of the solution matrix are temporally correlated. Existing algorithms do not consider such temporal correlations and thus their performance degrades significantly with the correlations. In this work, we propose a block sparse Bayesian learning framework which models the temporal correlations. In this framework we derive two sparse Bayesian learning (SBL) algorithms, which have superior recovery performance compared to existing algorithms, especially in the presence of high temporal correlations. Furthermore, our algorithms are better at handling highly underdetermined problems and require less row-sparsity on the solution matrix. We also provide analysis of the global and local minima of their cost function, and show that the SBL cost function has the very desirable property that the global minimum is at the sparsest solution to the MMV problem. Extensive experiments also provide some interesting results that motivate future theoretical research on the MMV model.

artificial intelligence, bayesian inference, machine learning, (16 more...)

doi: 10.1109/JSTSP.2011.2159773

1102.3949

Country:

North America > United States > California (0.93)
Europe (0.67)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Diagnostic Medicine (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Fox, Emily B., Sudderth, Erik B., Jordan, Michael I., Willsky, Alan S.

A sticky HDP-HMM with application to speaker diarization

We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1214/10-AOAS395

0905.2592

Country:

Asia (0.94)
North America > United States > California (0.68)
Europe (0.67)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Media > Music (0.48)
Leisure & Entertainment (0.48)
Consumer Products & Services > Restaurants (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Huszár, Ferenc, Lacoste-Julien, Simon

A Kernel Approach to Tractable Bayesian Nonparametrics

arXiv.org Machine LearningAug-12-2011

Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the success of the Gaussian process framework, the kernel trick is rarely explicitly considered in the Bayesian literature. In this paper, we aim to fill this gap and demonstrate the potential of applying the kernel trick to tractable Bayesian parametric models in a wider context than just regression. As an example, we present an intuitive Bayesian kernel machine for density estimation that is obtained by applying the kernel trick to a Gaussian generative model in feature space.

artificial intelligence, bayesian inference, machine learning, (15 more...)

1103.1761

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceAug-11-2011

Limits of Preprocessing

Szeider, Stefan

Many important computational problems that arise in various areas of AI are intractable. Nevertheless, AI research was very successful in developing and implementing heuristic solvers that work well on realworld instances. An important component of virtually every solver is a powerful polynomial-time preprocessing procedure that reduces the problem input. For instance, preprocessing techniques for the propositional satisfiability problem are based on Boolean Constraint Propagation (see, e.g., Eén and Biere, 2005), CSP solvers make use of various local consistency algorithms that filter the domains of variables (see, e.g., Bessière, 2006); similar preprocessing methods are used by solvers for Nonmonotonic and Bayesian reasoning problems (see, e.g., Gebser et al., 2008, Bolt and van der Gaag, 2006, respectively). Until recently, no provable performance guarantees for polynomial-time preprocessing methods have been obtained, and so preprocessing was only subject of empirical studies. A possible reason for the lack of theoretical results is a certain inadequacy of the P vs NP framework for such an analysis: if we could reduce in polynomial time an instance of an NPhard problem just by one bit, then we can solve the entire problem in polynomial time by repeating the reduction step a polynomial number of times, and P NP follows. With the advent of parameterized complexity (Downey, Fellows, and Stege, 1999), a new theoretical framework became available that provides suitable tools to analyze the power of preprocessing. Parameterized complexity considers a problem in a two-dimensional setting, where in addition to the input size n, a problem parameter k is taken into consideration.

artificial intelligence, logic & formal reasoning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1104.5566

Country:

Europe (1.00)
North America > United States > California (0.28)
North America > Canada > British Columbia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Finegold, Michael, Drton, Mathias

Robust graphical modeling of gene networks using classical and alternative T-distributions

arXiv.org Machine LearningAug-9-2011

Graphical Gaussian models have proven to be useful tools for exploring network structures based on multivariate data. Applications to studies of gene expression have generated substantial interest in these models, and resulting recent progress includes the development of fitting methodology involving penalization of the likelihood function. In this paper we advocate the use of multivariate $t$-distributions for more robust inference of graphs. In particular, we demonstrate that penalized likelihood inference combined with an application of the EM algorithm provides a computationally efficient approach to model selection in the $t$-distribution case. We consider two versions of multivariate $t$-distributions, one of which requires the use of approximation techniques. For this distribution, we describe a Markov chain Monte Carlo EM algorithm based on a Gibbs sampler as well as a simple variational approximation that makes the resulting method feasible in large problems.

artificial intelligence, machine learning, procedure, (19 more...)

doi: 10.1214/10-AOAS410

1009.3669

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Blei, David M., Frazier, Peter I.

Distance Dependent Chinese Restaurant Processes

arXiv.org Machine LearningAug-9-2011

We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.

distance dependent crp, machine learning, natural language, (17 more...)

0910.1022

Country: North America > United States (0.93)

Genre: Research Report (0.81)

Industry: Consumer Products & Services > Restaurants (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

AAAI ConferencesAug-8-2011

Visualizing and Understanding Large-Scale Bayesian Networks

Cossalter, Michele (Carnegie Mellon University) | Mengshoel, Ole (Carnegie Mellon University) | Selker, Ted (Carnegie Mellon University)

Bayesian networks are a theoretically well-founded approach to represent large multi-variate probability distributions, and have proven useful in a broad range of applications. While several software tools for visualizing and editing Bayesian networks exist, they have important weaknesses when it comes to enabling users to clearly understand and compare conditional probability tables in the context of network topology, especially in large-scale networks. This paper describes a system for improving the ability for computers to work with people to develop intelligent systems through the construction of high-performing Bayesian networks. We describe NetEx, a tool developed as a Cytoscape plug-in, which allows a user to visually inspect and compare details concerning multiple nodes in a Bayesian network while maintaining awareness of their network context. It uses a "thought bubble line" to connect nodes in a graph representation and their internal information at the side of the graph. The tool seeks to improve the ability of experts to analyze and debug large Bayesian network models, and to help people to understand how alternative algorithms and Bayesian networks operate, providing insights into how to improve them.

artificial intelligence, bayesian network, machine learning, (17 more...)

AAAI Conferences

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

AAAI ConferencesAug-8-2011

Towards Detection of Suspicious Behavior from Multiple Observations

Kaluza, Bostjan (Jozef Stefan Institute) | Kaminka, Gal (Bar Ilan University) | Tambe, Milind (University of Southern California)

This paper addresses the problem of detecting suspicious behavior from a collection of individuals events, where no single event is enough to decide whether his/her behavior is suspicious, but the combination of multiple events enables reasoning. We establish a Bayesian framework for evaluating multiple events and show that the current approaches lack modeling behavior history included in the estimation whether a trace of events is generated by a suspicious agent. We propose a heuristic for evaluating events according to the behavior of the agent in the past. The proposed approach, tested on an airport domain, outperforms the current approaches.

agent, detection, probability, (14 more...)

AAAI Conferences

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Asia > Middle East > Israel (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (1.00)
Transportation > Infrastructure & Services (0.69)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)