AITopics

PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a starting point for discussions that will increase our understanding, lead to improvements to PAQ8, and facilitate a transfer of knowledge from PAQ8 to other machine learning methods, such a recurrent neural networks and stochastic memoizers. Finally, the report presents a broad range of new applications of PAQ to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and compression using features from the field of deep learning.

algorithm, prediction, sequence, (17 more...)

1108.3298

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Law > Litigation (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Taieb, Souhaib Ben, Bontempi, Gianluca, Atiya, Amir, Sorjamaa, Antti

A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition

Multi-step ahead forecasting is still an open challenge in time series forecasting. Several approaches that deal with this complex problem have been proposed in the literature but an extensive comparison on a large number of tasks is still missing. This paper aims to fill this gap by reviewing existing strategies for multi-step ahead forecasting and comparing them in theoretical and practical terms. To attain such an objective, we performed a large scale comparison of these different strategies using a large experimental benchmark (namely the 111 series from the NN5 forecasting competition). In addition, we considered the effects of deseasonalization, input variable selection, and forecast combination on these strategies and on multi-step ahead forecasting at large. The following three findings appear to be consistently supported by the experimental results: Multiple-Output strategies are the best performing approaches, deseasonalization leads to uniformly improved forecast accuracy, and input selection is more effective when performed in conjunction with deseasonalization.

data mining, forecasting strategy, machine learning, (19 more...)

1108.3259

Country:

Europe > United Kingdom > England (0.28)
North America > United States > California (0.28)
Africa > Middle East > Egypt (0.28)

Genre:

Workflow (0.93)
Research Report > New Finding (0.92)

Industry: Government (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.45)

Lázaro-Gredilla, Miguel, Van Vaerenbergh, Steven, Lawrence, Neil

Overlapping Mixtures of Gaussian Processes for the Data Association Problem

In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a nonstandard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings. Keywords: Gaussian Processes, Marginalized Variational Inference, Bayesian Models 1. Introduction The data association problem arises in multi-target tracking scenarios.

artificial intelligence, machine learning, trajectory, (18 more...)

1108.3372

Country:

North America > United States (0.68)
Europe (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Zhang, Zhilin, Rao, Bhaskar D.

Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning

We address the sparse signal recovery problem in the context of multiple measurement vectors (MMV) when elements in each nonzero row of the solution matrix are temporally correlated. Existing algorithms do not consider such temporal correlations and thus their performance degrades significantly with the correlations. In this work, we propose a block sparse Bayesian learning framework which models the temporal correlations. In this framework we derive two sparse Bayesian learning (SBL) algorithms, which have superior recovery performance compared to existing algorithms, especially in the presence of high temporal correlations. Furthermore, our algorithms are better at handling highly underdetermined problems and require less row-sparsity on the solution matrix. We also provide analysis of the global and local minima of their cost function, and show that the SBL cost function has the very desirable property that the global minimum is at the sparsest solution to the MMV problem. Extensive experiments also provide some interesting results that motivate future theoretical research on the MMV model.

artificial intelligence, bayesian inference, machine learning, (16 more...)

doi: 10.1109/JSTSP.2011.2159773

1102.3949

Country:

North America > United States > California (0.93)
Europe (0.67)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Diagnostic Medicine (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Fox, Emily B., Sudderth, Erik B., Jordan, Michael I., Willsky, Alan S.

A sticky HDP-HMM with application to speaker diarization

We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1214/10-AOAS395

0905.2592

Country:

Asia (0.94)
North America > United States > California (0.68)
Europe (0.67)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.28)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Media > Music (0.48)
Leisure & Entertainment (0.48)
Consumer Products & Services > Restaurants (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Artificial IntelligenceAug-16-2011

Finding Similar/Diverse Solutions in Answer Set Programming

Eiter, Thomas, Erdem, Esra, Erdogan, Halit, Fink, Michael

For some computational problems (e.g., product configuration, planning, diagnosis, query answering, phylogeny reconstruction) computing a set of similar/diverse solutions may be desirable for better decision-making. With this motivation, we studied several decision/optimization versions of this problem in the context of Answer Set Programming (ASP), analyzed their computational complexity, and introduced offline/online methods to compute similar/diverse solutions of such computational problems with respect to a given distance function. All these methods rely on the idea of computing solutions to a problem by means of finding the answer sets for an ASP program that describes the problem. The offline methods compute all solutions in advance using the ASP formulation of the problem with an ASP solver, like Clasp, and then identify similar/diverse solutions using clustering methods. The online methods compute similar/diverse solutions following one of the three approaches: by reformulating the ASP representation of the problem to compute similar/diverse solutions at once using an ASP solver; by computing similar/diverse solutions iteratively (one after other) using an ASP solver; by modifying the search algorithm of an ASP solver to compute similar/diverse solutions incrementally. We modified Clasp to implement the last online method and called it Clasp-NK. In the first two online methods, the given distance function is represented in ASP; in the last one it is implemented in C++. We showed the applicability and the effectiveness of these methods on reconstruction of similar/diverse phylogenies for Indo-European languages, and on several planning problems in Blocks World. We observed that in terms of computational efficiency the last online method outperforms the others; also it allows us to compute similar/diverse solutions when the distance function cannot be represented in ASP.

artificial intelligence, logic & formal reasoning, logic programming, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1017/S1471068411000548

1108.326

Country:

Europe > Austria (0.28)
Asia > Middle East > Republic of Türkiye (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

Li, Ping, Shrivastava, Anshumali, Konig, Christian

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

arXiv.org Machine LearningAug-15-2011

We generated a dataset of 200 GB with 10^9 features, to test our recent b-bit minwise hashing algorithms for training very large-scale logistic regression and SVM. The results confirm our prior work that, compared with the VW hashing algorithm (which has the same variance as random projections), b-bit minwise hashing is substantially more accurate at the same storage. For example, with merely 30 hashed values per data point, b-bit minwise hashing can achieve similar accuracies as VW with 2^14 hashed values per data point. We demonstrate that the preprocessing cost of b-bit minwise hashing is roughly on the same order of magnitude as the data loading time. Furthermore, by using a GPU, the preprocessing cost can be reduced to a small fraction of the data loading time. Minwise hashing has been widely used in industry, at least in the context of search. One reason for its popularity is that one can efficiently simulate permutations by (e.g.,) universal hashing. In other words, there is no need to store the permutation matrix. In this paper, we empirically verify this practice, by demonstrating that even using the simplest 2-universal hashing does not degrade the learning performance.

b-bit minwise, dataset, minwise, (15 more...)

1108.3072

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(13 more...)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.62)

Carreira-Perpiñán, Miguel Á., Goodhill, Geoffrey J.

Generalised elastic nets

arXiv.org Machine LearningAug-13-2011

The elastic net was introduced as a heuristic algorithm for combinatorial optimisation and has been applied, among other problems, to biological modelling. It has an energy function which trades off a fitness term against a tension term. In the original formulation of the algorithm the tension term was implicitly based on a first-order derivative. In this paper we generalise the elastic net model to an arbitrary quadratic tension term, e.g. derived from a discretised differential operator, and give an efficient learning algorithm. We refer to these as generalised elastic nets (GENs). We give a theoretical analysis of the tension term for 1D nets with periodic boundary conditions, and show that the model is sensitive to the choice of finite difference scheme that represents the discretised derivative. We illustrate some of these issues in the context of cortical map models, by relating the choice of tension term to a cortical interaction function. In particular, we prove that this interaction takes the form of a Mexican hat for the original elastic net, and of progressively more oscillatory Mexican hats for higher-order derivatives. The results apply not only to generalised elastic nets but also to other methods using discrete differential penalties, and are expected to be useful in other areas, such as data analysis, computer graphics and optimisation problems.

artificial intelligence, machine learning, stencil, (18 more...)

1108.284

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)

Genre: Research Report (0.81)

Industry:

Health & Medicine (0.67)
Education (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.65)

Huszár, Ferenc, Lacoste-Julien, Simon

A Kernel Approach to Tractable Bayesian Nonparametrics

arXiv.org Machine LearningAug-12-2011

Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the success of the Gaussian process framework, the kernel trick is rarely explicitly considered in the Bayesian literature. In this paper, we aim to fill this gap and demonstrate the potential of applying the kernel trick to tractable Bayesian parametric models in a wider context than just regression. As an example, we present an intuitive Bayesian kernel machine for density estimation that is obtained by applying the kernel trick to a Gaussian generative model in feature space.

artificial intelligence, bayesian inference, machine learning, (15 more...)

1103.1761

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Gorst-Rasmussen, Anders, Scheike, Thomas H.

Independent screening for single-index hazard rate models with ultra-high dimensional features

arXiv.org Machine LearningAug-11-2011

In data sets with many more features than observations, independent screening based on all univariate regression models leads to a computationally convenient variable selection method. Recent efforts have shown that in the case of generalized linear models, independent screening may suffice to capture all relevant features with high probability, even in ultra-high dimension. It is unclear whether this formal sure screening property is attainable when the response is a right-censored survival time. We propose a computationally very efficient independent screening method for survival data which can be viewed as the natural survival equivalent of correlation screening. We state conditions under which the method admits the sure screening property within a general class of single-index hazard rate models with ultra-high dimensional features. An iterative variant is also described which combines screening with penalized regression in order to handle more complex feature covariance structures. The methods are evaluated through simulation studies and through application to a real gene expression dataset.

artificial intelligence, machine learning, screening, (17 more...)

1105.3361

Country: Europe > Denmark (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)