AITopics

Taieb, Souhaib Ben, Bontempi, Gianluca, Atiya, Amir, Sorjamaa, Antti

A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition

Multi-step ahead forecasting is still an open challenge in time series forecasting. Several approaches that deal with this complex problem have been proposed in the literature but an extensive comparison on a large number of tasks is still missing. This paper aims to fill this gap by reviewing existing strategies for multi-step ahead forecasting and comparing them in theoretical and practical terms. To attain such an objective, we performed a large scale comparison of these different strategies using a large experimental benchmark (namely the 111 series from the NN5 forecasting competition). In addition, we considered the effects of deseasonalization, input variable selection, and forecast combination on these strategies and on multi-step ahead forecasting at large. The following three findings appear to be consistently supported by the experimental results: Multiple-Output strategies are the best performing approaches, deseasonalization leads to uniformly improved forecast accuracy, and input selection is more effective when performed in conjunction with deseasonalization.

forecasting strategy, fuzzy logic, survey article, (20 more...)

1108.3259

Country:

Europe (1.00)
Africa > Middle East > Egypt (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Workflow (0.93)
Research Report > New Finding (0.92)

Industry: Government (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.45)

Knoll, Byron, de Freitas, Nando

A Machine Learning Perspective on Predictive Coding with PAQ

PAQ8 is an open source lossless data compression algorithm that currently achieves the best compression rates on many benchmarks. This report presents a detailed description of PAQ8 from a statistical machine learning perspective. It shows that it is possible to understand some of the modules of PAQ8 and use this understanding to improve the method. However, intuitive statistical explanations of the behavior of other modules remain elusive. We hope the description in this report will be a starting point for discussions that will increase our understanding, lead to improvements to PAQ8, and facilitate a transfer of knowledge from PAQ8 to other machine learning methods, such a recurrent neural networks and stochastic memoizers. Finally, the report presents a broad range of new applications of PAQ to machine learning tasks including language modeling and adaptive text prediction, adaptive game playing, classification, and compression using features from the field of deep learning.

algorithm, deep learning, neural network, (21 more...)

1108.3298

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > British Columbia (0.14)

Genre: Research Report (1.00)

Industry: Law > Litigation (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Zhang, Zhilin, Rao, Bhaskar D.

Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning

We address the sparse signal recovery problem in the context of multiple measurement vectors (MMV) when elements in each nonzero row of the solution matrix are temporally correlated. Existing algorithms do not consider such temporal correlations and thus their performance degrades significantly with the correlations. In this work, we propose a block sparse Bayesian learning framework which models the temporal correlations. In this framework we derive two sparse Bayesian learning (SBL) algorithms, which have superior recovery performance compared to existing algorithms, especially in the presence of high temporal correlations. Furthermore, our algorithms are better at handling highly underdetermined problems and require less row-sparsity on the solution matrix. We also provide analysis of the global and local minima of their cost function, and show that the SBL cost function has the very desirable property that the global minimum is at the sparsest solution to the MMV problem. Extensive experiments also provide some interesting results that motivate future theoretical research on the MMV model.

algorithm, bayesian inference, neurology, (19 more...)

doi: 10.1109/JSTSP.2011.2159773

1102.3949

Country:

Europe (0.67)
North America > United States > California > San Diego County (0.14)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Diagnostic Medicine (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Lázaro-Gredilla, Miguel, Van Vaerenbergh, Steven, Lawrence, Neil

Overlapping Mixtures of Gaussian Processes for the Data Association Problem

In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a non-standard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings.

artificial intelligence, machine learning, trajectory, (16 more...)

1108.3372

Country:

North America > United States (0.68)
Europe > Spain > Cantabria (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Networks (0.94)

Li, Ping, Shrivastava, Anshumali, Konig, Christian

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

arXiv.org Machine LearningAug-15-2011

We generated a dataset of 200 GB with 10^9 features, to test our recent b-bit minwise hashing algorithms for training very large-scale logistic regression and SVM. The results confirm our prior work that, compared with the VW hashing algorithm (which has the same variance as random projections), b-bit minwise hashing is substantially more accurate at the same storage. For example, with merely 30 hashed values per data point, b-bit minwise hashing can achieve similar accuracies as VW with 2^14 hashed values per data point. We demonstrate that the preprocessing cost of b-bit minwise hashing is roughly on the same order of magnitude as the data loading time. Furthermore, by using a GPU, the preprocessing cost can be reduced to a small fraction of the data loading time. Minwise hashing has been widely used in industry, at least in the context of search. One reason for its popularity is that one can efficiently simulate permutations by (e.g.,) universal hashing. In other words, there is no need to store the permutation matrix. In this paper, we empirically verify this practice, by demonstrating that even using the simplest 2-universal hashing does not degrade the learning performance.

artificial intelligence, machine learning, minwise, (18 more...)

1108.3072

Country:

Europe (1.00)
North America > United States > California > Santa Clara County (0.28)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.62)

Mukherjee, Indraneel, Schapire, Robert E.

A theory of multiclass boosting

arXiv.org Machine LearningAug-15-2011

Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the "correct" requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements.

adaboost, artificial intelligence, decision tree learning, (18 more...)

1108.2989

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Leibon, Greg, Pauls, Scott, Rockmore, Daniel N., Savell, Robert

Partition Decomposition for Roll Call Data

arXiv.org Machine LearningAug-13-2011

In this paper we bring to bear some new tools from statistical learning on the analysis of roll call data. We present a new data-driven model for roll call voting that is geometric in nature. We construct the model by adapting the "Partition Decoupling Method," an unsupervised learning technique originally developed for the analysis of families of time series, to produce a multiscale geometric description of a weighted network associated to a set of roll call votes. Central to this approach is the quantitative notion of a "motivation," a cluster-based and learned basis element that serves as a building block in the representation of roll call data. Motivations enable the formulation of a quantitative description of ideology and their data-dependent nature makes possible a quantitative analysis of the evolution of ideological factors. This approach is generally applicable to roll call data and we apply it in particular to the historical roll call voting of the U.S. House and Senate. This methodology provides a mechanism for estimating the dimension of the underlying action space. We determine that the dominant factors form a low- (one- or two-) dimensional representation with secondary factors adding higher-dimensional features. In this way our work supports and extends the findings of both Poole-Rosenthal and Heckman-Snyder concerning the dimensionality of the action space. We give a detailed analysis of several individual Senates and use the AdaBoost technique from statistical learning to determine those votes with the most powerful discriminatory value. When used as a predictive model, this geometric view significantly outperforms spatial models such as the Poole-Rosenthal DW-NOMINATE model and the Heckman-Snyder 6-factor model, both in raw accuracy as well as Aggregate Proportional Reduced Error (APRE).

artificial intelligence, dimension, us government, (18 more...)

1108.2805

Country:

Europe > United Kingdom > England (0.14)
North America > United States > Michigan (0.14)

Genre: Research Report (0.81)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Voting & Elections (0.93)
Law > Statutes (0.68)
Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Carreira-Perpiñán, Miguel Á., Goodhill, Geoffrey J.

Generalised elastic nets

arXiv.org Machine LearningAug-13-2011

The elastic net was introduced as a heuristic algorithm for combinatorial optimisation and has been applied, among other problems, to biological modelling. It has an energy function which trades off a fitness term against a tension term. In the original formulation of the algorithm the tension term was implicitly based on a first-order derivative. In this paper we generalise the elastic net model to an arbitrary quadratic tension term, e.g. derived from a discretised differential operator, and give an efficient learning algorithm. We refer to these as generalised elastic nets (GENs). We give a theoretical analysis of the tension term for 1D nets with periodic boundary conditions, and show that the model is sensitive to the choice of finite difference scheme that represents the discretised derivative. We illustrate some of these issues in the context of cortical map models, by relating the choice of tension term to a cortical interaction function. In particular, we prove that this interaction takes the form of a Mexican hat for the original elastic net, and of progressively more oscillatory Mexican hats for higher-order derivatives. The results apply not only to generalised elastic nets but also to other methods using discrete differential penalties, and are expected to be useful in other areas, such as data analysis, computer graphics and optimisation problems.

artificial intelligence, health & medicine, stencil, (19 more...)

1108.284

Country:

North America > United States (1.00)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.65)

Huszár, Ferenc, Lacoste-Julien, Simon

A Kernel Approach to Tractable Bayesian Nonparametrics

arXiv.org Machine LearningAug-12-2011

Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the success of the Gaussian process framework, the kernel trick is rarely explicitly considered in the Bayesian literature. In this paper, we aim to fill this gap and demonstrate the potential of applying the kernel trick to tractable Bayesian parametric models in a wider context than just regression. As an example, we present an intuitive Bayesian kernel machine for density estimation that is obtained by applying the kernel trick to a Gaussian generative model in feature space.

artificial intelligence, bayesian inference, predictive distribution, (15 more...)

1103.1761

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)