AITopics

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare algorithm performances. In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation estimator. The main theorem shows that there exists no universal unbiased estimator of the variance of K-fold cross-validation. An analysis based on the eigendecomposition of the covariance matrix of errors helps to better understand the nature of the problem and shows that naive estimators may grossly underestimate variance, as con£rmed by numerical experiments.

estimator, k-fold cross-validation, variance, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Hungary > Győr-Moson-Sopron County > Győr (0.04)

Genre: Research Report (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.87)

Roth, Volker, Lange, Tilman

Feature Selection in Clustering Problems

A novel approach to combining clustering and feature selection is presented. It implements a wrapper strategy for feature selection, in the sense that the features are directly selected by optimizing the discriminative power of the used partitioning algorithm. On the technical side, we present an efficient optimization algorithm with guaranteed local convergence property. The only free parameter of this method is selected by a resampling-based stability analysis. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features.

algorithm, partition, selection, (14 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre:

Research Report (0.48)
Overview (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Graepel, Thore, Herbrich, Ralf, Kharechko, Andriy, Shawe-taylor, John S.

Semi-Definite Programming by Perceptron Learning

We present a modified version of the perceptron learning algorithm (PLA) which solves semidefinite programs (SDPs) in polynomial time. The algorithm is based on the following three observations: (i) Semidefinite programs are linear programs with infinitely many (linear) constraints; (ii) every linear program can be solved by a sequence of constraint satisfaction problems with linear constraints; (iii) in general, the perceptron learning algorithm solves a constraint satisfaction problem with linear constraints in finitely many updates. Combining the PLA with a probabilistic rescaling algorithm (which, on average, increases the size of the feasable region) results in a probabilistic algorithm for solving SDPs that runs in polynomial time. We present preliminary results which demonstrate that the algorithm works, but is not competitive with state-of-the-art interior point methods.

algorithm, iteration, perceptron, (13 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.97)

Weston, Jason, Schölkopf, Bernhard, Bakir, Gökhan H.

Learning to Find Pre-Images

We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces.

application, kernel, kpca, (14 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Storkey, Amos J.

Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal.

fourier transform, node, propagation, (16 more...)

Country:

North America > United States > California (0.04)
Europe > Netherlands (0.04)
Europe > Belgium (0.04)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.53)

Welling, Max, Mnih, Andriy, Hinton, Geoffrey E.

Wormholes Improve Contrastive Divergence

In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model's distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.

markov chain, probability, wormhole, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.78)

Morris, Quaid D., Frey, Brendan J.

Denoising and Untangling Graphs Using Degree Priors

This paper addresses the problem of untangling hidden graphs from a set of noisy detections of undirected edges. We present a model of the generation of the observed graph that includes degree-based structure priors on the hidden graphs. Exact inference in the model is intractable; we present an efficient approximate inference algorithm to compute edge appearance posteriors. We evaluate our model and algorithm on a biological graph inference problem.

algorithm, graph, interaction, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Middle East > Jordan (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Beygelzimer, Alina, Rish, Irina

Approximability of Probability Distributions

We consider the question of how well a given distribution can be approximated with probabilistic graphical models. We introduce a new parameter, effective treewidth, that captures the degree of approximability as a tradeoff between the accuracy and the complexity of approximation. We present a simple approach to analyzing achievable tradeoffs that exploits the threshold behavior of monotone graph properties, and provide experimental results that support the approach.

complexity, graph, treewidth, (17 more...)

Country:

North America > United States > New York (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Heskes, Tom, Zoeter, Onno, Wiegerinck, Wim

Approximate Expectation Maximization

The E-step boils down to computing probabilities of the hidden variables given the observed variables (evidence) and current set of parameters. The M-step then, given these probabilities, yields a new set of parameters guaranteed to increase the likelihood. In Bayesian networks, that will be the focus of this article, the M-step is usually relatively straightforward. A complication may arise in the E-step, when computing the probability of the hidden variables given the evidence becomes intractable. An often used approach is to replace the exact yet intractable inference in the E step with approximate inference, either through sampling or using a deterministic variational method. The use of a "mean-field" variational method in this context leads to an algorithm known as variational EM and can be given theinterpretation of minimizing a free energy with respect to both a tractable approximate distribution (approximate E-step) and the parameters (M-step) [2]. Loopy belief propagation [3] and variants thereof, such as generalized belief propagation [4] and expectation propagation [5], have become popular alternatives to the "mean-field" variational approaches, often yielding somewhat better approximations. And indeed, they can and have been applied for approximate inference in the E-step of the EM algorithm (see e.g.

algorithm, belief propagation, propagation, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
Asia > Middle East > Jordan (0.05)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Borodin, Allan, El-Yaniv, Ran, Gogan, Vincent

Can We Learn to Beat the Best Stock

A novel algorithm for actively trading stocks is presented. While traditional universal algorithms (and technical trading heuristics) attempt to predict winners or trends, our approach relies on predictable statistical relations between all pairs of stocks in the market. Our empirical results on historical markets provide strong evidence that this type of technical trading can "beat the market" and moreover, can beat the best stock in the market. In doing so we utilize a new idea for smoothing critical parameters in the context of expert learning.

algorithm, dataset, portfolio, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)