AITopics

A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method can incorporate prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The method is demonstrated on widely published sampled functions, and its results are contrasted with those of other basis-function-based methods.

coefficient, threshold, wavelet, (12 more...)

Country:

North America > United States > South Carolina (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.05)
North America > United States > Ohio > Lucas County > Toledo (0.04)

Industry: Government (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality > Data Transformation (0.34)

The Bias-Variance Tradeoff and the Randomized GACV

Wahba, Grace, Lin, Xiwu, Gao, Fangyu, Xiang, Dong, Klein, Ronald, Klein, Barbara

We propose a new in-sample cross validation based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance or fit-complexity tradeoff in 'soft' classification. Soft classification refers to a learning procedure which estimates the probability that an example with a given attribute vector is in class 1 vs class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the 'true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data.
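The randomized trace estimate mentioned in the abstract can be illustrated with a generic Hutchinson-style estimator, which relies on the identity E[z^T H z] = tr(H) for probe vectors z with independent zero-mean, unit-variance entries. This is a minimal sketch of that general idea only, not the paper's randomized GACV procedure; the function names, the 3x3 matrix, and the probe count are hypothetical choices for illustration.

```python
import random


def make_matvec(matrix):
    """Wrap a dense matrix (list of rows) as a matrix-vector product."""
    def matvec(v):
        return [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return matvec


def hutchinson_trace(matvec, n, num_probes=200, seed=0):
    """Estimate tr(H) using only products H @ z with random +/-1 probes.

    Assumption: H is available only through `matvec`, as is typical when
    the Hessian is too expensive to form explicitly.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        hz = matvec(z)
        total += sum(zi * hi for zi, hi in zip(z, hz))  # z^T H z
    return total / num_probes


# Hypothetical symmetric matrix standing in for a Hessian.
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]

estimate = hutchinson_trace(make_matvec(H), n=3, num_probes=2000)
exact = sum(H[i][i] for i in range(3))
print(f"estimated trace: {estimate:.3f}, exact trace: {exact:.3f}")
```

The estimate needs only matrix-vector products, which is what makes a randomized approach attractive when the full matrix is never formed.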

classification, minimizer, wahba, (16 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.15)
North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > North Carolina > Wake County > Cary (0.04)
North America > United States > New York > New York County > New York City (0.04)

Industry: Health & Medicine > Therapeutic Area (0.97)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Vivarelli, Francesco, Williams, Christopher K. I.

Discovering Hidden Features with Gaussian Processes Regression

W is often taken to be diagonal, but if we allow W to be a general positive definite matrix which can be tuned on the basis of training data, then an eigen-analysis of W shows that we are effectively creating hidden features, where the dimensionality of the hidden-feature space is determined by the data. We demonstrate the superiority of predictions using the general matrix over those based on a diagonal matrix on two test problems.
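The hidden-feature reading can be made explicit with a short derivation. The excerpt above does not state the covariance function, so the squared-exponential form below and the symbols $v_0$, $\lambda_i$, $u_i$, and $z_i$ are assumptions made for illustration.

```latex
\begin{align}
k(x, x') &= v_0 \exp\!\left(-\tfrac{1}{2}\,(x - x')^{\top} W \,(x - x')\right),
\qquad W = \sum_{i=1}^{d} \lambda_i\, u_i u_i^{\top}, \quad \lambda_i > 0, \\
(x - x')^{\top} W \,(x - x') &= \sum_{i=1}^{d} \lambda_i \left(u_i^{\top}(x - x')\right)^{2}
= \lVert z(x) - z(x') \rVert^{2},
\qquad z_i(x) = \sqrt{\lambda_i}\; u_i^{\top} x .
\end{align}
```

Under this assumed form, a general W is equivalent to an isotropic kernel on the linear features z(x), and directions whose eigenvalues are negligible contribute almost nothing, so the effective number of hidden features is set by the fitted eigenvalues.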

covariance function, distance matrix, matrix, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)

Vasconcelos, Nuno, Lippman, Andrew

Learning Mixture Hierarchies

The hierarchical representation of data has various applications in domains such as data mining, machine vision, or information retrieval. In this paper we introduce an extension of the Expectation-Maximization (EM) algorithm that learns mixture hierarchies in a computationally efficient manner. Efficiency is achieved by progressing in a bottom-up fashion, i.e. by clustering the mixture components of a given level in the hierarchy to obtain those of the level above. This clustering requires only knowledge of the mixture parameters, there being no need to resort to intermediate samples. In addition to practical applications, the algorithm allows a new interpretation of EM that makes clear the relationship with nonparametric kernel-based estimation methods, provides explicit control over the tradeoff between the bias and variance of EM estimates, and offers new insights about the behavior of deterministic annealing methods commonly used with EM to escape local minima of the likelihood.

algorithm, hierarchy, mixture component, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Ueda, Naonori, Nakano, Ryohei, Ghahramani, Zoubin, Hinton, Geoffrey E.

SMEM Algorithm for Mixture Models

We present a split and merge EM (SMEM) algorithm to overcome the local maximum problem in parameter estimation of finite mixture models. In the case of mixture models, non-global maxima often involve having too many components of a mixture model in one part of the space and too few in another, widely separated part of the space. To escape from such configurations we repeatedly perform simultaneous split and merge operations using a new criterion for efficiently selecting the split and merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and mixtures of factor analyzers using synthetic and real data and show the effectiveness of using the split and merge operations to improve the likelihood of both the training data and of held-out test data.

1 INTRODUCTION

Mixture density models, in particular normal mixtures, have been extensively used in the field of statistical pattern recognition [1]. Recently, more sophisticated mixture density models such as mixtures of latent variable models (e.g., probabilistic PCA or factor analysis) have been proposed to approximate the underlying data manifold [2]-[4].

algorithm, smem algorithm, split and merge operation, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Probabilistic Visualisation of High-Dimensional Binary Data

Tipping, Michael E.

We present a probabilistic latent-variable framework for data visualisation, a key feature of which is its applicability to binary and categorical data types for which few established methods exist. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Illustrations of application to real and synthetic binary data sets are given.

approximation, variational approximation, visualisation, (15 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Simard, Patrice, Bottou, Léon, Haffner, Patrick, LeCun, Yann

Boxlets: A Fast Convolution Algorithm for Signal Processing and Neural Networks

Feature extraction is a typical example: The distance between a small pattern (i.e.

algorithm, convolution, impulse function, (13 more...)

Country:

North America > United States > Washington > King County > Redmond (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Rätsch, Gunnar, Onoda, Takashi, Müller, Klaus R.

Regularizing AdaBoost

We will also introduce a regularization strategy (analogous to weight decay) into boosting. This strategy uses slack variables to achieve a soft margin (section 4). Numerical experiments show the validity of our regularization approach in section 5 and finally a brief conclusion is given.

2 AdaBoost Algorithm

Let {h_t(x): t = 1, ..., T} be an ensemble of T hypotheses defined on input vector x and e

adaboost, algorithm, hypothesis, (17 more...)

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Europe > Germany > Berlin (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.32)

Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

Platt, John C.

SVMs have empirically been shown to give good generalization performance on a wide variety of problems. However, the use of SVMs is still limited to a small group of researchers. One possible reason is that training algorithms for SVMs are slow, especially for large problems. Another explanation is that SVM training algorithms are complex, subtle, and sometimes difficult to implement. This paper describes a new SVM learning algorithm that is easy to implement, often faster, and has better scaling properties than the standard SVM training algorithm. The new SVM learning algorithm is called Sequential Minimal Optimization (or SMO).
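The "analytic QP" in the title refers to solving the smallest possible subproblem, two Lagrange multipliers at a time, in closed form. The sketch below shows the commonly taught form of that two-multiplier update; it is an illustration under stated assumptions rather than the paper's full algorithm (no working-set heuristics, no threshold update), and the function name, argument names, and toy data are hypothetical.

```python
def smo_pair_step(a1, a2, y1, y2, e1, e2, k11, k12, k22, c):
    """Jointly optimize two multipliers with all others held fixed.

    a1, a2 : current multipliers, each in [0, c]
    y1, y2 : labels in {-1.0, +1.0}
    e1, e2 : prediction errors f(x_i) - y_i under the current model
    k11, k12, k22 : kernel values between the two points
    c : box constraint (soft-margin penalty)
    """
    # Bounds that keep both multipliers in [0, c] while preserving
    # the linear constraint y1*a1 + y2*a2 = const.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)

    eta = k11 + k22 - 2.0 * k12  # curvature along the constraint line
    if eta <= 0.0 or low >= high:
        return a1, a2  # sketch only: skip degenerate pairs

    a2_new = a2 + y2 * (e1 - e2) / eta      # unconstrained optimum
    a2_new = min(high, max(low, a2_new))    # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)   # restore the linear constraint
    return a1_new, a2_new


# Toy problem: x1 = +1 (label +1), x2 = -1 (label -1), linear kernel.
# With all multipliers at zero the model outputs 0, so e_i = -y_i.
a1, a2 = smo_pair_step(a1=0.0, a2=0.0, y1=1.0, y2=-1.0,
                       e1=-1.0, e2=1.0,
                       k11=1.0, k12=-1.0, k22=1.0, c=10.0)
print(a1, a2)
```

On this toy problem a single step moves both multipliers to 0.5, which gives weight w = 1 and places both points exactly on the margin.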

algorithm, kkt condition, svm, (14 more...)

Country:

North America > United States > Washington > King County > Redmond (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Replicator Equations, Maximal Cliques, and Graph Isomorphism

Pelillo, Marcello

We present a new energy-minimization framework for the graph isomorphism problem which is based on an equivalent maximum clique formulation. The approach is centered around a fundamental result proved by Motzkin and Straus in the mid-1960s, and recently expanded in various ways, which allows us to formulate the maximum clique problem in terms of a standard quadratic program. To solve the program we use "replicator" equations, a class of simple continuous- and discrete-time dynamical systems developed in various branches of theoretical biology. We show how, despite their inability to escape from local solutions, they nevertheless provide experimental results which are competitive with those obtained using more elaborate mean-field annealing heuristics.

1 INTRODUCTION

The graph isomorphism problem is one of those few combinatorial optimization problems which still resist any computational complexity characterization [6]. Despite decades of active research, no polynomial-time algorithm for it has yet been found.
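The replicator approach described in the abstract can be sketched on a small maximum clique instance. The Motzkin-Straus result relates cliques to maximizers of x^T A x over the probability simplex, where A is the adjacency matrix, and the discrete-time replicator update x_i <- x_i (Ax)_i / (x^T A x) never decreases that objective for symmetric A. The code below is a minimal illustration under assumptions: it works directly on a toy graph rather than on the association graph of an isomorphism problem, it starts from the simplex barycenter, and the function name, tolerance, and edge list are hypothetical.

```python
def replicator_clique(adj, iterations=200, support_tol=1e-6):
    """Run discrete-time replicator dynamics on an adjacency matrix.

    Returns the vertices whose final weight exceeds `support_tol`,
    together with the final weight vector. Like the dynamics in the
    abstract, this can stop at a local solution.
    """
    n = len(adj)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iterations):
        ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        value = sum(x[i] * ax[i] for i in range(n))  # x^T A x
        if value == 0.0:
            break  # graph without edges: nothing to update
        x = [x[i] * ax[i] / value for i in range(n)]
    support = [i for i in range(n) if x[i] > support_tol]
    return support, x


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


# Toy graph: triangle {0, 1, 2} with a tail 2-3-4 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
clique, x = replicator_clique(adjacency(5, edges))
print("support of the limit point:", clique)
```

On this toy graph the weight concentrates on the triangle, with the tail vertices decaying toward zero. On graphs with several cliques of similar size the limit point depends on the starting vector.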

clique, graph, isomorphism, (11 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Italy (0.05)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)