AITopics

The speech waveform can be modelled as a piecewise-stationary linear stochastic state space system, and its parameters can be estimated using an expectation-maximisation (EM) algorithm. One problem is the initialisation ofthe EM algorithm. Standard initialisation schemes can lead to poor formant trajectories. But these trajectories however are important forvowel intelligibility. The aim of this paper is to investigate the suitability of subspace identification methods to initialise EM. The paper compares the subspace state space system identification (4SID) method with the EM algorithm. The 4SID and EM methods are similar in that they both estimate a state sequence (but using Kalman filters andKalman smoothers respectively), and then estimate parameters (but using least-squares and maximum likelihood respectively).

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

North America > United States (0.29)
Europe > United Kingdom > England (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.78)

Schraudolph, Nicol N., Giannakopoulos, Xavier

Online Independent Component Analysis with Local Learning Rate Adaptation

Stochastic meta-descent (SMD) is a new technique for online adaptation oflocal learning rates in arbitrary twice-differentiable systems. Like matrix momentum it uses full second-order information while retaining O(n) computational complexity by exploiting the efficient computation of Hessian-vector products. Here we apply SMD to independent component analysis, and employ the resulting algorithmfor the blind separation of time-varying mixtures. By matching individual learning rates to the rate of change in each source signal's mixture coefficients, our technique is capable of simultaneously trackingsources that move at very different, a priori unknown speeds. 1 Introduction Independent component analysis (ICA) methods are typically run in batch mode in order to keep the stochasticity of the empirical gradient low. Often this is combined with a global learning rate annealing scheme that negotiates the tradeoff between fast convergence and good asymptotic performance.

algorithm, artificial intelligence, machine learning, (12 more...)

Country:

Europe (0.94)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry:

Government (0.69)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Williams, Christopher K. I.

A MCMC Approach to Hierarchical Mixture Modelling

There are many hierarchical clustering algorithms available, but these lack a firm statistical basis. Here we set up a hierarchical probabilistic mixture model, where data is generated in a hierarchical tree-structured manner. Markov chain Monte Carlo (MCMC) methods are demonstrated which can be used to sample from the posterior distribution over trees containing variable numbers of hidden units.

artificial intelligence, configuration, machine learning, (19 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Vapnik, Vladimir, Mukherjee, Sayan

Support Vector Method for Multivariate Density Estimation

A new method for multivariate density estimation is developed based on the Support Vector Method (SVM) solution of inverse ill-posed problems. The solution has the form of a mixture of densities. Thismethod with Gaussian kernels compared favorably to both Parzen's method and the Gaussian Mixture Model method. For synthetic data we achieve more accurate estimates for densities of 2, 6, 12, and 40 dimensions. 1 Introduction The problem of multivariate density estimation is important for many applications, in particular, for speech recognition [1] [7]. When the unknown density belongs to a parametric set satisfying certain conditions one can estimate it using the maximum likelihood (ML) method. Often these conditions are too restrictive. Therefore, nonparametric methods were proposed. The most popular of these, Parzen's method [5], uses the following estimate given data

artificial intelligence, machine learning, svm method, (15 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.63)

The Relevance Vector Machine

Tipping, Michael E.

The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs,the requirement to estimate a tradeoff parameter and the need to utilise'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment ofa generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, theRVM requires dramatically fewer kernel functions.

artificial intelligence, kernel function, machine learning, (16 more...)

Country: North America > United States (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.91)

On Input Selection with Reversible Jump Markov Chain Monte Carlo Sampling

Sykacek, Peter

In this paper we will treat input selection for a radial basis function (RBF) like classifier within a Bayesian framework. We approximate the a-posteriori distribution over both model coefficients and input subsets by samples drawn with Gibbs updates and reversible jump moves. Using some public datasets, we compare the classification accuracy of the method with a conventional ARD scheme. These datasets are also used to infer the a-posteriori probabilities of different inputsubsets.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country:

North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.43)

Slonim, Noam, Tishby, Naftali

Agglomerative Information Bottleneck

This question was recently shown in [9] to be a special case of a much more fundamental problem:What are the features of the variable X that are relevant for the prediction of another, relevance, variable Y?

artificial intelligence, information, machine learning, (16 more...)

Country:

North America > United States > Ohio (0.14)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Leveraged Vector Machines

Singer, Yoram

We describe an iterative algorithm for building vector machines used in classification tasks. The algorithm builds on ideas from support vector machines, boosting, and generalized additive models. The algorithm can be used with various continuously differential functions that bound the discrete (0-1) classification loss and is very simple to implement. We test the proposed algorithm with two different loss functions on synthetic and natural data. We also describe a norm-penalized version of the algorithm for the exponential loss function used in AdaBoost.

algorithm, artificial intelligence, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers

Seeger, Matthias

We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaussian processes.The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility touse sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models. 1 Introduction Bayesian techniques have been widely and successfully used in the neural networks and statistics community and are appealing because of their conceptual simplicity, generality and consistency with which they solve learning problems. In this paper we present a new method for applying the Bayesian methodology to Support Vector machines. We will briefly review Gaussian Process and Support Vector classification in this section and clarify their relationship by pointing out the common roots. Although we focus on classification here, it is straightforward to apply the methods to regression problems as well. In section 2 we introduce our algorithm and show relations to existing methods. Finally, we present experimental results in section 3 and close with a discussion in section 4. Let X be a measure space (e.g.

artificial intelligence, machine learning, support vector machine, (13 more...)

Country: Europe (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Rätsch, Gunnar, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Onoda, Takashi, Mika, Sebastian

v-Arc: Ensemble Learning in the Presence of Outliers

The idea of a large minimum margin [17] explains the good generalization performance ofAdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13] where the generalization error of ensemble methods wasbounded by the sum of the fraction of training points which have a margin smaller than some value p, say, plus a complexity term depending on the base hypotheses andp. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which traded off the number of margin errors and the size of the margin 562 G.Riitsch, B. Sch6lkopf, A. J. Smola, K.-R.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States (0.28)
Oceania > Australia (0.28)
Europe (0.28)
Asia > Japan (0.28)

Industry:

Health & Medicine > Therapeutic Area (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)