AITopics

Two avenues will be explored: the first is to maximize the $\theta$-average divergence between the class densities, and the second is to minimize the union Bhattacharyya bound in the range of $\theta$. Both approaches yield similar performance in practice; both outperform standard LDA features and show a 10% relative improvement in word error rate over state-of-the-art cepstral features on a large-vocabulary telephony speech recognition task.

1 Introduction

Modern speech recognition systems use cepstral features, which characterize the short-term spectrum of the speech signal, to classify frames into phonetic classes. These features are augmented with dynamic information from adjacent frames to capture transient spectral events in the signal.
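The union Bhattacharyya bound mentioned above is built from a pairwise bound on the two-class Bayes error, summed over all class pairs: $P_e \le \sum_{i<j} \sqrt{P_i P_j}\, e^{-B_{ij}}$, where $B_{ij}$ is the Bhattacharyya distance between class densities $i$ and $j$. The sketch below evaluates that bound, together with the symmetric (J-) divergence, for one-dimensional Gaussian class densities. It is a minimal illustration assuming scalar Gaussian classes; the function names are hypothetical, and it does not search over feature projections as the paper does.

```python
import math
from itertools import combinations

def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2)."""
    s = var1 + var2
    return 0.25 * (mu1 - mu2) ** 2 / s + 0.5 * math.log(s / (2.0 * math.sqrt(var1 * var2)))

def union_bhattacharyya_bound(priors, means, variances):
    """Upper bound on the Bayes error: sum over class pairs of sqrt(P_i P_j) * exp(-B_ij)."""
    total = 0.0
    for i, j in combinations(range(len(priors)), 2):
        b = bhattacharyya_1d(means[i], variances[i], means[j], variances[j])
        total += math.sqrt(priors[i] * priors[j]) * math.exp(-b)
    return total

def symmetric_divergence_1d(mu1, var1, mu2, var2):
    """J-divergence KL(p1 || p2) + KL(p2 || p1) between two 1-D Gaussians."""
    d2 = (mu1 - mu2) ** 2
    return (var1 + d2) / (2.0 * var2) + (var2 + d2) / (2.0 * var1) - 1.0

# Two equal-prior, unit-variance classes with means 0 and 2 (illustrative values).
priors, means, variances = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
bound = union_bhattacharyya_bound(priors, means, variances)   # 0.5 * exp(-0.5) ≈ 0.3033
bayes_error = 0.5 * math.erfc(1.0 / math.sqrt(2.0))           # Phi(-1) ≈ 0.1587, the exact error here
j_div = symmetric_divergence_1d(0.0, 1.0, 2.0, 1.0)            # 4.0
print(bound, bayes_error, j_div)
```

The bound is loose (about 0.303 against an exact error of about 0.159 in this example), but unlike the exact error it has a closed form for Gaussians, which is what makes it usable as an optimization objective.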

bhattacharyya, divergence, objective function

Country:

North America > United States > New York (0.06)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Hungary > Budapest > Budapest (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

Still, Susanne, Schölkopf, Bernhard, Hepp, Klaus, Douglas, Rodney J.

Four-legged Walking Gait Control Using a Neuromorphic Chip Interfaced to a Support Vector Learning Algorithm

To control the walking gaits of a four-legged robot, we present a novel neuromorphic VLSI chip that coordinates the relative phasing of the robot's legs, similar to how spinal Central Pattern Generators are believed to control vertebrate locomotion [3]. The chip controls the leg movements by driving motors with time-varying voltages, which are the outputs of a small network of coupled oscillators. The characteristics of the chip's output voltages depend on a set of input parameters. The relationship between input parameters and output voltages can be computed analytically for an idealized system. In practice, however, this ideal relationship holds only approximately because of transistor mismatch and offsets. The robotic system fine-tunes the chip's input parameters automatically, using a recently introduced unsupervised Support Vector (SV) learning algorithm [7]. The learning requires only a description of the desired output. The machine learns from (unlabeled) examples how to set the chip's parameters to obtain a desired motor behavior.
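The relative phasing produced by the chip's coupled oscillators can be imitated in software with a phase-oscillator model. The sketch below is a hypothetical stand-in for the analog circuit, not a model of the chip itself: each leg is a phase oscillator, a sinusoidal coupling term pulls the phase differences toward a chosen set of offsets (the four-beat offsets used here are illustrative assumptions, not taken from the paper), and a square wave with an adjustable duty cycle plays the role of the motor drive voltage.

```python
import math

TWO_PI = 2.0 * math.pi

def simulate_gait(offsets, init, omega=TWO_PI, k=2.0, dt=1e-3, steps=20000):
    """Euler-integrate phase oscillators whose coupling drives leg i toward offsets[i] (in cycles)."""
    psi = [TWO_PI * o for o in offsets]   # target phases in radians
    phi = list(init)                      # current phases in radians
    n = len(phi)
    for _ in range(steps):
        # Each oscillator runs at omega; coupling vanishes once phi_j - phi_i == psi_j - psi_i.
        dphi = [
            omega + k * sum(math.sin((phi[j] - phi[i]) - (psi[j] - psi[i]))
                            for j in range(n) if j != i)
            for i in range(n)
        ]
        phi = [p + dt * d for p, d in zip(phi, dphi)]
    return phi

def relative_phases(phi):
    """Phase of each leg relative to leg 0, in cycles within [0, 1)."""
    return [((p - phi[0]) / TWO_PI) % 1.0 for p in phi]

def motor_voltage(phase, duty=0.6, v_high=1.0):
    """Square-wave drive: high during the first `duty` fraction of each cycle, low otherwise."""
    return v_high if (phase / TWO_PI) % 1.0 < duty else 0.0

legs = ["LF", "RF", "LH", "RH"]
offsets = [0.0, 0.5, 0.75, 0.25]       # hypothetical four-beat walking pattern
phi = simulate_gait(offsets, init=[0.0, 2.5, 5.0, 1.0])
for leg, rel, p in zip(legs, relative_phases(phi), phi):
    print(leg, round(rel, 3), motor_voltage(p))
```

Changing `duty` alters how long each motor is driven high within a cycle without affecting the phase relationships, which mirrors the separation between duty cycle and relative phasing that the chip's input parameters control.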

duty cycle, oscillator, robot

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York (0.04)

Industry:

Semiconductors & Electronics (0.69)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.61)

Regularized Winnow Methods

Zhang, Tong

In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, much effort has gone into enhancing the Perceptron algorithm with regularization, leading to a class of linear classification methods called support vector machines. Similarly, the regularization idea can be applied to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods relate to the basic Winnows in much the same way that a support vector machine relates to the Perceptron. We investigate algorithmic issues and learning properties of the derived methods, and provide some experimental results to illustrate the different methods.

1 Introduction

In this paper, we consider the binary classification problem, that is, determining a label $y \in \{-1, 1\}$ associated with an input vector $x$. A useful method for solving this problem is through linear discriminant functions, which consist of linear combinations of the components of the input variable.
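The contrast between multiplicative and additive updates can be sketched with the basic (unregularized) Winnow and the Perceptron on a target that depends on only a few of many attributes. This is a minimal illustration, not the regularized Winnow derived in the paper; the target disjunction, the data distribution, and the choice of promotion factor 2 with threshold $n$ are assumptions. With these settings, a standard argument bounds Winnow's total mistakes by $2 + 3k(1 + \log_2 n)$ for a disjunction of $k$ relevant attributes, independent of the number of examples.

```python
import math
import random

def predict_winnow(w, x, theta):
    """Positive iff the total weight of the active attributes reaches the threshold."""
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else -1

def train_winnow(data, n, alpha=2.0, max_epochs=100):
    """Basic Winnow on x in {0,1}^n, y in {-1, 1}: multiplicative updates, threshold n."""
    w, theta, mistakes = [1.0] * n, float(n), 0
    for _ in range(max_epochs):
        epoch_mistakes = 0
        for x, y in data:
            if predict_winnow(w, x, theta) != y:
                epoch_mistakes += 1
                # Promote active weights on a false negative, demote them on a false positive.
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
        mistakes += epoch_mistakes
        if epoch_mistakes == 0:
            break
    return w, theta, mistakes

def train_perceptron(data, n, max_epochs=100):
    """Perceptron with bias: additive update w += y * x on each mistake."""
    w, b, mistakes = [0.0] * n, 0.0, 0
    for _ in range(max_epochs):
        epoch_mistakes = 0
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                epoch_mistakes += 1
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
        mistakes += epoch_mistakes
        if epoch_mistakes == 0:
            break
    return w, b, mistakes

# Hypothetical target: y = 1 iff x[0] or x[1] is on; the other 98 attributes are irrelevant.
random.seed(0)
N, K = 100, 2
data = []
for _ in range(300):
    x = [1 if random.random() < 0.1 else 0 for _ in range(N)]
    data.append((x, 1 if (x[0] or x[1]) else -1))

w_win, theta, m_win = train_winnow(data, N)
w_per, b_per, m_per = train_perceptron(data, N)
print("Winnow mistakes:", m_win, "Perceptron mistakes:", m_per)
```

Because the mistake bound is finite and the target is realizable, the Winnow loop is guaranteed to reach a mistake-free pass well within `max_epochs` here.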

algorithm, hyperplane, winnow

Country: North America > United States > New York (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Zemel, Richard S., Pitassi, Toniann

A Gradient-Based Boosting Algorithm for Regression Problems

Adaptive boosting methods are simple modular algorithms that operate as follows. Let $g: X \to Y$ be the function to be learned, where the label set $Y$ is finite, typically binary-valued. The algorithm uses a learning procedure, which has access to $n$ training examples, $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, drawn randomly from $X \times Y$ according to distribution $D$; it outputs a hypothesis $f$: …
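The abstract breaks off before describing the regression algorithm itself. As a hedged illustration of the general idea of gradient-based boosting for regression (and not of Zemel and Pitassi's specific procedure), the sketch below implements standard least-squares functional gradient boosting: each round fits a one-split regression stump to the current residuals, which are the negative gradient of the squared loss, and adds it to the ensemble with a small step size. The stump learner, step size, and toy data are assumptions.

```python
import math

def fit_stump(xs, targets):
    """Least-squares regression stump: one threshold on x, a constant on each side."""
    pairs = sorted(zip(xs, targets))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal inputs
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2.0, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x < split else rm

def gradient_boost(xs, ys, rounds=100, lr=0.1):
    """Squared-loss boosting: each stump is fitted to the residuals y - F(x)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of 0.5 * (y - F)^2
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + lr * h(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(h(x) for h in stumps)

def mse(f, xs, ys):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.1 * i for i in range(60)]
ys = [math.sin(x) for x in xs]
model = gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
print("constant model MSE:", mse(lambda x: mean_y, xs, ys), "boosted MSE:", mse(model, xs, ys))
```

Since a least-squares stump is a projection of the residuals, each round with a step size below 2 cannot increase the training error; the small step size trades more rounds for a smoother fit.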

algorithm, hypothesis, objective

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.56)

Williams, Christopher K. I., Seeger, Matthias

Using the Nyström Method to Speed Up Kernel Machines

Abstract Missing

kernel machine, kkqr

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

Williams, Christopher K. I.

On a Connection between Kernel PCA and Metric Multidimensional Scaling

In this paper we show that the kernel PCA algorithm of Schölkopf et al. (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function $k(x, y)$ is isotropic, i.e., it depends only on $\|x - y\|$. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed.
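The connection can be checked numerically. For an isotropic kernel, squared distances in feature space are $\|\phi(x_i) - \phi(x_j)\|^2 = 2\,(k(0) - k(x_i, x_j))$. Classical (Torgerson) MDS eigen-decomposes $-\tfrac{1}{2} H D^{(2)} H$ with the centering matrix $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$; substituting these dissimilarities gives exactly the centered kernel matrix $H K H$ that kernel PCA diagonalizes, because $H\mathbf{1} = 0$. The sketch below assumes a Gaussian kernel (so $k(0) = 1$) and classical MDS; it illustrates this identity and is not code from the paper.

```python
import numpy as np

def rbf_kernel(X, width=1.0):
    """Isotropic Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 width^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * width ** 2))

def centering(n):
    """H = I - (1/n) 1 1^T."""
    return np.eye(n) - np.full((n, n), 1.0 / n)

def kernel_pca_eigs(K):
    """Kernel PCA: eigen-decompose the doubly centered kernel matrix H K H."""
    H = centering(K.shape[0])
    return np.linalg.eigh(H @ K @ H)

def classical_mds_eigs(D2):
    """Classical MDS: eigen-decompose B = -1/2 H D2 H for squared dissimilarities D2."""
    H = centering(D2.shape[0])
    return np.linalg.eigh(-0.5 * H @ D2 @ H)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
K = rbf_kernel(X, width=1.5)
D2 = 2.0 * (1.0 - K)  # squared feature-space distances, using k(0) = 1 for this kernel

vals_kpca, vecs_kpca = kernel_pca_eigs(K)
vals_mds, vecs_mds = classical_mds_eigs(D2)
coords = vecs_mds[:, -2:] * np.sqrt(vals_mds[-2:])  # 2-D MDS configuration (eigh sorts ascending)
print(np.allclose(vals_kpca, vals_mds), coords.shape)
```

Both routes solve the same eigenproblem, so the configuration comes from a single eigen-decomposition rather than from iterative stress minimization; eigenvector signs may differ between the two runs, which does not change the configuration up to reflection.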

Country:

North America > United States > New York (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Los Angeles County > Beverly Hills (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Weston, Jason, Mukherjee, Sayan, Chapelle, Olivier, Pontil, Massimiliano, Poggio, Tomaso, Vapnik, Vladimir

Feature Selection for SVMs

We introduce a method of feature selection for Support Vector Machines. The method selects the features that minimize bounds on the leave-one-out error, a search that can be performed efficiently via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and real-life problems: face recognition, pedestrian detection, and analysis of DNA microarray data.

coefficient, selection, svm

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Health & Medicine > Therapeutic Area (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.74)

Mixtures of Gaussian Processes

Tresp, Volker

We introduce the mixture of Gaussian processes (MGP) model, which is useful for applications in which the optimal bandwidth of a map is input dependent. The MGP is derived from the mixture of experts model and can also be used for modeling general conditional probability densities. We discuss how Gaussian processes (in particular in the form of Gaussian process classification, the support vector machine, and the MGP model) can be used for quantifying the dependencies in graphical models.

1 Introduction

Gaussian processes are typically used for regression, where it is assumed that the underlying function is generated by one infinite-dimensional Gaussian distribution (i.e.
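A much-simplified sketch of the mixture idea: two Gaussian process regressors with different bandwidths (RBF length scales) are combined through an input-dependent gate, so that the short length scale dominates where the target varies quickly. This illustrates the concept under stated assumptions and is not the paper's MGP: here the gate is a hand-set logistic function of $x$ and both experts are fitted to all of the data, whereas in a mixture of experts the gates and experts are learned. The length scales, noise level, and toy target are assumptions.

```python
import numpy as np

def rbf(a, b, length):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_mean(x_train, y_train, x_test, length, noise=1e-2):
    """GP regression posterior mean k_*^T (K + noise * I)^{-1} y."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(K, y_train)
    return rbf(x_test, x_train, length) @ weights

def gate(x, sharpness=5.0):
    """Hypothetical hand-set gate: probability of the short-length-scale expert, rising for x > 0."""
    return 1.0 / (1.0 + np.exp(-sharpness * x))

def mixture_predict(x_train, y_train, x_test, long_scale=2.0, short_scale=0.2):
    """Gate-weighted average of a smooth expert and a wiggly expert."""
    p_short = gate(x_test)
    m_long = gp_mean(x_train, y_train, x_test, long_scale)
    m_short = gp_mean(x_train, y_train, x_test, short_scale)
    return (1.0 - p_short) * m_long + p_short * m_short

x_train = np.linspace(-3.0, 3.0, 80)
y_train = np.where(x_train < 0, np.sin(x_train), np.sin(5.0 * x_train))  # slow on the left, fast on the right
x_test = np.linspace(-3.0, 3.0, 200)
pred = mixture_predict(x_train, y_train, x_test)
print(pred[:5])
```

In the full model the gate would itself be learned, which is what lets the effective bandwidth adapt to the input instead of being fixed in advance as it is here.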

dependency, gaussian process, gpr model

Country:

Asia > Middle East > Jordan (0.05)
Europe > Germany (0.04)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Tishby, Naftali, Slonim, Noam

Data Clustering by Markovian Relaxation and the Information Bottleneck Method

We introduce a new, nonparametric, and principled distance-based clustering method. This method combines a pairwise approach with a vector-quantization method, which provides a meaningful interpretation of the resulting clusters. The idea is to turn the distance matrix into a Markov process and then examine the decay of mutual information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation and are then extracted using the information bottleneck method.
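The relaxation step can be sketched as follows. Under the assumptions made here (transition probabilities proportional to $\exp(-d_{ij}/s)$ for a hypothetical scale $s$, and a uniform distribution over the starting point), the code computes the mutual information $I(X_0; X_t)$ between the starting point and the state after $t$ steps. By the data processing inequality this quantity can only decrease as $t$ grows; with two well-separated groups it stalls near $\ln 2$, the kind of quasi-stable plateau the abstract describes. The information bottleneck step that extracts the clusters is not shown.

```python
import numpy as np

def transition_matrix(D, scale):
    """Row-stochastic P_ij proportional to exp(-D_ij / scale): nearby points are likely transitions."""
    W = np.exp(-D / scale)
    return W / W.sum(axis=1, keepdims=True)

def mutual_information(P_t, prior):
    """I(X_0; X_t) in nats, given the initial distribution and the t-step transition matrix."""
    joint = prior[:, None] * P_t                 # p(x_0, x_t)
    marginal = joint.sum(axis=0)                 # p(x_t)
    indep = prior[:, None] * marginal[None, :]   # p(x_0) p(x_t)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

def relaxation_curve(D, scale, steps):
    """Mutual information between start and current state after each number of steps."""
    P = transition_matrix(D, scale)
    prior = np.full(D.shape[0], 1.0 / D.shape[0])
    return [mutual_information(np.linalg.matrix_power(P, t), prior) for t in steps]

points = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3])  # two well-separated groups
D = np.abs(points[:, None] - points[None, :])
steps = [2 ** k for k in range(9)]                             # t = 1, 2, 4, ..., 256
curve = relaxation_curve(D, scale=0.1, steps=steps)
for t, mi in zip(steps, curve):
    print(t, round(mi, 4))
```

The plateau at $\ln 2$ reflects that after within-group mixing, the process still "remembers" which of the two groups it started in; only after far more steps would leakage between groups drive the information to zero.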

algorithm, information, matrix

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Sparse Kernel Principal Component Analysis

Tipping, Michael E.

'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every training vector. This paper shows that by approximating the covariance matrix in feature space by a reduced number of example vectors, using a maximum-likelihood approach, we may obtain a highly sparse form of kernel PCA without loss of effectiveness.

1 Introduction

Principal component analysis (PCA) is a well-established technique for dimensionality reduction, and examples of its many applications include data compression, image processing, visualisation, exploratory data analysis, pattern recognition and time series prediction.
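The non-sparsity the abstract refers to is visible in a plain kernel PCA implementation: projecting a new point requires its kernel value against every training vector, and the expansion coefficients are in general all nonzero. The sketch below is standard (non-sparse) kernel PCA with an assumed Gaussian kernel and hypothetical function names; it illustrates the problem the paper addresses and does not implement the sparse maximum-likelihood method.

```python
import numpy as np

def rbf(A, B, width=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * width ** 2))

def fit_kernel_pca(X, n_components=2, width=1.0):
    """Eigen-decompose the centered kernel matrix; return K and the expansion coefficients."""
    n = X.shape[0]
    K = rbf(X, X, width)
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    vals, vecs = np.linalg.eigh(H @ K @ H)
    top = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, top] / np.sqrt(vals[top])  # scale so each feature-space component has unit norm
    return K, alphas

def project(X_train, K, alphas, X_new, width=1.0):
    """Project new points; every training vector enters through k(x, x_i)."""
    n, m = X_train.shape[0], X_new.shape[0]
    k_new = rbf(X_new, X_train, width)
    one_n = np.full((n, n), 1.0 / n)
    one_m = np.full((m, n), 1.0 / n)
    k_centered = k_new - one_m @ K - k_new @ one_n + one_m @ K @ one_n  # center in feature space
    return k_centered @ alphas

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
K, alphas = fit_kernel_pca(X)
Z = project(X, K, alphas, rng.normal(size=(5, 2)))
print(Z.shape)  # (5, 2)
print(np.count_nonzero(np.abs(alphas) > 1e-12), "of", alphas.size, "coefficients are nonzero")
```

Each projection costs one kernel evaluation per training vector, which is the cost a sparse variant reduces by expressing the components in terms of a small subset of examples.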

feature space, kernel, projection

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.82)