Statistical Learning
Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
Chapados, Nicolas, Bengio, Yoshua, Vincent, Pascal, Ghosn, Joumana, Dugas, Charles, Takeuchi, Ichiro, Meng, Linyan
This conditional expected claim amount is called the pure premium and it is the basis of the gross premium charged to the insured. This expected value is conditionned on information available about the insured and about the contract, which we call input profile here. This regression problem is difficult for several reasons: large number of examples, -large number variables (most of which are discrete and multi-valued), non-stationarity of the distribution, and a conditional distribution of the dependent variable which is very different from those usually encountered in typical applications.of
Learning Body Pose via Specialized Maps
Rosales, Rรณmer, Sclaroff, Stan
A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion.
Grouping and dimensionality reduction by locally linear embedding
Polito, Marzia, Perona, Pietro
Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1 Introduction Consider a collection of N data points Xi E ]RD.
Speech Recognition using SVMs
An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space.
A Sequence Kernel and its Application to Speaker Recognition
A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.
An Efficient Clustering Algorithm Using Stochastic Association Model and Its Implementation Using Nanostructures
Morie, Takashi, Matsuura, Tomohiro, Nagata, Makoto, Iwata, Atsushi
This paper describes a clustering algorithm for vector quantizers using a "stochastic association model". It offers a new simple and powerful softmax adaptation rule. The adaptation process is the same as the online K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the "neural gas" algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector. For hardware implementation of this process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.
Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
Genov, Roman, Cauwenberghs, Gert
A mixed-signal paradigm is presented for high-resolution parallel innerproduct computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5
Kernel Logistic Regression and the Import Vector Machine
The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an ongoing research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.
EM-DD: An Improved Multiple-Instance Learning Technique
In this model, each training example is a set (or bag) of instances along with a single label equal to the maximum label among all instances in the bag. The individual instances within the bag are not given labels. The goal is to learn to accurately predict the label of previously unseen bags. Standard supervised learning can be viewed as a special case of MI learning where each bag holds a single instance. The MI learning model was originally motivated by the drug activity prediction problem where each instance is a possible conformation (or shape) of a molecule and each bag contains all likely low-energy conformations for the molecule.
A General Greedy Approximation Algorithm with Applications
Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.