 Regression


Memory-based Stochastic Optimization

Neural Information Processing Systems

In this paper we introduce new algorithms for optimizing noisy plants in which each experiment is very expensive. The algorithms build a global non-linear model of the expected output at the same time as using Bayesian linear regression analysis of locally weighted polynomial models. The local model answers queries about confidence, noise, gradient and Hessians, and uses them to make automated decisions similar to those made by a practitioner of Response Surface Methodology. The global and local models are combined naturally as a locally weighted regression. We examine the question of whether the global model can really help optimization, and we extend it to the case of time-varying functions.


Support Vector Regression Machines

Neural Information Processing Systems

A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.


Competitive On-line Linear Regression

Neural Information Processing Systems

We apply a general algorithm for merging prediction strategies (the Aggregating Algorithm) to the problem of linear regression with the square loss; our main assumption is that the response variable is bounded. It turns out that for this particular problem the Aggregating Algorithm resembles, but is slightly different from, the well-known ridge estimation procedure. From general results about the Aggregating Algorithm we deduce a guaranteed bound on the difference between our algorithm's performance and the best, in some sense, linear regression function's performance. We show that the AA attains the optimal constant in our bound, whereas the constant attained by the ridge regression procedure in general can be 4 times worse.
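The abstract does not spell out how the two procedures differ. As a minimal sketch, assuming scalar inputs and the form in which this forecaster is commonly written elsewhere (not taken from the abstract), the on-line ridge prediction divides by the regularizer plus the squared inputs seen so far, while the Aggregating Algorithm prediction also includes the current input in that sum. The function name `online_predictions` and the regularization parameter `a` are illustrative assumptions.

```python
def online_predictions(xs, ys, a=1.0):
    """Return (ridge_preds, aa_preds) for scalar inputs xs and responses ys.

    Assumption: a > 0 is a ridge-style regularization parameter.
    Each prediction is made before the response for that round is seen.
    """
    sum_xx = 0.0  # running sum of x_s^2 over past rounds
    sum_xy = 0.0  # running sum of x_s * y_s over past rounds
    ridge_preds, aa_preds = [], []
    for x, y in zip(xs, ys):
        # On-line ridge regression: uses past inputs only in the denominator.
        ridge_preds.append(x * sum_xy / (a + sum_xx))
        # AA-style forecaster (assumed form): the current input x also
        # enters the denominator, shrinking the prediction toward zero.
        aa_preds.append(x * sum_xy / (a + sum_xx + x * x))
        # Reveal the response and update the running statistics.
        sum_xx += x * x
        sum_xy += x * y
    return ridge_preds, aa_preds


ridge, aa = online_predictions([1.0, 2.0], [1.0, 2.0], a=1.0)
print(ridge, aa)
```

In this scalar setting the AA-style prediction is never larger in magnitude than the ridge prediction, which is one way to read "resembles, but is slightly different from".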


Shrinking the Tube: A New Support Vector Regression Algorithm

Neural Information Processing Systems

A new algorithm for Support Vector regression is described. For a priori chosen ν, it automatically adjusts a flexible tube of minimal radius to the data such that at most a fraction ν of the data points lie outside. Moreover, it is shown how to use parametric tube shapes with non-constant radius. The algorithm is analysed theoretically and experimentally.


Optimal Kernel Shapes for Local Linear Regression

Neural Information Processing Systems

Local linear regression performs very well in many low-dimensional forecasting problems. In high-dimensional spaces, its performance typically decays due to the well-known "curse-of-dimensionality". A possible way to approach this problem is by varying the "shape" of the weighting kernel. In this work we suggest a new, data-driven method for estimating the optimal kernel shape. Experiments using an artificially generated data set and data from the UC Irvine repository show the benefits of kernel shaping.
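For readers unfamiliar with the base method, the following is a minimal sketch of plain local linear regression in one dimension, assuming a Gaussian weighting kernel. It does not implement the paper's kernel-shaping method; in one dimension the only "shape" is the `bandwidth`, which is fixed by hand here. The function name and parameters are illustrative assumptions.

```python
import math


def local_linear_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` by a kernel-weighted straight-line fit.

    Assumptions: scalar inputs, Gaussian kernel, and at least two
    distinct x values carrying non-negligible weight.
    """
    # Kernel weights: points near the query count more.
    weights = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    # Weighted means of inputs and responses.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted least-squares slope around the weighted means.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    # Evaluate the local line at the query point.
    return mean_y + slope * (query - mean_x)


# Usage: data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(local_linear_predict(xs, ys, query=1.5, bandwidth=1.0))
```

In higher dimensions the scalar bandwidth would become a matrix that sets the kernel's orientation and extent, which is the quantity the abstract proposes to estimate from data.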


A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

Neural Information Processing Systems

We examine eight different techniques for developing visual representations in machine vision tasks. In particular we compare different versions of principal component and independent component analysis in combination with stepwise regression methods for variable selection. We found that local methods, based on the statistics of image patches, consistently outperformed global methods based on the statistics of entire images. This result is consistent with previous work on emotion and facial expression recognition. In addition, the use of a stepwise regression technique for selecting variables and regions of interest substantially boosted performance.


On the Convergence of Leveraging

Neural Information Processing Systems

We show convergence rates of ensemble learning methods such as AdaBoost [10], Logistic Regression (LR) [11, 5] and the Least-Square (LS) regression algorithm called LS-Boost [12].


On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes

Neural Information Processing Systems

We compare discriminative and generative learning as typified by logistic regression and naive Bayes. We show, contrary to a widely held belief that discriminative classifiers are almost always to be preferred, that there can often be two distinct regimes of performance as the training set size is increased, one in which each algorithm does better. This stems from the observation, which is borne out in repeated experiments, that while discriminative learning has lower asymptotic error, a generative classifier may also approach its (higher) asymptotic error much faster.


Kernel Logistic Regression and the Import Vector Machine

Neural Information Processing Systems

The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the "support points" of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM.


A Formulation for Minimax Probability Machine Regression

Neural Information Processing Systems

We formulate the regression problem as one of maximizing the minimum probability, symbolized by Ω, that future predicted outputs of the regression model will be within some ±ε bound of the true regression function. Our formulation is unique in that we obtain a direct estimate of this lower probability bound Ω. The proposed framework, minimax probability machine regression (MPMR), is based on the recently described minimax probability machine classification algorithm [Lanckriet et al.] and uses Mercer Kernels to obtain nonlinear regression models. MPMR is tested on both toy and real world data, verifying the accuracy of the Ω bound, and the efficacy of the regression models.