Goto

Collaborating Authors

 Statistical Learning


Stationarity and Stability of Autoregressive Neural Network Processes

Neural Information Processing Systems

AR-NNs are a natural generalization of the classic linear autoregressive AR(p) process (2) See, e.g., Brockwell & Davis (1987) for a comprehensive introduction into AR and ARMA (autoregressive moving average) models.


Optimizing Classifers for Imbalanced Training Sets

Neural Information Processing Systems

Following recent results [9, 8] showing the importance of the fatshattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for the case of imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and the two approaches are tested experimentally.


Unsupervised and Supervised Clustering: The Mutual Information between Parameters and Observations

Neural Information Processing Systems

Recent works in parameter estimation and neural coding have demonstrated that optimal performance are related to the mutual information between parameters and data. We consider the mutual information in the case where the dependency in the parameter (a vector 8) of the conditional p.d.f. of each observation (a vector


Linear Hinge Loss and Average Margin

Neural Information Processing Systems

We describe a unifying method for proving relative loss bounds for online linear threshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i.e., the total number of prediction mistakes. We introduce a continuous loss function, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds w.r.t. the linear hinge loss and then convert them to the discrete loss. We introduce a notion of "average margin" of a set of examples. We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds i.t.o. the discrete loss using the average margin.


Finite-Dimensional Approximation of Gaussian Processes

Neural Information Processing Systems

Gaussian process (GP) prediction suffers from O(n3) scaling with the data set size n. By using a finite-dimensional basis to approximate the GP predictor, the computational complexity can be reduced. We derive optimal finite-dimensional predictors under a number of assumptions, and show the superiority of these predictors over the Projected Bayes Regression method (which is asymptotically optimal). We also show how to calculate the minimal model size for a given n. The calculations are backed up by numerical experiments.


Dynamically Adapting Kernels in Support Vector Machines

Neural Information Processing Systems

The kernel-parameter is one of the few tunable parameters in Support Vector machines, controlling the complexity of the resulting hypothesis. Its choice amounts to model selection and its value is usually found by means of a validation set. We present an algorithm which can automatically perform model selection with little additional computational cost and with no need of a validation set. In this procedure model selection and learning are not separate, but kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error. Theoretical results motivating the approach and experimental results confirming its validity are presented.


Stationarity and Stability of Autoregressive Neural Network Processes

Neural Information Processing Systems

AR-NNs are a natural generalization of the classic linear autoregressive AR(p) process (2) See, e.g., Brockwell & Davis (1987) for a comprehensive introduction into AR and ARMA (autoregressive moving average) models. F. Leisch, A. Trapletti and K. Hornik 268 One of the most central questions in linear time series theory is the stationarity of the model, i.e., whether the probabilistic structure of the series is constant over time or at least asymptotically constant (when not started in equilibrium). Surprisingly, this question has not gained much interest in the NN literature, especially there are-up to our knowledge-no results giving conditions for the stationarity of AR NN models. There are results on the stationarity of Hopfield nets (Wang & Sheng, 1996), but these nets cannot be used to estimate conditional expectations for time series prediction. The rest of this paper is organized as follows: In Section 2 we recall some results from time series analysis and Markov chain theory defining the relationship between a time series and its associated Markov chain. In Section 3 we use these results to establish that standard AR-NN models without shortcut connections are stationary. We also give conditions for AR-NN models with shortcut connections to be stationary. Section 4 examines the NN modeling of an important class of non-stationary to the appendix.time



Semiparametric Support Vector and Linear Programming Machines

Neural Information Processing Systems

In fact, for many of the kernels used (not the polynomial kernels) like Gaussian rbf-kernels it can be shown [6] that SV machines are universal approximators. While this is advantageous in general, parametric models are useful techniques in their own right. Especially if one happens to have additional knowledge about the problem, it would be unwise not to take advantage of it. For instance it might be the case that the major properties of the data are described by a combination of a small set of linear independent basis functions {¢Jt (.), ..., ¢n (.)}. Or one may want to correct the data for some (e.g.


A Polygonal Line Algorithm for Constructing Principal Curves

Neural Information Processing Systems

Principal curves have been defined as "self consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution ordata cloud. Recently, we [1] have offered a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition made it possible to carry out a theoretical analysis of learning principal curves from training data. In this paper we propose a practical construction based on the new definition. Simulation results demonstrate that the new algorithm compares favorably with previous methods both in terms of performance and computational complexity.