
Support Vector Method for Novelty Detection

Neural Information Processing Systems

Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value ν between 0 and 1.
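For illustration, here is a minimal sketch using scikit-learn's OneClassSVM, which implements a one-class SVM of this kind; the parameter nu plays the role of the a priori specified outlier fraction, and the data below are synthetic:

```python
# A minimal sketch of one-class SVM novelty detection; nu approximates the
# a priori specified fraction of points allowed to fall outside S.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                      # "normal" data
X_test = np.vstack([rng.normal(size=(90, 2)),            # mostly normal...
                    rng.uniform(-6, 6, size=(10, 2))])   # ...plus some outliers

clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)
pred = clf.predict(X_test)                               # +1 inside S, -1 outside
print("fraction flagged as novel:", np.mean(pred == -1))
```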


Support Vector Method for Multivariate Density Estimation

Neural Information Processing Systems

A new method for multivariate density estimation is developed based on the Support Vector Method (SVM) solution of inverse ill-posed problems. The solution has the form of a mixture of densities. This method with Gaussian kernels compared favorably to both Parzen's method and the Gaussian Mixture Model method. For synthetic data we achieve more accurate estimates for densities of 2, 6, 12, and 40 dimensions.

1 Introduction The problem of multivariate density estimation is important for many applications, in particular, for speech recognition [1] [7]. When the unknown density belongs to a parametric set satisfying certain conditions one can estimate it using the maximum likelihood (ML) method. Often these conditions are too restrictive. Therefore, nonparametric methods were proposed. The most popular of these, Parzen's method [5], uses the following estimate given data $x_1, \dots, x_\ell$: $p(x) = \frac{1}{\ell}\sum_{i=1}^{\ell} K_\gamma(x - x_i)$, where $K_\gamma$ is a kernel function (e.g., Gaussian) of width $\gamma$.
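As a sketch of the classical baseline only (not the paper's SVM-based estimator), Parzen's method with a Gaussian kernel is available via SciPy:

```python
# A minimal sketch of Parzen-window (kernel) density estimation using SciPy's
# gaussian_kde; the synthetic 2-D data here are invented for illustration.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(loc=[0, 3], scale=[1.0, 0.5], size=(500, 2)).T  # shape (d, n)

kde = gaussian_kde(data)                       # bandwidth set by Scott's rule
points = np.array([[0.0, 3.0], [1.0, 2.5]]).T  # columns are query points
print("estimated densities:", kde(points))
```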


Wiring Optimization in the Brain

Neural Information Processing Systems

The complexity of cortical circuits may be characterized by the number of synapses per neuron. We study the dependence of complexity on the fraction of the cortical volume that is made up of "wire" (that is, of axons and dendrites), and find that complexity is maximized when wire takes up about 60% of the cortical volume. This prediction is in good agreement with experimental observations. A consequence of our arguments is that any rearrangement of neurons that takes more wire would sacrifice computational power.


v-Arc: Ensemble Learning in the Presence of Outliers

Neural Information Processing Systems

The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low-noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13], where the generalization error of ensemble methods was bounded by the sum of the fraction of training points which have a margin smaller than some value ρ, say, plus a complexity term depending on the base hypotheses and ρ. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which trade off the number of margin errors and the size of the margin.
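Schematically, a bound of the type proved in [13] has the following form (a sketch of its shape, not the exact statement):

$$\Pr\big[\,y f(x) \le 0\,\big] \;\le\; \frac{1}{n}\,\big|\{\,i : y_i f(x_i) \le \rho\,\}\big| \;+\; \tilde{O}\!\left(\sqrt{\frac{d}{n\rho^2}}\right),$$

where $f$ is the normalized ensemble output, $n$ the number of training points, and $d$ a capacity measure (e.g., the VC dimension) of the base hypothesis class. Tolerating a few margin errors can permit a larger $\rho$, shrinking the complexity term.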


Low Power Wireless Communication via Reinforcement Learning

Neural Information Processing Systems

This paper examines the application of reinforcement learning to a wireless communication problem. The problem requires that channel utility be maximized while simultaneously minimizing battery usage. We present a solution to this multi-criteria problem that is able to significantly reduce power consumption. The solution uses a variable discount factor to capture the effects of battery usage.

1 Introduction Reinforcement learning (RL) has been applied to resource allocation problems in telecommunications, e.g., channel allocation in wireless systems, network routing, and admission control in telecommunication networks [1, 2, 8, 10]. These studies have demonstrated that reinforcement learning can find good policies that significantly increase the application reward within the dynamics of the telecommunication problems.
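A minimal, hypothetical sketch of tabular Q-learning with a state-dependent ("variable") discount factor, in the spirit of the abstract; the states, dynamics, rewards, and discount schedule below are invented for illustration:

```python
# Toy Q-learning where the discount factor depends on the battery level,
# so future utility is valued less as the battery empties (assumption).
import numpy as np

n_states, n_actions = 11, 2          # hypothetical: battery levels x {idle, transmit}
Q = np.zeros((n_states, n_actions))
alpha, eps = 0.1, 0.1
rng = np.random.default_rng(0)

def discount(state):
    # assumption: the emptier the battery, the heavier the discounting
    return 0.5 + 0.45 * state / (n_states - 1)

def step(state, action):
    # toy dynamics: transmitting earns channel utility but drains the battery faster
    reward = 1.0 if action == 1 else 0.0
    next_state = max(state - (2 if action == 1 else 1), 0)
    return reward, next_state

for episode in range(1000):
    s = n_states - 1                 # start each episode with a full battery
    while s > 0:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        r, s2 = step(s, a)
        Q[s, a] += alpha * (r + discount(s) * Q[s2].max() - Q[s, a])
        s = s2

print("greedy action by battery level:", Q.argmax(axis=1))
```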


Statistical Dynamics of Batch Learning

Neural Information Processing Systems

An important issue in neural computing concerns the description of learning dynamics with macroscopic dynamical variables. Recent progress on online learning only addresses the often unrealistic case of an infinite training set. We introduce a new framework to model batch learning of restricted sets of examples, widely applicable to any learning cost function, and fully taking into account the temporal correlations introduced by the recycling of the examples. For illustration, we analyze the effects of weight decay and early stopping during the learning of teacher-generated examples.
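As a concrete toy instance of the setting analyzed (a sketch, not the paper's framework): batch gradient descent on teacher-generated examples, with weight decay and validation-based early stopping; dimensions and noise level are invented:

```python
# Batch learning of noisy teacher-generated examples with weight decay;
# early stopping keeps the weights with the best validation error.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 50
teacher = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ teacher + 0.5 * rng.normal(size=n)        # noisy teacher outputs
X_val = rng.normal(size=(200, d)); y_val = X_val @ teacher

w = np.zeros(d); lr, decay = 0.01, 0.1
best_val, best_w = np.inf, w.copy()
for t in range(2000):
    grad = X.T @ (X @ w - y) / n + decay * w      # batch gradient + weight decay
    w -= lr * grad
    val = np.mean((X_val @ w - y_val) ** 2)
    if val < best_val:                            # early stopping criterion
        best_val, best_w = val, w.copy()
print("best validation MSE:", best_val)
```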


Reconstruction of Sequential Data with Probabilistic Models and Continuity Constraints

Neural Information Processing Systems

We consider the problem of reconstructing a temporal discrete sequence of multidimensional real vectors when part of the data is missing, under the assumption that the sequence was generated by a continuous process. A particular case of this problem is multivariate regression, which is very difficult when the underlying mapping is one-to-many. We propose an algorithm based on a joint probability model of the variables of interest, implemented using a nonlinear latent variable model. Each point in the sequence is potentially reconstructed as any of the modes of the conditional distribution of the missing variables given the present variables (computed using an exhaustive mode search in a Gaussian mixture). Mode selection is determined by a dynamic programming search that minimises a geometric measure of the reconstructed sequence, derived from continuity constraints. We illustrate the algorithm with a toy example and apply it to a real-world inverse problem, the acoustic-to-articulatory mapping. The results show that the algorithm outperforms conditional mean imputation and multilayer perceptrons.
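A minimal sketch of the dynamic-programming mode-selection step described above (candidate modes per step are invented here; the Gaussian-mixture mode search itself is omitted): pick one candidate per time step so that the summed squared jump between consecutive choices is minimal.

```python
# Viterbi-style dynamic programming over candidate modes: the "geometric
# measure" used here is the sum of squared distances between consecutive
# reconstructed points (an assumption standing in for the paper's measure).
import numpy as np

rng = np.random.default_rng(0)
T, K, d = 8, 3, 2                       # hypothetical: T steps, K candidates in R^d
modes = [rng.normal(size=(K, d)) for _ in range(T)]

cost = np.zeros(K)                      # best path cost ending at each mode of step 0
back = []
for t in range(1, T):
    # pairwise squared distances between step t-1 and step t candidates
    d2 = ((modes[t - 1][:, None, :] - modes[t][None, :, :]) ** 2).sum(-1)
    total = cost[:, None] + d2          # cost of each transition
    back.append(total.argmin(0))        # best predecessor for each candidate
    cost = total.min(0)

path = [int(cost.argmin())]             # backtrack the smoothest sequence of modes
for b in reversed(back):
    path.append(int(b[path[-1]]))
path.reverse()
print("selected mode indices:", path)
```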


Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks

Neural Information Processing Systems

The curse of dimensionality is severe when modeling high-dimensional discrete data: the number of possible combinations of the variables explodes exponentially. In this paper we propose a new architecture for modeling high-dimensional data that requires resources (parameters and computations) that grow at most as the square of the number of variables, using a multi-layer neural network to represent the joint distribution of the variables as the product of conditional distributions. The neural network can be interpreted as a graphical model without hidden random variables, but in which the conditional distributions are tied through the hidden units. The connectivity of the neural network can be pruned by using dependency tests between the variables. Experiments on modeling the distribution of several discrete data sets show statistically significant improvements over other methods such as naive Bayes and comparable Bayesian networks, and show that significant improvements can be obtained by pruning the network.

1 Introduction The curse of dimensionality hits particularly hard on models of high-dimensional discrete data because there are many more possible combinations of the values of the variables than can possibly be observed in any data set, even the large data sets now common in data-mining applications. In this paper we are dealing in particular with multivariate discrete data, where one tries to build a model of the distribution of the data. This can be used, for example, to detect anomalous cases in data-mining applications, or to model the class-conditional distribution of some observed variables in order to build a classifier. A simple multinomial maximum likelihood model would give zero probability to all of the combinations not encountered in the training set, i.e., it would most likely give zero probability to most out-of-sample test cases. Smoothing the model by assigning the same nonzero probability to all the unobserved cases would not be satisfactory either, because it would not provide much generalization from the training set. Such generalization could be obtained by using a multivariate multinomial model whose parameters θ are estimated by the maximum a-posteriori (MAP) principle, i.e., those that have the greatest probability given the training data D, using a diffuse prior P(θ).
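A toy sketch (not the paper's exact architecture) of the core idea: factor the joint distribution over binary variables as a product of conditionals $p(x) = \prod_i p(x_i \mid x_1, \dots, x_{i-1})$, each computed by a small network over the preceding variables, so parameters grow only as O(n·H):

```python
# Autoregressive factorization with a shared hidden layer tying the
# conditionals together; weights are random and untrained, for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, H = 8, 4                                  # n binary variables, H hidden units
W_in = rng.normal(scale=0.1, size=(n, H))    # input -> hidden (only x_<i is used)
W_out = rng.normal(scale=0.1, size=(H, n))   # hidden -> per-variable logit
b = np.zeros(n)

def log_prob(x):
    """Log-probability of a binary vector under the autoregressive model."""
    lp = 0.0
    for i in range(n):
        h = np.tanh(x[:i] @ W_in[:i])        # hidden activity from x_<i
        p_i = sigmoid(h @ W_out[:, i] + b[i])
        lp += np.log(p_i if x[i] == 1 else 1.0 - p_i)
    return lp

x = rng.integers(0, 2, size=n)
print("log p(x) =", log_prob(x))
```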



Algebraic Analysis for Non-regular Learning Machines

Neural Information Processing Systems

Hierarchical learning machines are non-regular and non-identifiable statistical models, whose true parameter sets are analytic sets with singularities. Using algebraic analysis, we rigorously prove that the stochastic complexity of a non-identifiable learning machine is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log\log n + \mathrm{const.}$, where $n$ is the number of training examples, and the rational number $\lambda_1$ and the natural number $m_1$ are determined by the singularities of the true parameter set.
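For orientation (a standard comparison, not part of this abstract): a regular identifiable model with $d$ parameters has stochastic complexity growing like the BIC term $\frac{d}{2}\log n$, whereas in the singular case the coefficient $\frac{d}{2}$ is replaced by the rational number $\lambda_1 \le \frac{d}{2}$:

$$F(n) = \frac{d}{2}\log n + O(1) \quad \text{(regular)}, \qquad F(n) = \lambda_1 \log n - (m_1 - 1)\log\log n + O(1) \quad \text{(singular)}.$$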