Technology
Semiparametric Support Vector and Linear Programming Machines
Smola, Alex J., Frieß, Thilo-Thomas, Schölkopf, Bernhard
In fact, for many of the kernels used (though not the polynomial kernels), such as Gaussian rbf-kernels, it can be shown [6] that SV machines are universal approximators. While this is advantageous in general, parametric models are useful techniques in their own right. Especially if one happens to have additional knowledge about the problem, it would be unwise not to take advantage of it. For instance, it might be the case that the major properties of the data are described by a combination of a small set of linearly independent basis functions {φ1(·), ..., φn(·)}. Or one may want to correct the data for some (e.g. linear) trends.
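A minimal sketch of the semiparametric expansion described above, f(x) = sum_i alpha_i k(x_i, x) + sum_j beta_j phi_j(x): a Gaussian rbf-kernel part plus a small parametric basis. Here the coefficients are fitted by regularized least squares as a stand-in for the SV optimization; all names and data are illustrative, not the paper's.

    import numpy as np

    def rbf_kernel(X, Z, gamma=1.0):
        # Gaussian rbf-kernel matrix: k(x, z) = exp(-gamma * ||x - z||^2)
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def fit_semiparametric(X, y, basis, gamma=1.0, ridge=1e-3):
        # Regularized least-squares stand-in for the SV optimization: solve
        # jointly for kernel coefficients alpha and parametric coefficients
        # beta in f(x) = sum_i alpha_i k(x_i, x) + sum_j beta_j phi_j(x).
        K = rbf_kernel(X, X, gamma)
        Phi = np.column_stack([phi(X) for phi in basis])
        A = np.hstack([K + ridge * np.eye(len(X)), Phi])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coef[:len(X)], coef[len(X):]   # alpha, beta

    # Example: the parametric basis {1, x} absorbs a linear trend, leaving
    # the kernel part to model only the residual nonlinearity.
    X = np.linspace(0, 5, 40)[:, None]
    y = 0.7 * X[:, 0] + np.sin(3 * X[:, 0])
    alpha, beta = fit_semiparametric(
        X, y, basis=[lambda Z: np.ones(len(Z)), lambda Z: Z[:, 0]])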
Discovering Hidden Features with Gaussian Processes Regression
Vivarelli, Francesco, Williams, Christopher K. I.
W is often taken to be diagonal, but if we allow W to be a general positive definite matrix which can be tuned on the basis of training data, then an eigen-analysis of W shows that we are effectively creating hidden features, where the dimensionality of the hidden-feature space is determined by the data. We demonstrate the superiority of predictions using the general matrix over those based on a diagonal matrix on two test problems.
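As a sketch of the covariance the abstract refers to, assuming a squared-exponential form k(x, z) = v exp(-0.5 (x - z)^T W (x - z)) with a general positive definite W (the specific W and parameter names below are illustrative):

    import numpy as np

    def se_cov(X, Z, W, v=1.0):
        # Squared-exponential covariance with a general positive definite W:
        # k(x, z) = v * exp(-0.5 * (x - z)^T W (x - z))
        diff = X[:, None, :] - Z[None, :, :]
        return v * np.exp(-0.5 * np.einsum('ijk,kl,ijl->ij', diff, W, diff))

    # A W dominated by one eigendirection: that eigenvector is the direction
    # the covariance is sensitive to, i.e. a hidden feature, and the number
    # of large eigenvalues is the effective hidden-feature dimensionality.
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)
    W = 10.0 * np.outer(u, u) + 0.01 * np.eye(2)
    eigvals, eigvecs = np.linalg.eigh(W)
    K = se_cov(np.random.randn(5, 2), np.random.randn(4, 2), W)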
Information Maximization in Single Neurons
Stemmler, Martin, Koch, Christof
Information from the senses must be compressed into the limited range of firing rates generated by spiking nerve cells. Optimal compression uses all firing rates equally often, implying that the nerve cell's response matches the statistics of naturally occurring stimuli. Since changing the voltage-dependent ionic conductances in the cell membrane alters the flow of information, an unsupervised, non-Hebbian, developmental learning rule is derived to adapt the conductances in Hodgkin-Huxley model neurons. By maximizing the rate of information transmission, each firing rate within the model neuron's limited dynamic range is used equally often.

An efficient neuronal representation of incoming sensory information should take advantage of the regularity and scale invariance of stimulus features in the natural world. In the case of vision, this regularity is reflected in the typical probabilities of encountering particular visual contrasts, spatial orientations, or colors [1]. Given these probabilities, an optimized neural code would eliminate any redundancy, while devoting increased representation to commonly encountered features. At the level of a single spiking neuron, information about a potentially large range of stimuli is compressed into a finite range of firing rates, since the maximum firing rate of a neuron is limited. Optimizing the information transmission through a single neuron in the presence of uniform, additive noise has an intuitive interpretation: the most efficient representation of the input uses every firing rate with equal probability.
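The "every firing rate used equally often" criterion is histogram equalization. A minimal sketch, assuming a noiseless, monotonic rate function: map each stimulus through the empirical CDF of the stimulus ensemble, scaled to the cell's dynamic range (r_max and the stimulus distribution below are illustrative):

    import numpy as np

    def optimal_rate(stimuli, r_max=100.0):
        # Histogram equalization: map each stimulus through the empirical
        # CDF of the stimulus ensemble, scaled to the dynamic range. The
        # resulting rates are approximately uniform on (0, r_max], i.e.
        # every firing rate is used equally often.
        ranks = np.argsort(np.argsort(stimuli))
        return r_max * (ranks + 1) / len(stimuli)

    # Heavy-tailed stimulus statistics still map to uniformly used rates.
    contrasts = np.random.exponential(scale=1.0, size=10000)
    rates = optimal_rate(contrasts)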
Lazy Learning Meets the Recursive Least Squares Algorithm
Birattari, Mauro, Bontempi, Gianluca, Bersini, Hugues
Lazy learning is a memory-based technique that, once a query is received, extracts a prediction by locally interpolating the neighboring examples of the query that are considered relevant according to a distance measure. In this paper we propose a data-driven method to select, on a query-by-query basis, the optimal number of neighbors to be considered for each prediction. As an efficient way to identify and validate local models, the recursive least squares algorithm is introduced in the context of local approximation and lazy learning. Furthermore, besides the winner-takes-all strategy for model selection, a local combination of the most promising models is explored. The method proposed is tested on six different datasets and compared with a state-of-the-art approach.
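A sketch of the recursive least squares update as it might be used here to grow a local linear model one neighbor at a time without refitting from scratch (variable names and the forgetting factor lam are illustrative):

    import numpy as np

    def rls_update(theta, P, x, y, lam=1.0):
        # One recursive-least-squares step: fold example (x, y) into the
        # current local linear model without refitting from scratch.
        Px = P @ x
        k = Px / (lam + x @ Px)              # gain vector
        theta = theta + k * (y - x @ theta)  # correct by prediction error
        P = (P - np.outer(k, Px)) / lam      # update inverse covariance
        return theta, P

    # Adding neighbors in order of distance to the query, one RLS step per
    # neighbor, yields the per-size validation errors that can drive the
    # query-by-query choice of the number of neighbors.
    d = 3
    theta, P = np.zeros(d), 1e3 * np.eye(d)
    for x, y in zip(np.random.randn(20, d), np.random.randn(20)):
        theta, P = rls_update(theta, P, x, y)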
Optimizing Classifiers for Imbalanced Training Sets
Karakoulas, Grigoris I., Shawe-Taylor, John
Following recent results [9, 8] showing the importance of the fat-shattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for the case of imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and the two approaches is tested experimentally.
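ThetaBoost's exact update is not reproduced here; as a generic illustration of boosting under unequal loss functions, one can reweight mistakes asymmetrically in an AdaBoost-style step (c_pos and c_neg are assumed costs, not the paper's):

    import numpy as np

    def asymmetric_boost_weights(w, y, pred, alpha, c_pos=5.0, c_neg=1.0):
        # AdaBoost-style reweighting with unequal losses: mistakes on the
        # rare positive class are penalized by c_pos, mistakes on the
        # majority class by c_neg. (Illustrative, not the ThetaBoost update.)
        cost = np.where(y > 0, c_pos, c_neg)
        w = w * np.exp(alpha * cost * (pred != y))
        return w / w.sum()

    w = np.full(6, 1 / 6)
    y = np.array([1, 1, -1, -1, -1, -1])
    pred = np.array([-1, 1, -1, -1, 1, -1])
    w = asymmetric_boost_weights(w, y, pred, alpha=0.5)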
Global Optimisation of Neural Network Models via Sequential Sampling
Freitas, João F. G. de, Niranjan, Mahesan, Doucet, Arnaud, Gee, Andrew H.
We propose a novel strategy for training neural networks using sequential sampling-importance resampling algorithms. This global optimisation strategy allows us to learn the probability distribution of the network weights in a sequential framework. It is well suited to applications involving online, nonlinear, non-Gaussian or non-stationary signal processing.

1 INTRODUCTION

This paper addresses sequential training of neural networks using powerful sampling techniques. Sequential techniques are important in many applications of neural networks involving real-time signal processing, where data arrival is inherently sequential. Furthermore, one might wish to adopt a sequential training strategy to deal with non-stationarity in signals, so that information from the recent past is lent more credence than information from the distant past. One way to sequentially estimate neural network models is to use a state space formulation and the extended Kalman filter (Singhal and Wu 1988, de Freitas, Niranjan and Gee 1998).
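A minimal sketch of one sampling-importance resampling step over a population of network-weight particles (the particle count, jitter, and stand-in likelihood are illustrative, not the paper's settings):

    import numpy as np

    def sir_step(particles, log_lik, jitter=0.01):
        # One sampling-importance resampling step over network-weight
        # particles: weight each particle by the likelihood of the newest
        # observation, resample in proportion, then jitter to keep the
        # particle set diverse.
        w = np.exp(log_lik - log_lik.max())
        w /= w.sum()
        idx = np.random.choice(len(particles), size=len(particles), p=w)
        return particles[idx] + jitter * np.random.randn(*particles.shape)

    # 500 particles over a 10-dimensional weight vector; in practice log_lik
    # comes from the network's output noise model at each new (x_t, y_t).
    particles = np.random.randn(500, 10)
    log_lik = -0.5 * (particles ** 2).sum(axis=1)  # stand-in likelihood
    particles = sir_step(particles, log_lik)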
Call-Based Fraud Detection in Mobile Communication Networks Using a Hierarchical Regime-Switching Model
Hollmén, Jaakko, Tresp, Volker
Fraud causes substantial losses to telecommunication carriers. Detection systems which automatically detect illegal use of the network can be used to alleviate the problem. Previous approaches worked on features derived from the call patterns of individual users. In this paper we present a call-based detection system based on a hierarchical regime-switching model. The detection problem is formulated as an inference problem on the regime probabilities.
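As a sketch of the regime-probability inference, assuming a discrete-regime hidden Markov formulation with a known transition matrix and per-call likelihoods under each regime (all numbers are made up for illustration):

    import numpy as np

    def regime_posteriors(likelihoods, A, pi):
        # Forward filtering for a regime-switching model: given per-call
        # likelihoods under each regime (one row per call), transition
        # matrix A and initial distribution pi, return P(regime_t | calls).
        alpha = pi * likelihoods[0]
        alpha /= alpha.sum()
        out = [alpha]
        for lik in likelihoods[1:]:
            alpha = (alpha @ A) * lik
            alpha /= alpha.sum()
            out.append(alpha)
        return np.array(out)

    # Two regimes (normal, fraud); a spike in the second column of the
    # posterior flags calls likely made under the fraudulent regime.
    A = np.array([[0.99, 0.01], [0.05, 0.95]])
    pi = np.array([0.999, 0.001])
    lik = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
    post = regime_posteriors(lik, A, pi)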
Graph Matching for Shape Retrieval
Huet, Benoit, Cross, Andrew D. J., Hancock, Edwin R.
We propose a new in-sample cross-validation-based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance or fit-complexity tradeoff in 'soft' classification. Soft classification refers to a learning procedure which estimates the probability that an example with a given attribute vector is in class 1 vs. class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the 'true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data.
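A sketch of the randomized trace idea behind that estimate: for Rademacher probes z, trace(H) = E[z^T H z], where in this setting the product H z is what the single relearning with perturbed outcome data delivers (apply_H below is a stand-in for that relearning):

    import numpy as np

    def randomized_trace(apply_H, n, n_probes=10):
        # Hutchinson-style estimator: trace(H) = E[z^T H z] for Rademacher
        # probes z. Here apply_H(z) stands in for the single relearning
        # with outcome data perturbed by z, which returns (roughly) H z.
        est = 0.0
        for _ in range(n_probes):
            z = np.random.choice([-1.0, 1.0], size=n)
            est += z @ apply_H(z)
        return est / n_probes

    # Sanity check against an explicit matrix (never formed in practice).
    H = np.diag(np.arange(1.0, 6.0))
    print(randomized_trace(lambda z: H @ z, n=5), np.trace(H))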