Incorporating Second-Order Functional Knowledge for Better Option Pricing

Neural Information Processing Systems

Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in two of its arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks that (1) has those properties and (2) is a universal approximator of continuous functions with these and other properties. We apply this new class of functions to the task of modeling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints.

1 Introduction

Incorporating a priori knowledge of a particular task into a learning algorithm helps reduce the necessary complexity of the learner and generally improves performance, provided that the incorporated knowledge is relevant to the task and really corresponds to the generating process of the data. In this paper we consider prior knowledge on the positivity of some first and second derivatives of the function to be learned. In particular, such constraints have applications to modeling the price of European stock options. Based on the Black-Scholes formula, the price of a call stock option is monotonically increasing in both the "moneyness" and the time to maturity of the option, and it is convex in the "moneyness". Section 3 explains these terms and stock options in more detail.
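The shape constraints named above can be checked numerically on the Black-Scholes formula itself. The following is a minimal sketch, not the paper's model: it assumes moneyness is defined as m = S/K (stock price over strike), that the call price is normalized by the strike, and that the interest rate r and volatility sigma take the illustrative values shown. The function names and the grid are hypothetical choices made for this example.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price_over_strike(m, tau, r=0.05, sigma=0.2):
    """Black-Scholes call price divided by the strike K.

    m is the moneyness S/K (assumed definition), tau the time to
    maturity in years; r and sigma are illustrative values.
    """
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(m) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    return m * norm_cdf(d1) - math.exp(-r * tau) * norm_cdf(d2)

def check_shape_constraints(h=0.05):
    """Finite-difference check of the three constraints on a small grid."""
    moneyness = [0.8 + 0.05 * i for i in range(9)]  # 0.8 .. 1.2
    maturities = [0.1, 0.25, 0.5, 1.0]
    increasing_in_m = convex_in_m = increasing_in_tau = True
    for tau in maturities:
        for m in moneyness:
            lo = call_price_over_strike(m - h, tau)
            mid = call_price_over_strike(m, tau)
            hi = call_price_over_strike(m + h, tau)
            # First difference > 0: increasing in moneyness.
            increasing_in_m = increasing_in_m and (lo < mid < hi)
            # Second difference > 0: convex in moneyness.
            convex_in_m = convex_in_m and (hi - 2.0 * mid + lo > 0.0)
    for m in moneyness:
        prices = [call_price_over_strike(m, tau) for tau in maturities]
        # Price grows with time to maturity (r >= 0, no dividends).
        increasing_in_tau = increasing_in_tau and all(
            a < b for a, b in zip(prices, prices[1:])
        )
    return increasing_in_m, convex_in_m, increasing_in_tau

if __name__ == "__main__":
    print(check_shape_constraints())
```

A learner that builds these three properties into its architecture cannot violate them anywhere in the input space, whereas an unconstrained regressor can only approximate them where training data happen to lie.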


High-temperature Expansions for Learning Models of Nonnegative Data

Neural Information Processing Systems

Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first- and second-order statistics of the data. Learning for practical-sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and the low fidelity of naive mean-field techniques have led to increasing interest in advanced mean-field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.


An Adaptive Metric Machine for Pattern Classification

Neural Information Processing Systems

Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along the most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real-world data.


Vicinal Risk Minimization

Neural Information Processing Systems

The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and for dealing with unlabeled data. Preliminary empirical results are presented.


Incremental and Decremental Support Vector Machine Learning

Neural Information Processing Systems

An online recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the Kuhn-Tucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leave-one-out generalization performance.


A Linear Programming Approach to Novelty Detection

Neural Information Processing Systems

Novelty detection involves modeling the normal behaviour of a system, hence enabling detection of any divergence from normality. It has potential applications in many areas, such as detection of machine damage or highlighting abnormal features in medical data. One approach is to build a hypothesis estimating the support of the normal data, i.e. constructing a function which is positive in the region where the data is located and negative elsewhere. Recently kernel methods have been proposed for estimating the support of a distribution, and they have performed well in practice; training involves the solution of a quadratic programming problem. In this paper we propose a simpler kernel method for estimating the support based on linear programming. The method is easy to implement and can learn large datasets rapidly. We demonstrate the method on medical and fault detection datasets.


Direct Classification with Indirect Data

Neural Information Processing Systems

Suppose there exists an unknown real-valued property of the feature space, p(φ), that maps from the feature space, φ ∈ R^n, to R. The property function and a positive set A ⊂


A Variational Mean-Field Theory for Sigmoidal Belief Networks

Neural Information Processing Systems

In this paper we discuss a variational mean-field theory and its application to belief networks (BNs), sigmoidal BNs in particular. We present a variational derivation of the mean-field theory proposed by Plefka [2].


Convergence of Large Margin Separable Linear Classification

Neural Information Processing Systems

Large margin linear classification methods have been successfully applied to many applications. For a linearly separable problem, it is known that under appropriate assumptions, the expected misclassification error of the computed "optimal hyperplane" approaches zero at a rate proportional to the inverse training sample size. This rate is usually characterized by the margin and the maximum norm of the input data. In this paper, we argue that another quantity, namely the robustness of the input data distribution, also plays an important role in characterizing the convergence behavior of expected misclassification error. Based on this concept of robustness, we show that for a large margin separable linear classification problem, the expected misclassification error may converge exponentially in the training sample size.

1 Introduction

We consider the binary classification problem: to determine a label y ∈ {-1, 1} associated with an input vector x. A useful method for solving this problem is by using linear discriminant functions.


Learning Winner-take-all Competition Between Groups of Neurons in Lateral Inhibitory Networks

Neural Information Processing Systems

It has long been known that lateral inhibition in neural networks can lead to a winner-take-all competition, so that only a single neuron is active at a steady state. Here we show how to organize lateral inhibition so that groups of neurons compete to be active. Given a collection of potentially overlapping groups, the inhibitory connectivity is set by a formula that can be interpreted as arising from a simple learning rule. Our analysis demonstrates that such inhibition generally results in winner-take-all competition between the given groups, with the exception of some degenerate cases. In a broader context, the network serves as a particular illustration of the general distinction between permitted and forbidden sets, which was introduced recently. From this viewpoint, the computational function of our network is to store and retrieve memories as permitted sets of coactive neurons.
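The classic single-neuron case mentioned in the first sentence can be illustrated with a short simulation. This is a minimal sketch and not the group construction of the paper, whose connectivity formula is not given in the abstract: it assumes rectified-linear units with dynamics dx_i/dt = -x_i + max(0, b_i - beta * sum of the other units' activities), uniform inhibition strength beta > 1, and simple Euler integration. The function name and all parameter values are hypothetical.

```python
def simulate_wta(inputs, beta=2.0, dt=0.01, steps=5000):
    """Euler simulation of uniform lateral inhibition (assumed dynamics).

    Each unit i follows dx_i/dt = -x_i + max(0, b_i - beta * sum_{j != i} x_j),
    where b_i = inputs[i]. All activities start at zero.
    """
    x = [0.0] * len(inputs)
    for _ in range(steps):
        total = sum(x)
        # Rectified drive: external input minus inhibition from the other units.
        drive = [max(0.0, b - beta * (total - xi)) for b, xi in zip(inputs, x)]
        x = [xi + dt * (-xi + d) for xi, d in zip(x, drive)]
    return x

if __name__ == "__main__":
    activity = simulate_wta([1.0, 0.9, 0.8])
    winner = max(range(len(activity)), key=activity.__getitem__)
    print("winner:", winner, "activity:", [round(a, 3) for a in activity])
```

With inhibition stronger than the leak (beta > 1), any state with two or more active units is unstable under these assumed dynamics, so starting from zero activity the unit with the largest input suppresses the others and settles at its own input value.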