AITopics

Incorporating prior knowledge of a particular task into the architecture of a learning algorithm can greatly improve generalization performance. We study here a case where we know that the function to be learned is non-decreasing in two of its arguments and convex in one of them. For this purpose we propose a class of functions similar to multi-layer neural networks that (1) has those properties and (2) is a universal approximator of continuous functions with these and other properties. We apply this new class of functions to the task of modeling the price of call options. Experiments show improvements on regressing the price of call options using the new types of function classes that incorporate the a priori constraints.

1 Introduction

Incorporating a priori knowledge of a particular task into a learning algorithm helps reduce the necessary complexity of the learner and generally improves performance, if the incorporated knowledge is relevant to the task and really corresponds to the generating process of the data. In this paper we consider prior knowledge on the positivity of some first and second derivatives of the function to be learned. In particular, such constraints have applications to modeling the price of European stock options. Based on the Black-Scholes formula, the price of a call stock option is monotonically increasing in both the "moneyness" and the time to maturity of the option, and it is convex in the "moneyness". Section 3 better explains these terms and stock options.
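The shape constraints named above can be checked numerically on the Black-Scholes formula itself. The following is a minimal sketch, not the paper's method: it assumes moneyness means the ratio of stock price to strike, and the function names, the interest rate `r`, and the volatility `sigma` are illustrative choices that do not come from the source.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_price_over_strike(m, t, r=0.05, sigma=0.2):
    """Black-Scholes call price divided by the strike.

    m is the moneyness (assumed here to be stock price / strike),
    t is the time to maturity in years; r and sigma are assumed values.
    """
    vol = sigma * math.sqrt(t)
    d1 = (math.log(m) + (r + 0.5 * sigma ** 2) * t) / vol
    d2 = d1 - vol
    return m * norm_cdf(d1) - math.exp(-r * t) * norm_cdf(d2)


def has_expected_shape(moneyness_grid, maturity_grid, h=1e-3):
    """Finite-difference check: increasing in m and t, convex in m."""
    for m in moneyness_grid:
        for t in maturity_grid:
            c = call_price_over_strike(m, t)
            up_m = call_price_over_strike(m + h, t)
            down_m = call_price_over_strike(m - h, t)
            up_t = call_price_over_strike(m, t + h)
            increasing_in_m = up_m - c > 0
            increasing_in_t = up_t - c > 0
            convex_in_m = up_m - 2.0 * c + down_m > 0
            if not (increasing_in_m and increasing_in_t and convex_in_m):
                return False
    return True


ok = has_expected_shape([0.8, 1.0, 1.2], [0.25, 0.5, 1.0])
```

A learner constrained to this shape cannot fit patterns that violate it, which is the sense in which the prior knowledge reduces the necessary complexity of the model.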

artificial intelligence, machine learning, neural network, (11 more...)

Country: North America > Canada > Quebec (0.15)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

High-temperature Expansions for Learning Models of Nonnegative Data

Downs, Oliver B.

Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy generative model is a Nonnegative Boltzmann Distribution, not a Gaussian distribution, when the model is constrained to match the first and second order statistics of the data. Learning for practical sized problems is made difficult by the need to compute expectations under the model distribution. The computational cost of Markov chain Monte Carlo methods and the low fidelity of naive mean field techniques have led to increasing interest in advanced mean field theories and variational methods. Here I present a second-order mean-field approximation for the Nonnegative Boltzmann Machine model, obtained using a "high-temperature" expansion. The theory is tested on learning a bimodal 2-dimensional model, a high-dimensional translationally invariant distribution, and a generative model for handwritten digits.

approximation, artificial intelligence, machine learning, (15 more...)

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)

Domeniconi, Carlotta, Peng, Jing, Gunopulos, Dimitrios

An Adaptive Metric Machine for Pattern Classification

Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along the most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data.

artificial intelligence, machine learning, neighborhood, (17 more...)

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.14)
North America > United States > California > Riverside County > Riverside (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Chapelle, Olivier, Weston, Jason, Bottou, Léon, Vapnik, Vladimir

Vicinal Risk Minimization

The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and dealing with unlabeled data. Preliminary empirical results are presented.
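As one illustration of how a vicinal risk can recover a known algorithm, consider the Ridge Regression case. The derivation below is a sketch under stated assumptions that are not taken from the abstract: a linear model $f(x) = w^\top x$, squared loss, and an isotropic Gaussian vicinity of variance $\sigma^2$ around each training point $x_i$.

```latex
\begin{align*}
R_{\mathrm{vic}}(w)
  &= \frac{1}{n} \sum_{i=1}^{n}
     \mathbb{E}_{\varepsilon \sim \mathcal{N}(0,\, \sigma^2 I)}
     \left[ \bigl( y_i - w^\top (x_i + \varepsilon) \bigr)^2 \right] \\
  &= \frac{1}{n} \sum_{i=1}^{n}
     \left[ (y_i - w^\top x_i)^2
     - 2\, (y_i - w^\top x_i)\, w^\top \mathbb{E}[\varepsilon]
     + \mathbb{E}\bigl[ (w^\top \varepsilon)^2 \bigr] \right] \\
  &= \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i)^2
     + \sigma^2 \lVert w \rVert^2 .
\end{align*}
```

The cross term vanishes because $\mathbb{E}[\varepsilon] = 0$, and $\mathbb{E}[(w^\top \varepsilon)^2] = \sigma^2 \lVert w \rVert^2$, so under these assumptions the vicinal risk equals the empirical squared error plus a ridge penalty whose strength is set by the vicinity width.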

algorithm, artificial intelligence, machine learning, (13 more...)

Country: North America > United States (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Cauwenberghs, Gert, Poggio, Tomaso

Incremental and Decremental Support Vector Machine Learning

An online recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the Kuhn-Tucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leave-one-out generalization performance.

artificial intelligence, machine learning, vector, (14 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.16)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Campbell, Colin, Bennett, Kristin P.

A Linear Programming Approach to Novelty Detection

Novelty detection involves modeling the normal behaviour of a system, hence enabling detection of any divergence from normality. It has potential applications in many areas such as detection of machine damage or highlighting abnormal features in medical data. One approach is to build a hypothesis estimating the support of the normal data, i.e. constructing a function which is positive in the region where the data is located and negative elsewhere. Recently kernel methods have been proposed for estimating the support of a distribution and they have performed well in practice; training involves the solution of a quadratic programming problem. In this paper we propose a simpler kernel method for estimating the support based on linear programming. The method is easy to implement and can learn large datasets rapidly. We demonstrate the method on medical and fault detection datasets.

data mining, detection, machine learning, (15 more...)

Country:

North America > United States > Wisconsin (0.14)
North America > United States > New York (0.14)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.31)

Direct Classification with Indirect Data

Brown, Timothy X.

Suppose there exists an unknown real-valued property of the feature space, p(φ), that maps from the feature space, φ ∈ R^n, to R. The property function and a positive set A ⊂

artificial intelligence, classifier, machine learning, (14 more...)

Country: North America > United States > Colorado (0.14)

Industry: Telecommunications (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

Bhattacharyya, Chiranjib, Keerthi, S. Sathiya

A Variational Mean-Field Theory for Sigmoidal Belief Networks

In this paper we discuss a variational mean-field theory and its application to belief networks (BNs), sigmoidal BNs in particular. We present a variational derivation of the mean-field theory proposed by Plefka [2].

approximation, artificial intelligence, machine learning, (16 more...)

Country: Asia > India (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Convergence of Large Margin Separable Linear Classification

Zhang, Tong

Large margin linear classification methods have been successfully applied to many applications. For a linearly separable problem, it is known that under appropriate assumptions, the expected misclassification error of the computed "optimal hyperplane" approaches zero at a rate proportional to the inverse training sample size. This rate is usually characterized by the margin and the maximum norm of the input data. In this paper, we argue that another quantity, namely the robustness of the input data distribution, also plays an important role in characterizing the convergence behavior of expected misclassification error. Based on this concept of robustness, we show that for a large margin separable linear classification problem, the expected misclassification error may converge exponentially in the training sample size.

1 Introduction

We consider the binary classification problem: to determine a label y ∈ {-1, 1} associated with an input vector x. A useful method for solving this problem is by using linear discriminant functions.

artificial intelligence, machine learning, misclassification error, (17 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.47)

Xie, Xiaohui, Hahnloser, Richard H. R., Seung, H. Sebastian

Learning Winner-take-all Competition Between Groups of Neurons in Lateral Inhibitory Networks

It has long been known that lateral inhibition in neural networks can lead to a winner-take-all competition, so that only a single neuron is active at a steady state. Here we show how to organize lateral inhibition so that groups of neurons compete to be active. Given a collection of potentially overlapping groups, the inhibitory connectivity is set by a formula that can be interpreted as arising from a simple learning rule. Our analysis demonstrates that such inhibition generally results in winner-take-all competition between the given groups, with the exception of some degenerate cases. In a broader context, the network serves as a particular illustration of the general distinction between permitted and forbidden sets, which was introduced recently. From this viewpoint, the computational function of our network is to store and retrieve memories as permitted sets of coactive neurons.

artificial intelligence, machine learning, neuron, (16 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)