Technology
Support Vector Regression Machines
Drucker, Harris, Burges, Christopher J. C., Kaufman, Linda, Smola, Alex J., Vapnik, Vladimir
A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.
Analysis of Temporal-Diffference Learning with Function Approximation
Tsitsiklis, John N., Roy, Benjamin Van
The algorithm weanalyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regards to the Significance of online updating and linearly parameterized function approximators. 1 INTRODUCTION The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control. Anexample in time-series prediction is that of estimating the net present value of a corporation, as a discounted sum of its future cash flows, based on the current state of its operations. In control, the ability to predict long-term future cost as a function of state enables the ranking of alternative states in order to guide decision-making. Indeed, such predictions constitute the cost-to-go function that is central to dynamic programming and optimal control (Bertsekas, 1995). Temporal-difference learning, originally proposed by Sutton (1988), is a method for approximating long-term future cost as a function of current state.
Multi-effect Decompositions for Financial Data Modeling
High frequency foreign exchange data can be decomposed into three components: the inventory effect component, the surprise infonnation (news) component and the regular infonnation component. The presence of the inventory effect and news can make analysis of trends due to the diffusion of infonnation (regular information component) difficult. We propose a neural-net-based, independent component analysis to separate highfrequency foreign exchange data into these three components. Our empirical results show that our proposed multi-effect decomposition can reveal the intrinsic price behavior.
Selective Integration: A Model for Disparity Estimation
Gray, Michael S., Pouget, Alexandre, Zemel, Richard S., Nowlan, Steven J., Sejnowski, Terrence J.
Local disparity information is often sparse and noisy, which creates two conflicting demands when estimating disparity in an image region: theneed to spatially average to get an accurate estimate, and the problem of not averaging over discontinuities. We have developed anetwork model of disparity estimation based on disparityselective neurons,such as those found in the early stages of processing in visual cortex. The model can accurately estimate multiple disparities in a region, which may be caused by transparency or occlusion, inreal images and random-dot stereograms. The use of a selection mechanism to selectively integrate reliable local disparity estimates results in superior performance compared to standard back-propagation and cross-correlation approaches. In addition, the representations learned with this selection mechanism are consistent withrecent neurophysiological results of von der Heydt, Zhou, Friedman, and Poggio [8] for cells in cortical visual area V2. Combining multi-scale biologically-plausible image processing with the power of the mixture-of-experts learning algorithm represents a promising approach that yields both high performance and new insights into visual system function.
Statistically Efficient Estimations Using Cortical Lateral Connections
Pouget, Alexandre, Zhang, Kechen
Coarse codes are widely used throughout the brain to encode sensory andmotor variables. Methods designed to interpret these codes, such as population vector analysis, are either inefficient, i.e., the variance of the estimate is much larger than the smallest possible variance,or biologically implausible, like maximum likelihood. Moreover, these methods attempt to compute a scalar or vector estimate of the encoded variable. Neurons are faced with a similar estimationproblem. They must read out the responses of the presynaptic neurons, but, by contrast, they typically encode the variable with a further population code rather than as a scalar. We show how a nonlinear recurrent network can be used to perform theseestimation in an optimal way while keeping the estimate in a coarse code format. This work suggests that lateral connections inthe cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.
ARTEX: A Self-organizing Architecture for Classifying Image Regions
Grossberg, Stephen, Williamson, James R.
Automatic processing of visual scenes often begins by detecting regions of an image with common values of simple local features, such as texture, and mapping the pattern offeatureactivation into a predicted region label. We develop a self-organizing neural architecture, called the ARTEX algorithm, for automatically extracting a novel and effective array of such features and mapping them to output region labels. ARTEXis made up of biologically motivated networks, the Boundary Contour System and Feature Contour System (BCS/FCS) networks for visual feature extraction (Cohen& Grossberg, 1984; Grossberg & Mingolla, 1985a, 1985b; Grossberg & Todorovic, 1988; Grossberg, Mingolla, & Williamson, 1995), and the Gaussian ARTMAP (GAM) network for classification (Williamson, 1996). ARTEX is first evaluated on a difficult real-world task, classifying regions of synthetic apertureradar (SAR) images, where it reliably achieves high resolution (single 874 S.Grossberg and 1. R. Williamson pixel) classification results, and creates accurate probability maps for its class predictions. ARTEXis then evaluated on classification of natural textures, where it outperforms the texture classification system in Greenspan, Goodman, Chellappa, & Anderson (1994) using comparable preprocessing and training conditions. 2 FEATURE EXTRACTION NETWORKS
A Mean Field Algorithm for Bayes Learning in Large Feed-forward Neural Networks
In the Bayes approach to statistical inference [Berger, 1985] one assumes that the prior uncertainty about parameters of an unknown data generating mechanism can be encoded in a probability distribution, the so called prior. Using the prior and the likelihood of the data given the parameters, the posterior distribution of the parameters can be derived from Bayes rule. From this posterior, various estimates for functions ofthe parameter, like predictions about unseen data, can be calculated. However, in general, those predictions cannot be realised by specific parameter values, but only by an ensemble average over parameters according to the posterior probability. Hence,exact implementations of Bayes method for neural networks require averages over network parameters which in general can be performed by time consuming 226 M.Opper and O. Winther Monte Carlo procedures.
Multi-Task Learning for Stock Selection
Ghosn, Joumana, Bengio, Yoshua
Artificial Neural Networks can be used to predict future returns of stocks in order to take financial decisions. Should one build a separate network for each stock or share the same network for all the stocks? In this paper we also explore other alternatives, in which some layers are shared and others are not shared. When the prediction of future returns for different stocks are viewed as different tasks, sharing some parameters across stocks is a form of multi-task learning. In a series of experiments with Canadian stocks, we obtain yearly returns that are more than 14% above various benchmarks. 1 Introd uction Previous applications of ANNs to financial time-series suggest that several of these prediction and decision-taking tasks present sufficient non-linearities to justify the use of ANNs (Refenes, 1994; Moody, Levin and Rehfuss, 1993).
Unsupervised Learning by Convex and Conic Coding
Lee, Daniel D., Seung, H. Sebastian
Unsupervised learning algorithms based on convex and conic encoders areproposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms areused to model handwritten digits and compared with vector quantization and principal component analysis.