Reinforcement Learning for Trading

Neural Information Processing Systems

In this paper, we propose to use recurrent reinforcement learning to directly optimize such trading system performance functions, and we compare two different reinforcement learning methods. The first, Recurrent Reinforcement Learning, uses immediate rewards to train the trading systems, while the second, Q-Learning (Watkins 1989), approximates discounted future rewards. These methodologies can be applied to optimizing systems designed to trade a single security or to trade portfolios. In addition, we propose a novel value function for risk-adjusted return that enables learning to be done online: the differential Sharpe ratio. Trading system profits depend upon sequences of interdependent decisions, and are thus path-dependent. Optimal trading decisions, when the effects of transaction costs, market impact and taxes are included, require knowledge of the current system state. In Moody, Wu, Liao & Saffell (1998), we demonstrate that reinforcement learning provides a more elegant and effective means for training trading systems when transaction costs are included than do more standard supervised approaches.
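The abstract names the differential Sharpe ratio but does not state its formula. The sketch below assumes the commonly cited form, in which exponential moving estimates A and B of the first and second moments of the returns are updated with a small rate eta, and the differential Sharpe ratio is the first-order sensitivity of the Sharpe ratio to eta. The function name, the default eta, and the zero-variance guard are illustrative choices, not taken from the paper.

```python
def differential_sharpe(returns, eta=0.01):
    """Return a list with one differential Sharpe ratio value per return.

    Assumed form (not given in the abstract):
        D_t = (B_{t-1} * dA_t - 0.5 * A_{t-1} * dB_t) / (B_{t-1} - A_{t-1}^2)^(3/2)
    with dA_t = R_t - A_{t-1} and dB_t = R_t^2 - B_{t-1}.
    """
    a = 0.0  # moving estimate of the first moment of returns
    b = 0.0  # moving estimate of the second moment of returns
    out = []
    for r in returns:
        delta_a = r - a
        delta_b = r * r - b
        variance = b - a * a
        if variance > 1e-12:
            d = (b * delta_a - 0.5 * a * delta_b) / variance ** 1.5
        else:
            # Illustrative guard: no variance estimate yet, so report zero.
            d = 0.0
        out.append(d)
        # Online update of the moment estimates after scoring this return.
        a += eta * delta_a
        b += eta * delta_b
    return out


values = differential_sharpe([0.01, -0.02, 0.015, 0.03])
```

Because each value depends only on the current return and the two running estimates, it can serve as an immediate reward signal in an online learner, which is the property the abstract emphasizes.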


Robot Docking Using Mixtures of Gaussians

Neural Information Processing Systems

This paper applies the Mixture of Gaussians probabilistic model, combined with Expectation Maximization optimization, to the task of summarizing three-dimensional range data for a mobile robot. This provides a flexible way of dealing with uncertainties in sensor information, and allows the introduction of prior knowledge into low-level perception modules. Problems with the basic approach were solved in several ways: the mixture of Gaussians was reparameterized to reflect the types of objects expected in the scene, and priors on model parameters were included in the optimization process. Both approaches force the optimization to find 'interesting' objects, given the sensor and object characteristics. A higher level classifier was used to interpret the results provided by the model, and to reject spurious solutions.
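For readers unfamiliar with the basic technique, the following is a minimal sketch of Expectation Maximization for a one-dimensional mixture of Gaussians. It is plain textbook EM, not the paper's method: the reparameterization and the priors described in the abstract are omitted, the data are one-dimensional rather than three-dimensional range data, and the variance floor `min_var` is only a numerical guard. All names and the sample data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300  # avoid division by zero on underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component owns no data; leave it unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
            weights[j] = nj / len(data)
    return means, variances, weights


# Hypothetical range readings clustered around two surfaces.
readings = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
fit_means, fit_vars, fit_weights = em_gmm_1d(readings, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```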


Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning

Neural Information Processing Systems

This paper examines the application of reinforcement learning to a telecommunications networking problem. The problem requires that revenue be maximized while simultaneously meeting a quality of service constraint that forbids entry into certain states. We present a general solution to this multi-criteria problem that is able to earn significantly higher revenues than alternatives.


A Phase Space Approach to Minimax Entropy Learning and the Minutemax Approximations

Neural Information Processing Systems

There has been much recent work on measuring image statistics and on learning probability distributions on images. We observe that the mapping from images to statistics is many-to-one and show it can be quantified by a phase space factor. This phase space approach throws light on the Minimax Entropy technique for learning Gibbs distributions on images with potentials derived from image statistics and elucidates the ambiguities that are inherent to determining the potentials. In addition, it shows that if the phase factor can be approximated by an analytic distribution then this approximation yields a swift "Minutemax" algorithm that vastly reduces the computation time for Minimax entropy learning. An illustration of this concept, using a Gaussian to approximate the phase factor, gives a good approximation to the results of Zhu and Mumford (1997) in just seconds of CPU time. The phase space approach also gives insight into the multi-scale potentials found by Zhu and Mumford (1997) and suggests that the forms of the potentials are influenced greatly by phase space considerations. Finally, we prove that probability distributions learned in feature space alone are equivalent to Minimax Entropy learning with a multinomial approximation of the phase factor.

1 Introduction

Bayesian probability theory gives a powerful framework for visual perception (Knill and Richards 1996). This approach, however, requires specifying prior probabilities and likelihood functions. Learning these probabilities is difficult because it requires estimating distributions on random variables of very high dimensions (for example, images with 200 x 200 pixels, or shape curves of length 400 pixels).


VLSI Implementation of Motion Centroid Localization for Autonomous Navigation

Neural Information Processing Systems

This chip, which uses mixed signal CMOS components to implement photodetection, edge detection, ONset detection and centroid localization, models the retina and superior colliculus. The centroid localization circuit uses time-windowed asynchronously triggered row and column address events and two linear resistive grids to provide the analog coordinates of the motion centroid. This VLSI chip is used to realize fast lightweight autonavigating vehicles. The obstacle avoiding line-following algorithm is discussed.
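As a software analogue of the time-windowed centroid computation described above, the sketch below averages the row and column addresses of the events that fall inside a recent time window. This is an assumption-laden simplification: the chip computes the centroid in analog hardware with resistive grids and handles row and column events separately, whereas here each hypothetical event carries a timestamp, a row and a column.

```python
def motion_centroid(events, t_now, window):
    """Average (row, col) of the events with timestamps in [t_now - window, t_now].

    `events` is an iterable of (timestamp, row, col) tuples, a hypothetical
    format chosen for illustration. Returns None when no event is recent.
    """
    recent = [(r, c) for t, r, c in events if t_now - window <= t <= t_now]
    if not recent:
        return None
    n = len(recent)
    row_centroid = sum(r for r, _ in recent) / n
    col_centroid = sum(c for _, c in recent) / n
    return row_centroid, col_centroid


# The first event is older than the window and is ignored.
events = [(0.0, 10, 10), (0.9, 2, 4), (1.0, 4, 8)]
centroid = motion_centroid(events, t_now=1.0, window=0.5)  # (3.0, 6.0)
```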


Gradient Descent for General Reinforcement Learning

Neural Information Processing Systems

These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function.
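To illustrate what a true gradient-descent form of a value-based update can look like, the sketch below performs descent on the squared Bellman residual for a linear value function (a residual-gradient style update). This is offered as an example of the general idea under stated assumptions, not as the algorithm derived in the paper; the function name, features, step size and discount are hypothetical.

```python
def residual_gradient_step(w, phi_s, phi_next, reward, gamma=0.9, alpha=0.1):
    """One gradient-descent step on 0.5 * delta^2 for a linear value function.

    delta = reward + gamma * V(s') - V(s), with V(s) = w . phi(s).
    Unlike the usual TD update, the gradient also flows through V(s').
    """
    v_s = sum(wi * fi for wi, fi in zip(w, phi_s))
    v_next = sum(wi * fi for wi, fi in zip(w, phi_next))
    delta = reward + gamma * v_next - v_s
    # d(0.5 * delta^2)/dw = delta * (gamma * phi_next - phi_s)
    return [wi - alpha * delta * (gamma * fn - fs)
            for wi, fs, fn in zip(w, phi_s, phi_next)]


# Two-state deterministic chain: s0 -> s1 (reward 0), s1 -> terminal (reward 1).
w = [0.0, 0.0]
for _ in range(2000):
    w = residual_gradient_step(w, [1.0, 0.0], [0.0, 1.0], reward=0.0)
    w = residual_gradient_step(w, [0.0, 1.0], [0.0, 0.0], reward=1.0)
```

On this small deterministic chain the weights approach the true values, V(s1) = 1 and V(s0) = 0.9, because every step reduces a well-defined error function.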


An Integrated Vision Sensor for the Computation of Optical Flow Singular Points

Neural Information Processing Systems

A robust, integrative algorithm is presented for computing the position of the focus of expansion or axis of rotation (the singular point) in optical flow fields such as those generated by self-motion. Measurements are shown of a fully parallel CMOS analog VLSI motion sensor array which computes the direction of local motion (sign of optical flow) at each pixel and can directly implement this algorithm. The flow field singular point is computed in real time with a power consumption of less than 2 mW. Computation of the singular point for more general flow fields requires measures of field expansion and rotation, which it is shown can also be computed in real-time hardware, again using only the sign of the optical flow field. These measures, along with the location of the singular point, provide robust real-time self-motion information for the visual guidance of a moving platform such as a robot.

1 Introduction

Visually guided navigation of autonomous vehicles requires robust measures of self-motion in the environment. The heading direction, which corresponds to the focus of expansion in the visual scene for a fixed viewing angle, is one of the primary sources of guidance information.
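One way to see how the sign of the flow alone can locate a singular point is the one-dimensional case sketched below. It assumes a purely expanding field, in which flow points left (sign -1) on one side of the focus and right (sign +1) on the other, so the sum of the signs over the array determines the focus position. This integrative estimate is an illustrative assumption and may differ from the algorithm implemented on the chip.

```python
def singular_point_1d(flow_signs):
    """Estimate the focus of expansion along a 1-D pixel array.

    `flow_signs` holds -1 (leftward motion) or +1 (rightward motion) per pixel.
    For an ideal expanding field with x0 pixels left of the focus,
    sum(signs) = (n - x0) - x0, hence x0 = (n - sum(signs)) / 2.
    The result is the number of pixels to the left of the focus.
    Assumes expansion; a contracting field would flip every sign.
    """
    n = len(flow_signs)
    return (n - sum(flow_signs)) / 2.0


# Three pixels see leftward flow, four see rightward flow.
focus = singular_point_1d([-1, -1, -1, 1, 1, 1, 1])  # 3.0
```

Because the estimate sums over every pixel, a single wrong sign shifts the result by only one pixel position.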


Vertex Identification in High Energy Physics Experiments

Neural Information Processing Systems

In High Energy Physics experiments one has to sort through a high flux of events, at a rate of tens of MHz, and select the few that are of interest. One of the key factors in making this decision is the location of the vertex where the interaction that led to the event took place. Here we present a novel solution to the problem of finding the location of the vertex, based on two feedforward neural networks with fixed architectures, whose parameters are chosen so as to obtain a high accuracy. The system is tested on simulated datasets, and is shown to perform better than conventional algorithms.

1 Introduction

An event in High Energy Physics (HEP) is the experimental result of an interaction during the collision of particles in an accelerator. The result of this interaction is the production of tens of particles, each of which is ejected in a different direction and with a different energy. Due to the quantum mechanical effects involved, the events differ from one another in the number of particles produced, the types of particles, and their energies. The trajectories of produced particles are detected by a very large and sophisticated detector.