Reinforcement Learning for Trading
Moody, John E., Saffell, Matthew
In this paper, we propose to use recurrent reinforcement learning to directly optimize such trading system performance functions, and we compare two different reinforcement learning methods. The first, Recurrent Reinforcement Learning, uses immediate rewards to train the trading systems, while the second (Q-Learning (Watkins 1989)) approximates discounted future rewards. These methodologies can be applied to optimizing systems designed to trade a single security or to trade portfolios. In addition, we propose a novel value function for risk-adjusted return that enables learning to be done online: the differential Sharpe ratio. Trading system profits depend upon sequences of interdependent decisions, and are thus path-dependent. When the effects of transaction costs, market impact, and taxes are included, optimal trading decisions require knowledge of the current system state. In Moody, Wu, Liao & Saffell (1998), we demonstrate that reinforcement learning provides a more elegant and effective means than standard supervised approaches for training trading systems when transaction costs are included.
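To make the differential Sharpe ratio concrete, here is a minimal Python sketch of its online update, following the exponential-moving-average formulation in Moody, Wu, Liao & Saffell (1998). The variable names, the adaptation rate eta, and the example return stream are illustrative assumptions, not code from the paper.

```python
# Sketch of an online differential Sharpe ratio update.
# A tracks the moving average of returns (first moment) and B the
# moving average of squared returns (second moment), each adapted
# at rate eta. Names and example values are illustrative.

def differential_sharpe_ratio(r, A, B, eta=0.01):
    """Return (D_t, A_t, B_t): the differential Sharpe ratio for the
    latest trading return r, plus the updated moving averages."""
    dA = r - A          # innovation in the first moment
    dB = r * r - B      # innovation in the second moment
    denom = (B - A * A) ** 1.5
    D = (B * dA - 0.5 * A * dB) / denom if denom > 0 else 0.0
    return D, A + eta * dA, B + eta * dB

# Usage: stream returns through the update one step at a time,
# initializing so that the variance estimate B - A^2 is positive.
A, B = 0.0, 1e-4
for r in [0.002, -0.001, 0.003, 0.0005]:
    D, A, B = differential_sharpe_ratio(r, A, B)
    print(f"return={r:+.4f}  differential Sharpe={D:+.4f}")
```

Because D depends only on the latest return and two running averages, it can be computed, and differentiated for gradient-based training, at every time step, which is what makes online learning possible.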
Neural Information Processing Systems
Dec-31-1999