Learning via Human Feedback in Continuous State and Action Spaces
Ngo, Vien Anh (Ravensburg-Weingarten University of Applied Sciences) | Ertel, Wolfgang (Ravensburg-Weingarten University of Applied Sciences)
We consider the problem of extending manually trained agents via evaluative reinforcement (TAMER) to continuous state and action spaces. The original TAMER framework allows a non-technical human to train an agent through a natural form of human feedback, negative or positive. The advantages of TAMER have been shown in applications such as training agents for Tetris and Mountain Car with human feedback only, and for Cart-pole and Mountain Car with both human feedback and environment reward (augmenting reinforcement learning with human feedback). However, those methods were originally designed for problems with discrete states and actions, or with continuous states and discrete actions. In this paper, we introduce an extension of TAMER that allows both continuous states and continuous actions. The new scheme, actor-critic TAMER, extends the original TAMER to allow the use of any general function approximation of a human trainer’s reinforcement signal. Our extension still allows reinforcement learning to be easily combined with human feedback. The experimental results show that the proposed method helps a human trainer successfully train an agent in two continuous state-action domains: Mountain Car and Cart-pole (balancing).
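The abstract does not give the update rules, so the following is only an illustrative sketch of how an actor-critic scheme driven by human feedback might look. It is not the authors' algorithm. It assumes a linear critic that predicts the trainer's reinforcement signal, a Gaussian policy with a linear mean as the actor, and a hypothetical feature map; the class name, parameter names, and learning rates are all invented for illustration.

```python
import random


def features(state):
    """Hypothetical feature map: a bias term plus the raw state components."""
    return [1.0] + list(state)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class ActorCriticTamerSketch:
    """Illustrative only: a critic models human feedback, an actor picks continuous actions."""

    def __init__(self, n_features, sigma=0.5, alpha_critic=0.1, alpha_actor=0.1, seed=0):
        self.w_critic = [0.0] * n_features  # weights of the predicted human feedback
        self.w_actor = [0.0] * n_features   # weights of the Gaussian policy mean
        self.sigma = sigma                  # fixed exploration noise (assumption)
        self.alpha_critic = alpha_critic
        self.alpha_actor = alpha_actor
        self.rng = random.Random(seed)

    def predict_feedback(self, state):
        return dot(self.w_critic, features(state))

    def act(self, state):
        # Sample a continuous action around the policy mean.
        mu = dot(self.w_actor, features(state))
        return self.rng.gauss(mu, self.sigma)

    def update(self, state, action, human_feedback):
        x = features(state)
        # Error between the trainer's signal and the critic's prediction.
        delta = human_feedback - dot(self.w_critic, x)
        # Actor step: move the mean toward actions that did better than predicted,
        # using the Gaussian log-likelihood gradient (action - mu) / sigma^2 * x.
        mu = dot(self.w_actor, x)
        scale = self.alpha_actor * delta * (action - mu) / (self.sigma ** 2)
        self.w_actor = [w + scale * xi for w, xi in zip(self.w_actor, x)]
        # Critic step: regress the prediction toward the observed feedback.
        self.w_critic = [w + self.alpha_critic * delta * xi
                         for w, xi in zip(self.w_critic, x)]
        return delta
```

In this sketch, a trainer's positive signal after an action raises the critic's prediction for that state and shifts the policy mean toward the action taken; a negative signal does the opposite. Environment reward could be added to `human_feedback` under the same interface, though how the paper combines the two is not stated in the abstract.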
Nov-5-2012
- Country:
  - Europe > Germany (0.04)
  - North America > United States
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Colorado > Denver County
      - Denver (0.04)
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report (0.34)
- Technology: