Advantage Updating Applied to a Differential Game

Harmon, Mance E., Baird, Leemon C., III, Klopf, A. Harry

Dec-31-1995–Neural Information Processing Systems

This paper presents an application of reinforcement learning to a linear-quadratic differential game. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov decision process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. It has two players, a missile and a plane: the missile pursues the plane, and the plane evades the missile. The reinforcement learning algorithm for optimal control is modified for differential games so that it finds the minimax point rather than the maximum. Simulation results are compared with the optimal solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of the residual gradient and non-residual gradient forms of both advantage updating and Q-learning is compared. The results show that advantage updating converges faster than Q-learning in all simulations.
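To illustrate what "finding the minimax point rather than the maximum" can mean for a quadratic function of two players' actions, here is a minimal sketch. It is not the authors' implementation: the matrix `H`, the vector `g`, the helper names, and the convention that the missile minimizes while the plane maximizes are all assumptions chosen for illustration. For a quadratic that is convex in one action and concave in the other, the minimax point is the stationary point where the gradient vanishes.

```python
import numpy as np

def quadratic(z, H, g):
    """Value of f(z) = 0.5 * z^T H z + g^T z for z = [u_missile, u_plane]."""
    return 0.5 * z @ H @ z + g @ z

def saddle_point(H, g):
    """Stationary point of the quadratic, found by solving H z = -g."""
    return np.linalg.solve(H, -g)

# Hypothetical coefficients (not from the paper):
# positive curvature in the missile action (assumed minimizer),
# negative curvature in the plane action (assumed maximizer).
H = np.array([[2.0, 0.5],
              [0.5, -1.0]])
g = np.array([1.0, -0.5])

z_star = saddle_point(H, g)
f_star = quadratic(z_star, H, g)

# At a minimax point, a unilateral change by the minimizer raises the value,
# and a unilateral change by the maximizer lowers it.
delta = 0.1
f_missile_deviates = quadratic(z_star + np.array([delta, 0.0]), H, g)
f_plane_deviates = quadratic(z_star + np.array([0.0, delta]), H, g)

print("minimax actions:", z_star)
print("missile deviation raises value:", f_missile_deviates > f_star)
print("plane deviation lowers value:", f_plane_deviates < f_star)
```

A plain maximization over both actions would be unbounded for this quadratic, which is why a two-player setting needs the saddle point instead.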

algorithm, artificial intelligence, reinforcement learning

Country:
- Europe > United Kingdom
  - England (0.14)
- North America > United States
  - Massachusetts (0.14)

Industry:
- Government > Military > Air Force (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
