Using Keras and Deep Deterministic Policy Gradient to play TORCS


In the previous blog post, Using Keras and Deep Q-Network to Play FlappyBird, we demonstrated how to use a Deep Q-Network (DQN) to play FlappyBird. However, a big limitation of DQN is that its outputs (actions) are discrete, while actions in car racing, such as steering, are continuous. An obvious way to adapt DQN to continuous domains is to simply discretize the action space. However, this runs into the "curse of dimensionality": the number of discrete actions grows exponentially with the number of action dimensions. For example, if you discretize the steering angle from -90 to 90 degrees in 5-degree steps and the speed from 0 to 300 km/h in 5 km/h steps, you get 36 steering bins times 60 speed bins, which is 2160 possible combinations. The situation gets worse when you want to build robots for very specialized tasks, such as brain surgery, that require fine control of actions: naive discretization cannot achieve the precision those operations need.
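To make the arithmetic concrete, here is a minimal sketch that counts the discrete actions produced by equal-width binning. The helper names `num_bins` and `joint_action_count` are hypothetical, chosen for illustration. The 7-joint arm at the end is also a hypothetical example, not from the original post; it shows how the product explodes as action dimensions are added.

```python
import math

def num_bins(low, high, step):
    """Number of equal-width bins of size `step` covering [low, high]."""
    return math.ceil((high - low) / step)

def joint_action_count(dims):
    """Size of the discrete action set: the product of per-dimension bin counts.

    A DQN would need one output unit per combination.
    """
    return math.prod(num_bins(lo, hi, step) for lo, hi, step in dims)

steering = (-90, 90, 5)  # degrees, 5-degree bins
speed = (0, 300, 5)      # km/h, 5 km/h bins

print(num_bins(*steering))                    # 36
print(num_bins(*speed))                       # 60
print(joint_action_count([steering, speed]))  # 2160

# Hypothetical: a 7-joint robot arm, each joint discretized like the steering angle.
arm = [(-90, 90, 5)] * 7
print(joint_action_count(arm))                # 78364164096 (36**7)
```

Halving the step size doubles the bins per dimension, so with two dimensions the joint action set grows fourfold, and with seven it grows 128-fold. This is why finer control quickly becomes impractical for a discrete-output network.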