Fast Inverse Reinforcement Learning with Interval Consistent Graph for Driving Behavior Prediction

Shimosaka, Masamichi (Tokyo Institute of Technology) | Sato, Junichi (The University of Tokyo) | Takenaka, Kazuhito (Denso Corporation) | Hitomi, Kentarou (Denso Corporation)

AAAI Conferences 

Inverse reinforcement learning (IRL), inverse optimal control, and imitation learning (Ng and Russell 2000; Abbeel and Ng 2004) are modeling frameworks for acquiring rewards (or cost) of a certain environment by using the optimal path under a possibly different environment as training data. In particular, in human behavior modeling, it is shown that human-centered rewards can be obtained with maximum entropy inverse reinforcement learning (MaxEnt IRL) (Ziebart et al. 2008), which allows suboptimal training data (Huang et al. 2015; Vernaza and Bagnell 2012; Dragan and Srinivasa 2012; Walker, Gupta, and Hebert 2014). For instance, Ziebart et al. (Ziebart et al. 2008) modeled the driving behavior of expert taxi drivers and enabled driving behavior prediction based on the experts' very own experience or knowledge. MaxEnt IRL based driving behavior prediction, which balances safety, comfort, and economic performance, is very promising.

In contrast, a discrete approach guarantees global optimality once proper discrete state space is given, hence it is more suitable for driving behavior modeling. In a discrete approach, the calculation cost of MaxEnt IRL is O(|S||A|), where |S| is the number of states and |A| is the number of actions (Ziebart et al. 2008). That is, the key for fast prediction is suppressing the increase of |S| depending on dimensions and preparing a necessary and sufficient action set, A, for representing driving behavior. As examples of existing discretization schemes, there are mesh grid representation (Shimosaka, Kaneko, and Nishi 2014) and random graph based representation connected with neighbors (Byravan et al. 2015). In these approaches, however, A for general dynamic systems is not trivial. This is because neighbors on state space defined by Euclidean distance do not necessarily correspond to the transition area of general dynamics.
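As a rough illustration of where the O(|S||A|) cost of the discrete approach comes from, the sketch below runs the soft (log-sum-exp) value iteration that is commonly used as the backward pass of MaxEnt IRL, here on a small deterministic discrete MDP. It is not the authors' implementation: the table layout `next_state[s][a]`, the per-state `reward`, the single absorbing `goal`, and the function names are all assumptions made for this example. Under these assumptions each sweep visits every (state, action) pair exactly once.

```python
import math


def soft_value_iteration(next_state, reward, goal, n_sweeps):
    """Soft value iteration on a deterministic discrete MDP (illustrative sketch).

    next_state[s][a] is the state reached from s by action a (assumed layout).
    reward[s] is the reward collected on entering state s (assumed convention).
    Each sweep touches every (state, action) pair once: O(|S||A|) work.
    """
    n_states = len(next_state)
    v = [-math.inf] * n_states
    v[goal] = 0.0  # the goal is absorbing and has soft value 0
    for _ in range(n_sweeps):
        new_v = [-math.inf] * n_states
        new_v[goal] = 0.0
        for s in range(n_states):
            if s == goal:
                continue
            # One Q-value per action available in state s.
            q = [reward[s2] + v[s2] for s2 in next_state[s]]
            m = max(q)
            if m > -math.inf:
                # Numerically stable log-sum-exp over the actions.
                new_v[s] = m + math.log(sum(math.exp(x - m) for x in q))
        v = new_v
    return v


def policy(next_state, reward, v, s):
    """Stochastic policy at state s: probabilities proportional to exp(Q)."""
    q = [reward[s2] + v[s2] for s2 in next_state[s]]
    m = max(q)
    if m == -math.inf:
        raise ValueError("state cannot reach the goal under the current values")
    z = sum(math.exp(x - m) for x in q)
    return [math.exp(x - m) / z for x in q]


# Toy chain of four states with two actions each: 0 = stay, 1 = move right.
next_state = [[0, 1], [1, 2], [2, 3], [3, 3]]
reward = [-1.0, -1.0, -1.0, 0.0]
values = soft_value_iteration(next_state, reward, goal=3, n_sweeps=20)
print(values)
print(policy(next_state, reward, values, 2))
```

In this toy chain the action set is written by hand. For a general dynamical system, each row of `next_state` would have to list the states actually reachable under the dynamics, which need not coincide with the Euclidean nearest neighbors of a state.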
