0b9e57c46de934cee33b0e8d1839bfc2-Supplemental.pdf

Neural Information Processing Systems 

The environment settings for Atari environments are shown in Table 1. The environment settings for maze environments are shown in Table 1.

For instance, in autonomous driving, we need to balance safety (distance from other cars), speed, comfort (acceleration, etc.), and many other factors to make sure the car functions normally. If the algorithm can only model the marginal distributions, we can correctly compute the probability of simultaneously meeting multiple constraints only if the constraints are independent. We use the joint distribution modeled by MD3QN to achieve this: specifically, given the modeled joint distribution $\mu(s,a)$, the agent can compute the probability of satisfying all the constraints under the joint distribution and take the action $\arg\max_a P_{Z \sim \mu(s,a)}(Z \text{ satisfies all three constraints})$.
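The action rule above can be illustrated with a minimal sketch, assuming the joint return distribution for each action is available as a list of sampled return vectors. The function names, the threshold constraints, and the toy samples below are all hypothetical and are not taken from MD3QN; in practice the samples would come from the learned model of $\mu(s,a)$. The sketch also compares the joint estimate with the product of marginal estimates, which agrees with it only when the constraints are independent.

```python
def constraint_probability(samples, constraints):
    """Monte Carlo estimate of P(Z satisfies every constraint) from joint samples."""
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(1 for z in samples if all(c(z) for c in constraints))
    return hits / len(samples)


def marginal_product(samples, constraints):
    """Product of per-constraint probabilities; exact only under independence."""
    n = len(samples)
    p = 1.0
    for c in constraints:
        p *= sum(1 for z in samples if c(z)) / n
    return p


def select_action(samples_by_action, constraints):
    """Pick the action whose joint samples most often satisfy all constraints."""
    return max(
        samples_by_action,
        key=lambda a: constraint_probability(samples_by_action[a], constraints),
    )


# Hypothetical constraints on a 3-dimensional return Z = (safety, speed, comfort).
constraints = [
    lambda z: z[0] >= 0.5,
    lambda z: z[1] >= 0.5,
    lambda z: z[2] >= 0.5,
]

# Toy samples (assumed, for illustration only). Both actions have identical
# marginals, but only "correlated" ever meets all three constraints at once.
samples_by_action = {
    "correlated": [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)] * 50,
    "anticorrelated": [(1.0, 0.0, 1.0), (0.0, 1.0, 0.0)] * 50,
}

best_action = select_action(samples_by_action, constraints)
```

In this toy setup the product of marginals is the same for both actions, so a method that models only marginals cannot tell them apart, while the joint estimate separates them.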
