A Policy Gradient for Sub task Tree

Neural Information Processing Systems 

Recursively, each segment is again partitioned by π. This results in Eq. (7): Pr ... Then, we have: ... k=1, the trajectory of planned sub-goals. This allows us to derive a policy gradient proposition: Proposition 2. Let planning policy π ...

We design three benchmark environments in our experiment (as shown in Figure 5). The robot is abstracted as a point mass moving in the plane. A rigid-body robot, abstracted as a thin rectangle, is used here.
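The recursive partitioning of segments by π can be illustrated with a minimal sketch. It assumes a 2D point-mass state, as in the planar environment, and uses a hypothetical deterministic `midpoint_policy` in place of the planning policy π; the names `midpoint_policy` and `subgoal_tree` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # assumed 2D position of a point-mass robot


def midpoint_policy(start: State, goal: State) -> State:
    """Hypothetical stand-in for the planning policy pi.

    It simply returns the geometric midpoint of the segment; the policy
    in the source is not specified here and may be learned or stochastic.
    """
    return ((start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0)


def subgoal_tree(
    start: State,
    goal: State,
    policy: Callable[[State, State], State],
    depth: int,
) -> List[State]:
    """Return the sub-goals strictly between start and goal, in order.

    The policy proposes one sub-goal for the segment (start, goal); each
    of the two resulting segments is then partitioned again, recursively,
    until the requested depth is exhausted.
    """
    if depth == 0:
        return []
    mid = policy(start, goal)
    left = subgoal_tree(start, mid, policy, depth - 1)
    right = subgoal_tree(mid, goal, policy, depth - 1)
    return left + [mid] + right


start, goal = (0.0, 0.0), (8.0, 0.0)
# Full trajectory: start, the planned sub-goals, then the goal.
trajectory = [start] + subgoal_tree(start, goal, midpoint_policy, depth=3) + [goal]
```

A tree of depth D yields 2^D - 1 sub-goals, so the number of recursive policy calls along any root-to-leaf path grows only logarithmically with the trajectory length.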
