control system $\Sigma(\xi \ldots)$

Neural Information Processing Systems 

To prove Lemma 5.1, we only need to show that Pontryagin's Maximum Principle (PMP) for the auxiliary … X, (S.3), and the following matrix trace properties: $\mathrm{Tr}(A) = \mathrm{Tr}(A^{\top})$ … Since the PMP equations (S.2) obtained above are the same as the differential PMP in (13), we thus …

Based on Lemma 5.1 and its proof, we know that the PMP of the auxiliary control system, (S.2), is exactly the differential PMP equations (13). From (S.2c), we solve for U …

Proof by induction: (S.2d) shows that (S.8) holds for … This completes the proof. □

D Algorithm Details for Different Learning Modes

… SysID Mode, then use the learned dynamics as the initial guess in IRL/IOC Mode.

In designing the quadrotor's control objective function, to achieve SE(3) maneuvering …

In Fig. S1, we show more detailed results of the imitation loss versus iteration … In Fig. S2, we show more detailed results of the SysID loss versus iteration … In Fig. S5, we use the …

On the cart-pole and robot-arm systems (Figure 1a and Figure 1b), we learn a feedback policy by minimizing given control objective functions. In Fig. S3, we show detailed results of the control loss (i.e., the value of the control objective …) … S6, we have the following remarks. … This can be seen in Fig. S3 and Fig. S6 (in Fig. S6, PDP produces a simulated trajectory that is closer to the optimal one than that …). This explains why PDP outperforms GPS, achieving a lower control cost (loss).
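The proof of Lemma 5.1 relies on matrix trace identities such as Tr(A) = Tr(A^⊤). As a quick sanity check, and not as part of the authors' proof, the sketch below verifies this identity on a small integer matrix, together with the standard cyclic property Tr(ABC) = Tr(BCA) = Tr(CAB), which is a well-known identity but is not confirmed by the surviving text of this section. The helpers `trace`, `transpose`, and `matmul` are hypothetical names written in plain Python, so no external linear-algebra library is assumed; integer entries keep every comparison exact.

```python
# Minimal numerical check of matrix trace properties (illustrative sketch only).
from typing import List

Matrix = List[List[int]]


def trace(a: Matrix) -> int:
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x k) matrix and a (k x m) matrix."""
    bt = transpose(b)  # columns of b become rows, simplifying the dot products
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
    B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
    C = [[1, 1, 0], [0, 2, 1], [3, 0, 1]]

    # Tr(A) = Tr(A^T): transposition leaves the diagonal unchanged.
    assert trace(A) == trace(transpose(A))

    # Cyclic property: Tr(ABC) = Tr(BCA) = Tr(CAB).
    t_abc = trace(matmul(matmul(A, B), C))
    t_bca = trace(matmul(matmul(B, C), A))
    t_cab = trace(matmul(matmul(C, A), B))
    assert t_abc == t_bca == t_cab
```

The cyclic property also holds for non-square factors whose products are square, e.g. Tr(AB) = Tr(BA) for a 2×3 matrix A and a 3×2 matrix B, even though AB is 2×2 and BA is 3×3.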