
REV3: We will clarify Tab.1 in the paper and add derivations of Eqs. 5, 6. DESIRE[21] is variational. The second is an online planning algorithm (Shooting), where Ego's future action sequences are38 optimized tomaximize fortheplanning rewardunderthelearned MFP dynamics model.