A Appendix

Neural Information Processing Systems 

In the appendix, we provide additional results in Section A.1 and more implementation details in [...]

To compare the stability of training, we did not early-stop the training process even when the loss of some tasks had already exploded.

[...] MTRL training compared with both variants, demonstrating the effectiveness of the PaCo design.

MT50 is a more complex benchmark in Meta-World, containing 50 different manipulation tasks (including the MT10 tasks). It is therefore hard to determine whether the policy has reached the optimum. The results are shown in Figure 8.