reflecting reviewers' comments which are not mentioned in this response

Neural Information Processing Systems 

We thank the reviewers for their meaningful insights and constructive feedback. The result was reversed in Hopper, where RL contributed 200.86 while the EA actors contributed 363.53. Therefore, all performance scores are measured at a fixed number of interaction steps.

R2: Ablation study is missing. We presented the effect of the variance update rule in Appendix C.3 by comparing the results. We also provided all combinations of our proposed mean and variance in Table 2. We will add a dedicated section so that the ablation results can be seen at a glance.