report the final policy performance (mean std) over the seeds. Due to space constraints, we omit the learning curves