Improving On-policy Learning with Statistical Reward Accumulation