Leveraging the Variance of Return Sequences for Exploration Policy