Bias Resilient Multi-Step Off-Policy Goal-Conditioned Reinforcement Learning