Learning Upper-Lower Value Envelopes to Shape Online RL: A Principled Approach