On First-Order Meta-Reinforcement Learning with Moreau Envelopes