Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement