Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning