Policy Optimization in Multi-Agent Settings under Partially Observable Environments