Dual Critic Reinforcement Learning under Partial Observability