Distributed primal-dual algorithm for constrained multi-agent reinforcement learning under coupled policies