Finding Nash equilibria by minimizing approximate exploitability with learned best responses
