AITopics | Agents

Notice that the Tabular CRR exp objective looks different from the learning rule defined by Eqn. 4. Following Eqn. 8, we see that whenever µ ... In addition to being safe, we show that each iteration of CRR improves performance. To compute the performance of each agent, as reported in Tables 2, 3, 5, 6 and 7, we adopt the following procedure. We run each agent with three independent seeds. Agent snapshots are made every 50000 learner steps. As discussed in Sec. 3, using K-step returns can hurt the agent's performance. To test this hypothesis, we evaluate CRR's (using the binary ... This objective is similar to the ones used in [27, 7].

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.54)

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Neural Information Processing Systems, Oct-2-2025, 23:28:19 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

569ff987c643b4bedf504efda8f786c2-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 23:27:21 GMT

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Genre:

Overview (0.68)
Research Report (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games

Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras

Neural Information Processing Systems, Oct-2-2025, 22:47:10 GMT

We study a wide class of non-convex non-concave min-max games that generalizes over standard bilinear zero-sum games.

artificial intelligence, machine learning, optimization problem, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California > Los Angeles County (0.28)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
(2 more...)

No-Regret Learning in Unknown Games with Correlated Payoffs

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Neural Information Processing Systems, Oct-2-2025, 22:01:07 GMT

The performance of an agent in a repeated game is often measured in terms of regret.

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Europe (0.28)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

concerns (C

Neural Information Processing Systems, Oct-2-2025, 22:00:52 GMT

We would like to thank all the reviewers for their constructive feedback. Citations refer to references in the paper and to the additional ones provided below. "I do agree that full information feedback is hard to expect in real scenarios, ... However, the current ... Is there an application where this is a more realistic assumption?" The main motivation for our model is a setting that lies between full-information and bandit feedback. The proposed feedback model is also present in other practical applications.

artificial intelligence, assumption, reward function, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.49)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.30)

4fbe073f17f161810fdf3dab1307b30f-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 22:00:42 GMT

artificial intelligence, certificate, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 21:37:32 GMT

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. The paper proposes a fairer optimization criterion, "regularized maximin", for centralized multi-agent MDPs. The idea, taken from the networking literature, is elegant. The authors also propose an iterative optimization method that scales somewhat better than linear programming. The description of the transition model, lines 69-79, seems unnecessarily detailed.

algorithm, artificial intelligence, optimization problem, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Summary/Review (0.31)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

2456a42386e445ba884511aa17ca4a30-Paper-Conference.pdf

A Appendix
