AITopics | shadow reward

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

Neural Information Processing Systems | Feb-14-2026, 09:21:31 GMT

In fact, the interaction of these two aspects requires addressing the fact that each agent's own safety constraint requires information from all others.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > New Finding (0.67)
Overview (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

72a1ec14aed36985ffba175e0bba3fec-Supplemental-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 21:54:40 GMT

data mining, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > New Finding (0.67)
Overview (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Data Science > Data Mining (0.67)
(2 more...)

30ee748d38e21392de740e2f9dc686b6-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 14:36:24 GMT

It's not clear if there are any other meaningful ... Theorem 3 doesn't require the policy class ... The authors should emphasize this. Yes, we agree with reviewer's comment. ... (Theorem 2). Describe why the paper's approach offers advantages over [18]. See also response to Reviewer #3 (C2).

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.52)
Information Technology > Artificial Intelligence > Machine Learning (0.49)

Scalable Multi-Agent Reinforcement Learning with General Utilities

Ying, Donghao, Ding, Yuhao, Koppel, Alec, Lavaei, Javad

arXiv.org Artificial Intelligence | Aug-26-2023

Many decision-making problems take a form beyond the classic cumulative reward, such as apprenticeship learning [1], diverse skill discovery [2], pure exploration [3], and state marginal matching [4], among others. Such problems can be abstracted as reinforcement learning (RL) with general utilities [5, 6], which focuses on finding a policy that maximizes a nonlinear function of the induced state-action occupancy measure. This generalizes standard RL, in which the objective is only an inner product between the state-action occupancy measure induced by the policy and a policy-independent reward for each state-action pair. Beyond single-agent RL, consider the multi-agent problem in which different agents need to interact to obtain a favorable outcome by finding a decision policy that maximizes the global accumulation of all agents' general utilities.
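To make the distinction between a cumulative reward and a general utility concrete, here is a minimal sketch on a small tabular MDP. Everything in it is an illustrative assumption rather than something taken from the paper: the random MDP, the function name `occupancy_measure`, the unnormalized discounted definition of the occupancy measure, and the choice of entropy as the example nonlinear utility.

```python
import numpy as np

def occupancy_measure(P, pi, rho, gamma):
    """Discounted state-action occupancy lam[s, a] of policy pi (unnormalized).

    P:   transitions, shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    pi:  policy, shape (S, A), each row sums to 1
    rho: initial state distribution, shape (S,)
    """
    S = P.shape[0]
    # State-to-state kernel under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t]
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Discounted state visitation d solves d = rho + gamma * P_pi^T d
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return d[:, None] * pi

# Hypothetical random MDP, used only for illustration
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)
rho = np.full(S, 1.0 / S)
r = rng.random((S, A))

lam = occupancy_measure(P, pi, rho, gamma)

# Standard RL: the objective is the inner product <lam, r>, linear in lam
J_linear = np.sum(lam * r)

# A general utility (assumed example): entropy of the normalized occupancy,
# which is a nonlinear function of lam and rewards broad exploration
lam_bar = (1.0 - gamma) * lam
J_entropy = -np.sum(lam_bar * np.log(lam_bar))
```

In this sketch `J_linear` coincides with the usual expected discounted return of `pi`, whereas `J_entropy` cannot be written as an inner product with any fixed reward table.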

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2302.07938

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

Ying, Donghao, Zhang, Yunkai, Ding, Yuhao, Koppel, Alec, Lavaei, Javad

arXiv.org Artificial Intelligence | May-27-2023

We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by general utilities, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploration, or imitation. The exponential growth of the state-action space size with the number of agents presents challenges for global observability, further exacerbated by the global coupling arising from agents' safety constraints. To tackle this issue, we propose a primal-dual method utilizing shadow reward and $\kappa$-hop neighbor truncation under a form of correlation decay property, where $\kappa$ is the communication radius. In the exact setting, our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$. In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP with an approximation error of $\mathcal{O}(\phi_0^{2\kappa})$, where $\phi_0\in (0,1)$. Finally, we demonstrate the effectiveness of our model through extensive numerical experiments.
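The $\kappa$-hop neighbor truncation can be pictured as each agent restricting attention to the agents within graph distance $\kappa$ of itself. The sketch below illustrates only that neighborhood computation with a breadth-first search. The function name `k_hop_neighbors`, the adjacency-dictionary representation, and the six-agent ring are hypothetical choices made for illustration, not details from the paper.

```python
from collections import deque

def k_hop_neighbors(adjacency, agent, kappa):
    """Return the sorted agents within graph distance kappa of `agent` (itself included)."""
    dist = {agent: 0}
    queue = deque([agent])
    while queue:
        u = queue.popleft()
        if dist[u] == kappa:
            # Do not expand beyond the communication radius
            continue
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)

# Hypothetical communication graph: six agents arranged in a ring
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

one_hop = k_hop_neighbors(ring, 0, 1)    # agents 0, 1, 5
two_hop = k_hop_neighbors(ring, 0, 2)    # agents 0, 1, 2, 4, 5
three_hop = k_hop_neighbors(ring, 0, 3)  # every agent on the ring
```

A larger radius means more communication but a smaller truncation error: ignoring the constant hidden in the $\mathcal{O}(\phi_0^{2\kappa})$ bound, an assumed value of $\phi_0 = 0.5$ gives factors of 0.25 for $\kappa = 1$ and 0.0625 for $\kappa = 2$.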

agent, constraint, occupancy measure, (15 more...)

arXiv.org Artificial Intelligence

2305.17568

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

Zhang, Junyu, Bedi, Amrit Singh, Wang, Mengdi, Koppel, Alec

arXiv.org Machine Learning | May-29-2021

We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC), in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward". DSAC converges to $\epsilon$-stationarity in $\mathcal{O}(1/\epsilon^{2.5})$ (Theorem \ref{theorem:final}) or faster $\mathcal{O}(1/\epsilon^{2})$ (Corollary \ref{corollary:communication}) steps with high probability, depending on the amount of communications. We further establish the non-existence of spurious stationary points for this problem, that is, DSAC finds the globally optimal policy (Corollary \ref{corollary:global}). Experiments demonstrate the merits of goals beyond the cumulative return in cooperative MARL.
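Since the shadow reward is described as the derivative of a utility with respect to the occupancy measure, a small numerical sketch may help. It is not the estimator used by DSAC: it swaps in a plain central finite difference, and the names `shadow_reward`, `linear_utility`, and `entropy_utility`, together with the 2x2 occupancy table, are assumptions chosen for illustration.

```python
import numpy as np

def shadow_reward(utility, lam, eps=1e-6):
    """Central finite-difference estimate of dF/dlam, one entry per (state, action)."""
    grad = np.zeros_like(lam)
    for idx in np.ndindex(lam.shape):
        step = np.zeros_like(lam)
        step[idx] = eps
        grad[idx] = (utility(lam + step) - utility(lam - step)) / (2.0 * eps)
    return grad

# Hypothetical occupancy measure over 2 states x 2 actions (entries sum to 1)
lam = np.array([[0.2, 0.1],
                [0.3, 0.4]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def linear_utility(l):
    """Cumulative-return case: F(lam) = <lam, r>."""
    return np.sum(l * r)

def entropy_utility(l):
    """Assumed nonlinear example: entropy of the normalized occupancy."""
    p = l / l.sum()
    return -np.sum(p * np.log(p))

# For the linear utility the shadow reward recovers the ordinary reward table r
sr_linear = shadow_reward(linear_utility, lam)

# For the entropy utility the shadow reward depends on lam itself:
# rarely visited state-action pairs receive the largest values
sr_entropy = shadow_reward(entropy_utility, lam)
```

The linear case shows why the name fits: when the utility is the cumulative return, the shadow reward is the ordinary reward, and for other utilities it plays the same role while changing as the occupancy measure changes.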

agent, lemma 4, occupancy measure, (14 more...)

arXiv.org Machine Learning

2106.00543

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
