AITopics | credit assignment problem

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts (0.04)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Neural Information Processing SystemsFeb-10-2026, 23:03:53 GMT

99088dffd5eab0babebcda4bc58bbcea-Paper-Conference.pdf

backprop, burstccn, teaching signal, (15 more...)

Country:

Europe > United Kingdom > England > Bristol (0.05)
Asia > Middle East > Jordan (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceDec-5-2025

OPTIC-ER: A Reinforcement Learning Framework for Real-Time Emergency Response and Equitable Resource Allocation in Underserved African Communities

Tonwe, Mary

Public service systems in many African regions suffer from delayed emergency response and spatial inequity, causing avoidable suffering. This paper introduces OPTIC-ER, a reinforcement learning (RL) framework for real-time, adaptive, and equitable emergency response. OPTIC-ER uses an attention-guided actor-critic architecture to manage the complexity of dispatch environments. Its key innovations are a Context-Rich State Vector, encoding action sub-optimality, and a Precision Reward Function, which penalizes inefficiency. Training occurs in a high-fidelity simulation using real data from Rivers State, Nigeria, accelerated by a precomputed Travel Time Atlas. The system is built on the TALS framework (Thin computing, Adaptability, Low-cost, Scalability) for deployment in low-resource settings. In evaluations on 500 unseen incidents, OPTIC-ER achieved a 100.00% optimal action selection rate, confirming its robustness and generalization. Beyond dispatch, the system generates Infrastructure Deficiency Maps and Equity Monitoring Dashboards to guide proactive governance and data-informed development. This work presents a validated blueprint for AI-augmented public services, showing how context-aware RL can bridge the gap between algorithmic decision-making and measurable human impact.

machine learning, real time system, reinforcement learning, (18 more...)

2508.12943

Country:

Africa > Nigeria > Rivers State > Port Harcourt (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Government (1.00)
Transportation > Ground > Road (0.68)
Health & Medicine > Health Care Providers & Services (0.68)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Min, Donggyu, Choi, Seongjin, Kim, Dong-Kyu

Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment

arXiv.org Artificial IntelligenceNov-11-2025

Abstract--This paper focuses on dynamic origin-destination matrix estimation (DODE), a crucial calibration process necessary for the effective application of microscopic traffic simulations. The fundamental challenge of the DODE problem in microscopic simulations stems from the complex temporal dynamics and inherent uncertainty of individual vehicle dynamics. This makes it highly challenging to precisely determine which vehicle traverses which link at any given moment, resulting in intricate and often ambiguous relationships between origin-destination (OD) matrices and their contributions to resultant link flows. This phenomenon constitutes the credit assignment problem, a central challenge addressed in this study . We formulate the DODE problem as a Markov Decision Process (MDP) and propose a novel framework that applies model-free deep reinforcement learning (DRL). Within our proposed framework, the agent learns an optimal policy to sequentially generate OD matrices, refining its strategy through direct interaction with the simulation environment. The proposed method is validated on the Nguyen-Dupuis network using SUMO, where its performance is evaluated against ground-truth link flows aggregated at 5-minute intervals over a 30-minute horizon. Experimental results demonstrate that our approach achieves a 43.2% reduction in mean squared error (MSE) compared to the best-performing conventional baseline. By reframing DODE as a sequential decision-making problem, our approach addresses the credit assignment challenge through its learned policy, thereby overcoming the limitations of conventional methods and proposing a novel framework for calibration of microscopic traffic simulations. Modern strategies, such as speed harmonization, lane-changing control, and adaptive traffic signal control, are now predominantly designed and evaluated at the microscopic level, marking a significant shift in traffic management paradigms [2], [3], [4].

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2511.06229

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Ground > Road (0.48)
Transportation > Infrastructure & Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Malenfant, Dane, Richards, Blake A.

The challenge of hidden gifts in multi-agent reinforcement learning

arXiv.org Artificial IntelligenceOct-2-2025

Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These ``hidden gifts'' represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. As well, if all the agents unlock their door the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus this act for others is a ``hidden gift''. We show that several different state-of-the-art MARL algorithms, including MARL specific architectures, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that decentralized actor-critic policy gradient agents can succeed when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for policy gradient agents, inspired by learning aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of ``hidden gifts'', and demonstrate that self learning-awareness in decentralized agents can benefit these settings.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2505.20579

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Michigan (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Government (0.46)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing SystemsAug-19-2025, 02:13:28 GMT

fe1f9c70bdf347497e1a01b6c486bdb9-Paper.pdf

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts (0.04)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Neural Information Processing SystemsAug-17-2025, 04:44:43 GMT

Single-phase deep learning in cortico-cortical networks

The error-backpropagation (backprop) algorithm remains the most common solution to the credit assignment problem in artificial neural networks.

artificial intelligence, burstccn, machine learning, (16 more...)

Country:

Europe > United Kingdom > England > Bristol (0.05)
Asia > Middle East > Jordan (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Neural Information Processing SystemsAug-16-2025, 00:24:49 GMT

Learning to Branch with Tree MDPs

We revisit the work of Etheve et al.

artificial intelligence, machine learning, tree mdp, (13 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Nagpal, Kartik, Dong, Dayi, Bouvier, Jean-Baptiste, Mehr, Negar

Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment

arXiv.org Artificial IntelligenceFeb-24-2025

Recent work, spanning from autonomous vehicle coordination to in-space assembly, has shown the importance of learning collaborative behavior for enabling robots to achieve shared goals. A common approach for learning this cooperative behavior is to utilize the centralized-training decentralized-execution paradigm. However, this approach also introduces a new challenge: how do we evaluate the contributions of each agent's actions to the overall success or failure of the team. This credit assignment problem has remained open, and has been extensively studied in the Multi-Agent Reinforcement Learning literature. In fact, humans manually inspecting agent behavior often generate better credit evaluations than existing methods. We combine this observation with recent works which show Large Language Models demonstrate human-level performance at many pattern recognition tasks. Our key idea is to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution, which motivates our novel LLM-MCA method. Our approach utilizes a centralized LLM reward-critic which numerically decomposes the environment reward based on the individualized contribution of each agent in the scenario. We then update the agents' policy networks based on this feedback. We also propose an extension LLM-TACA where our LLM critic performs explicit task assignment by passing an intermediary goal directly to each agent policy in the scenario. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints. As an artifact of our methods, we generate large trajectory datasets with each timestep annotated with per-agent reward information, as sampled from our LLM critics.

agent, assignment, credit assignment problem, (9 more...)