td-error
- Research Report > New Finding (0.93)
- Overview (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
… solid [R1, R3, R4], our experimental results valuable [R2, R3, R4], and our paper well-written [R1, R3, R4] …
We only included a single environment (Pusher-v2) in the main paper in order to save space. We will incorporate the suggested references into the paper. See also "About multi-step rollouts". … The reviewer suggests that the paper should first "show that minimizing the TD-error is not …" … Notice, however, that despite being commonly used and thought of as "intuitive", … Furthermore, Figure 1 indeed shows that minimizing the TD-error can lead to a critic that is far from the ideal one. We did not write that "model-based RL has no advantage in terms of sample-efficiency than model-free RL".
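For reference, the quantity under discussion is the standard one-step TD-error of a critic $Q_\theta$ with a target network $Q_{\bar{\theta}}$; the usual critic loss minimizes its square. Because the bootstrapped target reuses the critic's own estimates, driving the TD-error to zero on the observed data does not by itself guarantee that the critic is close to the true value function:

```latex
\delta_t = r_t + \gamma \, Q_{\bar{\theta}}(s_{t+1}, a_{t+1}) - Q_{\theta}(s_t, a_t),
\qquad
\mathcal{L}(\theta) = \mathbb{E}\big[\delta_t^2\big]
```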
Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
Griesbach, Sebastian, D'Eramo, Carlo
Numerous heuristics and advanced approaches have been proposed for exploration under different settings in deep reinforcement learning. Noise-based exploration generally fares well with dense, shaped rewards, while bonus-based exploration fares well with sparse rewards. However, these methods usually require additional tuning to deal with undesirable reward settings by adjusting hyperparameters and noise distributions. Rewards that actively discourage exploration, i.e., with an action cost and no other dense signal to follow, can pose a major challenge. We propose a novel exploration method, Stable Error-seeking Exploration (SEE), that is robust across dense, sparse, and exploration-adverse reward settings. To this end, we revisit the idea of maximizing the TD-error as a separate objective. Our method introduces three design choices to mitigate instability caused by far-off-policy learning, the conflict of interest of maximizing the cumulative TD-error in an episodic setting, and the non-stationary nature of TD-errors. SEE can be combined with off-policy algorithms without modifying the optimization pipeline of the original objective. In our experimental analysis, we show that a Soft Actor-Critic agent augmented with SEE performs robustly across three diverse reward settings in a variety of tasks without hyperparameter adjustments.
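The abstract describes maximizing the exploitation critic's TD-error as a separate exploration objective. As a rough illustration of that idea only (the names `td_error_reward`, `critic`, and `target_critic` below are assumptions, not the authors' SEE code), a hedged sketch of turning the critic's TD-error into an exploration reward might look like:

```python
import torch

def td_error_reward(critic, target_critic, batch, gamma=0.99):
    """Absolute one-step TD-error of the exploitation critic.

    Used here as the reward signal for a separate exploration policy; this is
    a minimal sketch of the general idea, not the authors' implementation.
    """
    s, a, r, s_next, a_next, done = batch  # tensors sampled from the replay buffer
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_critic(s_next, a_next)
        delta = target - critic(s, a)
    return delta.abs()  # exploration reward: magnitude of the critic's surprise
```

In such a setup, the exploitation agent (e.g., SAC) would keep training on the environment reward unchanged, which is consistent with the abstract's claim that SEE leaves the original optimization pipeline intact.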
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > Austria > Vienna (0.14)
- Research Report > New Finding (0.93)
- Overview (0.67)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Deterministic Exploration via Stationary Bellman Error Maximization
Griesbach, Sebastian, D'Eramo, Carlo
Exploration is a crucial and distinctive aspect of reinforcement learning (RL) that remains a fundamental open problem. Several methods have been proposed to tackle this challenge. Commonly used methods inject random noise directly into the actions, indirectly via entropy maximization, or add intrinsic rewards that encourage the agent to steer toward novel regions of the state space. Another idea, explored in prior work, is to use the Bellman error as a separate optimization objective for exploration. In this paper, we introduce three modifications to stabilize the latter and arrive at a deterministic exploration policy. Our separate exploration agent is informed about the state of the exploitation agent, enabling it to account for previous experiences. Further components are introduced to make the exploration objective agnostic to the episode length and to mitigate the instability introduced by far-off-policy learning. Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
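To picture what a deterministic exploration policy driven by Bellman-error maximization could look like, the sketch below pairs a deterministic actor with an assumed `error_critic` that estimates the Bellman error reachable from a state. The network sizes and the DDPG-style update are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ExplorationActor(nn.Module):
    """Deterministic policy whose sole objective is to reach states where the
    exploitation agent's Bellman error is expected to be large (sketch only)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)


def exploration_actor_loss(error_critic, actor, states):
    # DDPG-style deterministic policy gradient: ascend the error-critic's
    # estimate of the Bellman error obtainable from these states.
    return -error_critic(states, actor(states)).mean()
```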
- North America > United States > Massachusetts (0.14)
- Europe > United Kingdom > England (0.14)
- Europe > Germany > Bavaria (0.14)
- Asia > Japan (0.14)
- Energy > Oil & Gas (0.47)
- Leisure & Entertainment > Games (0.46)
Reviews: Transfer of Value Functions via Variational Methods
Update:
-----------
I had a look at the author response: it seems reasonable and contains a lot of additional information and experiments that address my main concerns with the paper. Had these comparisons been part of the paper in the first place, I would have voted for acceptance. I am now a bit on the fence: the paper could be accepted, but it would require a major revision. I will engage in discussion with the other reviewers, and ultimately the AC has to decide whether such big changes to the experimental section are acceptable within the review process.
Original review:
---------------------
The paper presents a method for transfer learning via a variational inference formulation in a reinforcement learning (RL) setting. The proposed approach is sound, novel, and interesting, and could be widely applicable (it makes no overly restrictive assumptions on the form of the learned (Q-)value function).
DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning
Hu, Xunhan, Zhao, Jian, Zhou, Wengang, Feng, Ruili, Li, Houqiang
Cooperative multi-agent reinforcement learning (MARL) is a challenging task, as agents must learn complex and diverse individual strategies from a shared team reward. However, existing methods struggle to distinguish and exploit important individual experiences, as they lack an effective way to decompose the team reward into individual rewards. To address this challenge, we propose DIFFER, a powerful theoretical framework for decomposing individual rewards to enable fair experience replay in MARL. By enforcing the invariance of network gradients, we establish a partial differential equation whose solution yields the underlying individual reward function. The individual TD-error can then be computed from the resulting closed-form individual rewards, indicating the importance of each piece of experience for the learning task and guiding the training process. Our method is elegantly equivalent to the original learning framework when individual experiences are homogeneous, while adapting to achieve greater efficiency and fairness when diversity is observed. Our extensive experiments on popular benchmarks validate the effectiveness of our theory and method, demonstrating significant improvements in learning efficiency and fairness.
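The "fair experience replay" step can be pictured as prioritized sampling driven by the decomposed per-agent TD-errors. The snippet below is a generic prioritized-replay-style sketch under assumed hyperparameters (`alpha`, `eps`), not DIFFER's exact scheme:

```python
import numpy as np

def replay_probabilities(individual_td_errors, alpha=0.6, eps=1e-6):
    """Turn per-agent TD-error magnitudes into a sampling distribution over
    individual experiences (generic prioritized-replay sketch)."""
    priorities = (np.abs(np.asarray(individual_td_errors)) + eps) ** alpha
    return priorities / priorities.sum()
```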
- Transportation (0.47)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)