RUDDER: Return Decomposition for Delayed Rewards

Neural Information Processing Systems

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims at making the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero.
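The core mechanic, redistributing a delayed episodic return across earlier time steps so the expected future reward becomes zero, can be illustrated with a toy sketch. This is a minimal illustration only: the function name and the hand-picked contribution scores are assumptions; in RUDDER itself the contributions come from contribution analysis of an LSTM that predicts the return.

```python
import numpy as np

def redistribute_return(rewards, contributions):
    """Redistribute an episode's total return across time steps in
    proportion to per-step contribution scores, so the redistributed
    rewards sum to the original return (return-equivalence)."""
    G = float(np.sum(rewards))            # episodic (possibly delayed) return
    w = np.asarray(contributions, dtype=float)
    w = w / w.sum()                       # normalize contribution scores
    return G * w                          # per-step redistributed reward

# Toy episode: all reward arrives at the final step (delayed reward).
rewards = [0.0, 0.0, 0.0, 10.0]
# Hypothetical contribution scores (RUDDER derives these from a
# return-predicting LSTM, not by hand).
contributions = [0.5, 0.1, 0.1, 0.3]

r_new = redistribute_return(rewards, contributions)
print(r_new)   # [5. 1. 1. 3.] -- same total return, no delay
```

With the delay removed, estimating a Q-value reduces to averaging the (redistributed) immediate reward at each step.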


Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models

Zou, Zhengtao, Gao, Ya, Guan, Jiarui, Li, Bin, Marttinen, Pekka

arXiv.org Artificial Intelligence

Large Vision-Language Models (LVLMs) often suffer from object hallucination, generating text inconsistent with visual inputs, which can critically undermine their reliability. Existing inference-time interventions to mitigate this issue present a challenging trade-off: while methods that steer internal states or adjust output logits can be effective, they often incur substantial computational overhead, typically requiring extra forward passes. This efficiency bottleneck can limit their practicality for real-world, latency-sensitive deployments. In this work, we aim to address this trade-off with Residual-Update Directed DEcoding Regulation (RUDDER), a low-overhead framework that steers LVLMs toward visually grounded generation. RUDDER is built on two key innovations: (1) the Contextual Activation Residual Direction (CARD) vector, a per-sample visual-evidence vector extracted from the residual update of a self-attention layer during a single, standard forward pass. Extensive experiments on key hallucination benchmarks, including POPE and CHAIR, indicate that RUDDER achieves performance comparable to state-of-the-art methods while introducing negligible computational latency, validating RUDDER as a pragmatic and effective approach to improving LVLMs' reliability without a significant compromise on efficiency. Code is available at https://anonymous.4open.science/r/

While Large Vision-Language Models (LVLMs) have shown remarkable capabilities in multimodal tasks and are increasingly deployed to assist with real-world problems (Alayrac et al., 2022; Liu et al., 2024a), their practical reliability is critically undermined by a persistent challenge: object hallucination. As shown in Figure 1, LVLMs frequently generate fluent, convincing text that is factually inconsistent with visual groundings, severely limiting their real-world utility and credibility (Ji et al., 2023).
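The low-overhead intervention described above, nudging hidden states along a per-sample direction rather than running extra forward passes, can be sketched as follows. This is a hedged illustration, not the paper's implementation: `apply_residual_steering`, the scaling factor `alpha`, and the toy shapes are all assumptions; the actual CARD vector is extracted from a self-attention layer's residual update.

```python
import numpy as np

def apply_residual_steering(hidden, card_vector, alpha=1.0):
    """Add a scaled, normalized visual-evidence direction to every
    token's hidden state. A single vector addition per layer, so no
    extra forward pass is needed."""
    d = card_vector / np.linalg.norm(card_vector)  # unit steering direction
    return hidden + alpha * d                      # broadcast over tokens

h = np.zeros((4, 8))   # toy (seq_len, hidden_dim) decoder states
v = np.ones(8)         # hypothetical per-sample CARD vector
out = apply_residual_steering(h, v, alpha=0.5)
```

The design point the abstract emphasizes is that the steering vector is computed during the one standard forward pass, which is why the latency overhead stays negligible.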




Meta sacrifices a heap of money at the altar of AI

The Guardian

Mark Zuckerberg announced in April that the company would make huge capital expenditures in the coming year to keep up in the race to develop cutting-edge artificial intelligence. He made good on that promise last week with a $15bn "AI superintelligence" team that would reportedly feature nine-figure salaries, and a 49% investment in Scale AI. Before Meta's investment, Scale counted most of the major players in AI among its clients, and some of them were less than thrilled with the development. Bloomberg puts it succinctly: "Scale AI's Wang Brings to Meta Knowledge of What Everyone Else Is Doing." Google, Scale's largest customer, got scared.


Cross-platform Learning-based Fault Tolerant Surfacing Controller for Underwater Robots

Hamamatsu, Yuya, Remmas, Walid, Rebane, Jaan, Kruusmaa, Maarja, Ristolainen, Asko

arXiv.org Artificial Intelligence

In this paper, we propose a novel cross-platform fault-tolerant surfacing controller for underwater robots, based on reinforcement learning (RL). Unlike conventional approaches, which require explicit identification of malfunctioning actuators, our method allows the robot to surface using only the remaining operational actuators, without needing to pinpoint the failures. The proposed controller learns a robust policy capable of handling diverse failure scenarios across different actuator configurations. Moreover, we introduce a transfer learning mechanism that shares part of the control policy across underwater robots with different actuators, improving learning efficiency and generalization across platforms. To validate our approach, we conduct simulations on three different types of underwater robots: a hovering-type AUV, a torpedo-shaped AUV, and a turtle-shaped robot (U-CAT). Additionally, real-world experiments are performed, successfully transferring the learned policy from simulation to a physical U-CAT in a controlled environment. Our RL-based controller demonstrates superior stability and success rate compared to a baseline controller, achieving an 85.7 percent success rate in real-world tests versus 57.1 percent for the baseline. This research provides a scalable and efficient solution for fault-tolerant control of diverse underwater platforms, with potential applications in real-world aquatic missions.
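The failure model, surfacing with whatever actuators remain, without identifying which ones failed, can be sketched as a simple fault mask applied to the policy's action vector. This is a toy illustration under stated assumptions: the function and variable names are hypothetical, and the paper's simulator, policy network, and transfer mechanism are not reproduced here.

```python
import numpy as np

def apply_actuator_faults(action, fault_mask):
    """Zero the commands of failed actuators. The policy receives no
    signal about which entries were zeroed, so it must learn behavior
    that surfaces the robot with only the operational actuators."""
    return np.where(fault_mask, action, 0.0)

action = np.array([0.8, -0.3, 0.5, 0.2])           # thruster/fin commands
fault_mask = np.array([True, False, True, True])   # actuator 1 has failed
print(apply_actuator_faults(action, fault_mask))   # [0.8 0. 0.5 0.2]
```

Training the policy across many randomly sampled fault masks (and across robots with differing actuator counts) is what yields the cross-platform robustness the abstract claims.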


Reviews: RUDDER: Return Decomposition for Delayed Rewards

Neural Information Processing Systems

The reward redistribution method is proven to preserve optimal policies and reduce the expected future reward to zero. This is achieved by redistributing the delayed rewards to the salient state-action events (where saliency is determined by contribution analysis methods). Extensive experiments in both toy domains, as well as the suite of Atari games, demonstrate the method's improvements for delayed reward tasks, as well as the shortcomings of MC and TD methods for these types of tasks. Comments: I felt the work presented in the paper is outstanding. There are numerous contributions that could conceivably stand on their own (resulting in an extremely large appendix!).

