REWA: A General Theory of Witness-Based Similarity
We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods under a single theoretical umbrella. By formulating similarity as functional witness projection over monoids, we prove that $O\!\left(\frac{1}{\Delta^{2}}\log N\right)$ encoding complexity with ranking preservation holds for arbitrary algebraic structures. This unification reveals that Bloom filters, Locality Sensitive Hashing (LSH), Count-Min sketches, Random Fourier Features, and Transformer attention kernels are instances of the same underlying mechanism. We provide complete proofs with explicit constants under 4-wise independent hashing, handle heavy-tailed witnesses via normalization and clipping, and prove $O(\log N)$ complexity for all major similarity methods from 1970-2024. We give explicit constructions for Boolean, Natural, Real, Tropical, and Product monoids, prove tight concentration bounds, and demonstrate compositional properties enabling multi-primitive similarity systems.
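To make the mechanism concrete, the sketch below implements one of the instances the abstract names, sign-random-projection LSH (SimHash): random hyperplanes act as witnesses, and the fraction of agreeing sign bits estimates pairwise similarity. The function names and parameter values are illustrative assumptions, not constructions from the paper; the comment on the code length m only echoes the abstract's $O\!\left(\frac{1}{\Delta^{2}}\log N\right)$ bound.

```python
# Illustrative sketch only: sign-random-projection (SimHash), one of the LSH
# instances the abstract cites, not the paper's REWA construction.
import numpy as np

def encode(x, planes):
    """Project x onto random hyperplanes and keep only the signs (the 'witnesses')."""
    return (planes @ x >= 0)

def similarity(code_a, code_b):
    """Fraction of agreeing bits; for SimHash this estimates 1 - angle(a, b) / pi."""
    return float(np.mean(code_a == code_b))

rng = np.random.default_rng(0)
d, m = 128, 2048          # m grows like (1/Delta^2) * log N in the abstract's bound
planes = rng.standard_normal((m, d))

a = rng.standard_normal(d)
b = a + 0.3 * rng.standard_normal(d)   # a nearby vector
c = rng.standard_normal(d)             # an unrelated vector

print(similarity(encode(a, planes), encode(b, planes)))  # high agreement
print(similarity(encode(a, planes), encode(c, planes)))  # ~0.5 for random pairs
```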
- Asia > Taiwan (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Going Beyond Heuristics by Imposing Policy Improvement as a Constraint
Lee, Chi-Chang, Hong, Zhang-Wei, Agrawal, Pulkit
In many reinforcement learning (RL) applications, augmenting the task reward with heuristic rewards that encode human priors about how a task should be solved is crucial for achieving desirable performance. However, because such heuristics are usually not optimal, much human effort and computation is wasted on carefully balancing task and heuristic rewards. Theoretically rigorous ways of incorporating heuristics rely on the idea of \textit{policy invariance}, which guarantees that the performance of a policy obtained by maximizing heuristic rewards matches that of the optimal policy with respect to the task reward. In practice, however, policy invariance does not translate into policy improvement, and such methods are known to perform poorly empirically. We propose a new paradigm that mitigates reward hacking and uses heuristics effectively by pursuing the practical goal of maximizing policy improvement instead of policy invariance. Our framework, Heuristic Enhanced Policy Optimization (HEPO), effectively leverages heuristics while avoiding the pitfalls of prior methods for mitigating reward hacking. HEPO achieves superior performance on standard benchmarks with well-engineered reward functions. More surprisingly, HEPO attains good performance even when the heuristics are not well engineered and are designed by non-expert humans, showcasing its ability to reduce the human effort spent on reward design. HEPO is a plug-and-play optimization method for leveraging heuristics in reinforcement learning. Code is available at https://github.com/Improbable-AI/hepo.
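As a rough illustration of the constraint named in the title (using improvement on the task reward as the criterion for how much a heuristic may steer optimization), the minimal sketch below shrinks a heuristic's weight whenever the task-only return stops improving. This is a hypothetical reading, not the authors' HEPO algorithm; the decay rule and all names are illustrative.

```python
# Hypothetical sketch of "policy improvement as a constraint" on heuristic rewards;
# NOT the authors' HEPO update.
def adapt_heuristic_weight(weight, task_return, prev_task_return, decay=0.5):
    """Shrink the heuristic's influence whenever the task-only return fails to improve."""
    if task_return < prev_task_return:
        weight *= (1.0 - decay)
    return weight

def shaped_reward(r_task, r_heuristic, weight):
    """Heuristic-augmented reward fed to the policy-gradient update."""
    return r_task + weight * r_heuristic

# Toy usage: the heuristic stops dominating once it no longer helps the task reward.
w = 1.0
for task_ret, prev_ret in [(2.0, 1.0), (0.8, 2.0), (0.7, 0.8)]:
    w = adapt_heuristic_weight(w, task_ret, prev_ret)
print(w)  # weight decays after the two non-improving steps
```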
- Asia > Taiwan (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Memorize or Generalize? Evaluating LLM Code Generation with Evolved Questions
Chen, Wentao, Zhang, Lizhe, Zhong, Li, Peng, Letian, Wang, Zilong, Shang, Jingbo
Large Language Models (LLMs) are known to exhibit a memorization phenomenon in code generation: instead of truly understanding the underlying principles of a programming problem, they tend to memorize the original prompt together with its solution during training. Consequently, when facing variants of the original problem, their answers closely resemble the memorized solutions and fail to generalize. In this paper, we investigate this phenomenon by designing three evolution strategies to create variants: mutation, paraphrasing, and code-rewriting. By comparing the performance and AST similarity of the LLM-generated code before and after these three evolutions, we develop a memorization score that correlates positively with the level of memorization. As expected, as supervised fine-tuning proceeds, the memorization score rises before overfitting sets in, suggesting increasingly severe memorization. We show that common mitigation approaches, such as prompt translation and using evolved variants for data augmentation in supervised learning and reinforcement learning, either compromise performance or fail to alleviate the memorization issue. Memorization therefore remains a significant challenge in LLM code generation, highlighting the need for more effective solutions.
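A minimal sketch of one way to compute the AST similarity the evaluation relies on, using a bag of Python AST node types with multiset Jaccard overlap; the paper's exact similarity metric may differ, so treat this as an illustrative proxy.

```python
# A crude AST-similarity proxy (bag of node types with Jaccard overlap), to make
# the abstract's "AST similarity" concrete; not necessarily the paper's metric.
import ast
from collections import Counter

def node_types(code: str) -> Counter:
    """Count AST node types in a Python snippet."""
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(code)))

def ast_similarity(code_a: str, code_b: str) -> float:
    """Multiset Jaccard similarity over AST node types (1.0 = identical profile)."""
    a, b = node_types(code_a), node_types(code_b)
    inter = sum((a & b).values())
    union = sum((a | b).values())
    return inter / union if union else 1.0

original = "def f(xs):\n    return sum(x * x for x in xs)"
rewrite  = "def f(xs):\n    total = 0\n    for x in xs:\n        total += x * x\n    return total"
print(ast_similarity(original, rewrite))  # < 1.0: same behaviour, different structure
```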
Fairness Aware Reinforcement Learning via Proximal Policy Optimization
La Malfa, Gabriele, Zhang, Jie M., Luck, Michael, Black, Elizabeth
Fairness in multi-agent systems (MAS) concerns equitable reward distribution among agents in scenarios involving sensitive attributes such as race, gender, or socioeconomic status. This paper introduces fairness into Proximal Policy Optimization (PPO) through a penalty term derived from demographic parity, counterfactual fairness, and conditional statistical parity. The proposed method balances reward maximisation with fairness by integrating two penalty components: a retrospective component that minimises disparities in past outcomes and a prospective component that ensures fairness in future decision-making. We evaluate our approach in the Allelopathic Harvest game, a cooperative and competitive MAS focused on resource collection, in which some agents possess a sensitive attribute. Experiments demonstrate that fair-PPO learns fairer policies than classic PPO across all fairness metrics. Fairness comes at the cost of reduced reward, the Price of Fairness, although agents with and without the sensitive attribute forgo comparable amounts of reward. Additionally, the retrospective and prospective penalties effectively change the agents' behaviour and improve fairness. These findings underscore the potential of fair-PPO to address fairness challenges in MAS.
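As an illustration of the kind of penalty term the abstract describes, the sketch below adds a demographic-parity style gap between group mean rewards to a PPO loss; the weighting and the exact penalty form are assumptions, not the authors' formulation.

```python
# Illustrative group-parity penalty added to a PPO objective; not the paper's exact design.
import torch

def parity_penalty(rewards: torch.Tensor, sensitive: torch.Tensor) -> torch.Tensor:
    """Absolute gap between mean rewards of agents with and without the sensitive attribute."""
    gap = rewards[sensitive.bool()].mean() - rewards[~sensitive.bool()].mean()
    return gap.abs()

def fair_ppo_loss(ppo_loss: torch.Tensor, rewards: torch.Tensor,
                  sensitive: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Standard PPO surrogate loss plus a weighted fairness penalty (lam is a hypothetical knob)."""
    return ppo_loss + lam * parity_penalty(rewards, sensitive)

rewards = torch.tensor([1.0, 0.2, 0.9, 0.1])
sensitive = torch.tensor([0, 1, 0, 1])        # 1 = agent carries the sensitive attribute
print(parity_penalty(rewards, sensitive))     # tensor(0.8000): the with-group earns 0.8 less
```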
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms
Han, Ye, Zhang, Lijun, Meng, Dejian
Reinforcement learning (RL) shows great potential for optimizing multi-vehicle cooperative driving strategies through the state-action-reward feedback loop, but it still faces challenges such as low sample efficiency. This paper proposes a differentiated reward method based on steady-state transition systems, which incorporates state-transition gradient information into the reward design by analyzing traffic-flow characteristics, with the aim of improving action selection and policy learning in multi-vehicle cooperative decision-making. The proposed method is validated with RL algorithms such as MAPPO, MADQN, and QMIX under varying autonomous-vehicle penetration rates. The results show that the differentiated reward method significantly accelerates training convergence and outperforms the centering reward and other baselines in terms of traffic efficiency, safety, and action rationality. The method also demonstrates strong scalability and environmental adaptability, providing a novel approach to multi-agent cooperative decision-making in complex traffic scenarios.
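One plausible reading of incorporating "state transition gradient information" into the reward is to shape each reward by the change in a traffic-flow statistic (here, mean speed) induced by a transition. The sketch below is an assumption-laden illustration of that reading, not the paper's reward design; the weight and the mean-speed statistic are hypothetical.

```python
# Hypothetical shaping term: reward the change in mean traffic speed across a transition.
import numpy as np

def differentiated_reward(speeds_before, speeds_after, base_reward, w=0.5):
    """Base task reward plus a shaping term for the induced change in mean traffic speed."""
    flow_gradient = float(np.mean(speeds_after) - np.mean(speeds_before))
    return base_reward + w * flow_gradient

# Toy usage: an action that raises average speed earns more than one that lowers it.
print(differentiated_reward([10.0, 12.0], [12.0, 14.0], base_reward=1.0))  # 2.0
print(differentiated_reward([10.0, 12.0], [9.0, 11.0], base_reward=1.0))   # 0.5
```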
- Transportation > Ground > Road (0.69)
- Transportation > Infrastructure & Services (0.47)