Reward Mechanisms
Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial?
The past decade has witnessed the flourishing of a new profession of media content creators, who rely on revenue streams from online content recommendation platforms. The reward mechanisms employed by these platforms create a competitive environment among creators that affects their production choices and, consequently, content distribution and system welfare. It is thus crucial to design the platform's reward mechanism so as to steer the creators' competition toward a desirable welfare outcome in the long run. This work makes two major contributions in this regard: first, we uncover a fundamental limitation of a class of widely adopted mechanisms, coined \emph{Merit-based Monotone Mechanisms}, by showing that they inevitably lead to a constant-fraction loss of the optimal welfare. To circumvent this limitation, we introduce \emph{Backward Rewarding Mechanisms} (BRMs) and show that the competition game resulting from BRMs possesses a potential game structure. BRMs thus naturally steer strategic creators' collective behavior toward optimizing the potential function, which can be designed to match any given welfare metric. In addition, the class of BRMs can be parameterized so that the platform can directly optimize welfare within the feasible mechanism space even when the welfare metric is not explicitly defined.
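For readers who want the formal anchor behind this claim: a game is an exact potential game (in the sense of Monderer and Shapley) when there exists a function $\Phi$ such that every unilateral deviation changes the deviator's utility by exactly the change in $\Phi$,

    $$u_i(s_i', s_{-i}) - u_i(s_i, s_{-i}) = \Phi(s_i', s_{-i}) - \Phi(s_i, s_{-i}) \quad \text{for all } i,\ s_i',\ s_i,\ s_{-i}.$$

Under a BRM, the abstract's claim is that $\Phi$ can be chosen to coincide with the platform's welfare metric, so creators' best-response dynamics ascend the welfare objective.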
Experience-Driven Exploration for Efficient API-Free AI Agents
Tang, Chenwei, Xing, Jingyu, Liu, Xinyu, Wang, Zizhou, Du, Jiawei, Zhen, Liangli, Lv, Jiancheng
Most existing software lacks accessible Application Programming Interfaces (APIs), requiring agents to operate solely through pixel-based Graphical User Interfaces (GUIs). In this API-free setting, large language model (LLM)-based agents face severe efficiency bottlenecks: limited to local visual experience, they make myopic decisions and rely on inefficient trial and error, hindering both skill acquisition and long-term planning. To address these challenges, we propose KG-Agent, an experience-driven learning framework that structures an agent's raw pixel-level interactions into a persistent State-Action Knowledge Graph (SA-KG). KG-Agent overcomes inefficient exploration by linking functionally similar but visually distinct GUI states, forming a rich neighborhood of experience that enables the agent to generalize from a diverse set of historical strategies. To support long-horizon reasoning, we design a hybrid intrinsic reward mechanism based on the graph topology, combining a state-value reward for exploiting known high-value pathways with a novelty reward that encourages targeted exploration. This approach decouples strategic planning from pure discovery, allowing the agent to properly value setup actions whose payoff is delayed. We evaluate KG-Agent in two complex, open-ended GUI-based decision-making environments (Civilization V and Slay the Spire), demonstrating significant improvements in exploration efficiency and strategic depth over state-of-the-art methods.
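The abstract does not specify the reward's functional form; the following is a minimal sketch of what a graph-based hybrid intrinsic reward could look like, where the node attributes, weights, and count-based novelty term are illustrative assumptions rather than the authors' implementation:

    import math
    import networkx as nx  # illustrative choice of graph library

    def hybrid_intrinsic_reward(sa_kg: nx.DiGraph, state, w_value=1.0, w_novelty=0.5):
        """Combine an exploitation signal (estimated state value) with an
        exploration signal (count-based novelty) over a state-action graph.
        Node fields 'value' and 'visits' are assumed here, not the paper's
        actual SA-KG schema."""
        node = sa_kg.nodes[state]
        value_reward = node.get("value", 0.0)                        # known high-value pathways
        novelty_reward = 1.0 / math.sqrt(node.get("visits", 0) + 1)  # rarely visited -> larger bonus
        return w_value * value_reward + w_novelty * novelty_reward

The additive split mirrors the decoupling the abstract describes: one term exploits known pathways, the other rewards targeted discovery.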
(P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration
Lin, Yi, Zhao, Lujin, Shi, Yijie
Recent studies have shown that carefully designed workflows coordinating large language models (LLMs) significantly enhance task-solving capabilities compared to using a single model. While an increasing number of works focus on autonomous workflow construction, most existing approaches rely solely on historical experience, leading to limitations in efficiency and adaptability. We argue that while historical experience is valuable, workflow construction should also respond flexibly to the unique characteristics of each task. To this end, we propose an a priori dynamic framework for automated workflow construction. Our framework first leverages Q-table learning to optimize the decision space, guiding agent decisions and enabling effective use of historical experience. At the same time, agents evaluate the current task progress and make a priori decisions about which agent should execute next, allowing the system to proactively select a more suitable workflow structure for each given task. Additionally, we incorporate mechanisms such as cold-start initialization, early stopping, and pruning to further improve system efficiency. Experimental evaluations on four benchmark datasets demonstrate the feasibility and effectiveness of our approach. Compared to state-of-the-art baselines, our method achieves an average improvement of 4.05% while reducing workflow construction and inference costs to only 30.68%-48.31% of those required by existing methods.
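As a rough illustration of the Q-table component described above (the agent roles, reward signal, and hyperparameters below are placeholders, not the paper's configuration):

    import random
    from collections import defaultdict

    AGENTS = ["planner", "coder", "reviewer", "finish"]  # placeholder roles

    Q = defaultdict(float)  # Q[(current_agent, candidate_next)] -> estimated value

    def choose_next_agent(current, epsilon=0.1):
        # epsilon-greedy selection over the decision space of next agents
        if random.random() < epsilon:
            return random.choice(AGENTS)
        return max(AGENTS, key=lambda a: Q[(current, a)])

    def update_q(current, chosen, reward, alpha=0.3, gamma=0.9):
        # one-step Q-learning backup on agent-to-agent transitions
        best_next = max(Q[(chosen, a)] for a in AGENTS)
        Q[(current, chosen)] += alpha * (reward + gamma * best_next - Q[(current, chosen)])

In the framework described above, such a table would encode historical experience, while each agent's a priori judgment of task progress refines the greedy choice for the task at hand.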
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
Ye, Junjie, Jiang, Changhao, Du, Zhengyin, Xu, Yufei, Yao, Xuesong, Xi, Zhiheng, Fan, Xiaoran, Zhang, Qi, Gui, Tao, Huang, Xuanjing, Chen, Jiecao
Effective tool use is essential for large language models (LLMs) to interact meaningfully with their environment. However, progress is limited by the lack of efficient reinforcement learning (RL) frameworks specifically designed for tool use, due to challenges in constructing stable training environments and designing verifiable reward mechanisms. To address this, we propose an automated environment construction pipeline, incorporating scenario decomposition, document generation, function integration, complexity scaling, and localized deployment. This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. Additionally, we introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution. When combined with trajectory data collected from the constructed environments, this mechanism integrates seamlessly with standard RL algorithms to facilitate feedback-driven model training. Experiments on LLMs of varying scales demonstrate that our approach significantly enhances the models' tool-use performance without degrading their general capabilities, regardless of inference mode or training algorithm. Our analysis suggests that these gains result from improved context understanding and reasoning, driven by updates to the models' lower-layer MLP parameters.
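To make the two-part reward concrete, here is a hedged sketch; the equal weighting, exact-match criterion for tool calls, and verifier interface are assumptions for illustration, not the authors' scoring rule:

    def verifiable_reward(predicted_calls, reference_calls, task_checks,
                          w_prec=0.5, w_comp=0.5):
        """Score a trajectory on (a) precision of tool use and (b) completeness
        of task execution. predicted_calls/reference_calls are lists of
        (tool_name, args) tuples; task_checks are booleans emitted by
        environment-side verifiers. All structures are illustrative."""
        if predicted_calls:
            correct = sum(1 for call in predicted_calls if call in reference_calls)
            precision = correct / len(predicted_calls)
        else:
            precision = 0.0
        completeness = sum(task_checks) / len(task_checks) if task_checks else 0.0
        return w_prec * precision + w_comp * completeness

Because both terms are computed from environment state rather than a learned judge, the reward stays verifiable and plugs directly into standard RL algorithms.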
Entry Barriers in Content Markets
Zhu, Haiqing, Xie, Lexing, Cheung, Yun Kuen
The prevalence of low-quality content on online platforms is often attributed to the absence of meaningful entry requirements. This motivates us to investigate whether implicit or explicit entry barriers, alongside appropriate reward mechanisms, can enhance content quality. We present the first game-theoretic analysis of two distinct types of entry barriers in online content platforms. The first, a structural barrier, emerges from the collective behaviour of incumbent content providers that disadvantages new entrants. We show that both rank-order and proportional-share reward mechanisms induce such a structural barrier at Nash equilibrium. The second, a strategic barrier, involves the platform proactively imposing entry fees to discourage participation from low-quality contributors. We consider a scheme in which the platform redirects some or all of the entry fees into the reward pool. We formally demonstrate that this approach can improve overall content quality. Our findings establish a theoretical foundation for designing reward mechanisms coupled with entry fees to promote higher-quality content and support healthier online ecosystems.
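A toy model of the proportional-share scheme with redirected entry fees (the fee level, quality scores, and redirect fraction are illustrative parameters, not values from the paper):

    def proportional_share_payoffs(qualities, base_pool, entry_fee, redirect_frac=1.0):
        """Each entrant pays entry_fee; a fraction redirect_frac of the total
        fees is added to the reward pool, which is then split among entrants
        in proportion to their content quality."""
        n = len(qualities)
        pool = base_pool + redirect_frac * entry_fee * n
        total_quality = sum(qualities)
        if total_quality == 0:
            return [-entry_fee] * n
        return [pool * q / total_quality - entry_fee for q in qualities]

The strategic barrier is visible here: a contributor enters only if their expected quality share of the pool exceeds the fee, which screens out low-quality entrants while the redirected fees enlarge the reward for those who remain.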
A Mechanism for Mutual Fairness in Cooperative Games with Replicable Resources -- Extended Version
Filter, Björn, Möller, Ralf, Özçep, Özgür Lütfü
The latest developments in AI focus on agentic systems where artificial and human agents cooperate to realize global goals. An example is collaborative learning, which aims to train a global model based on data from individual agents. A major challenge in designing such systems is to guarantee safety and alignment with human values, particularly a fair distribution of rewards upon achieving the global goal. Cooperative game theory offers useful abstractions of cooperating agents via value functions, which assign value to each coalition, and via reward functions. With these, the idea of fair allocation can be formalized by specifying fairness axioms and designing concrete mechanisms. Classical cooperative game theory, exemplified by the Shapley value, does not fully capture scenarios like collaborative learning, as it assumes nonreplicable resources, whereas data and models can be replicated. Infinite replicability requires a generalized notion of fairness, formalized through new axioms and mechanisms. These must address imbalances in reciprocal benefits among participants, which can lead to strategic exploitation and unfair allocations. The main contribution of this paper is a mechanism and a proof that it fulfills the property of mutual fairness, formalized by the Balanced Reciprocity Axiom. It ensures that, for every pair of players, each benefits equally from the participation of the other.
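Reading the final sentence literally, one natural formalization (an assumption on our part, echoing the classical balanced-contributions property rather than the paper's exact axiom) is: for every pair of players $i, j$, a reward mechanism $\varphi$ on a game $v$ satisfies

    $$\varphi_i(v) - \varphi_i(v_{-j}) = \varphi_j(v) - \varphi_j(v_{-i}),$$

where $v_{-k}$ denotes the game with player $k$ removed; each player gains exactly as much from the other's participation as vice versa, even when contributed resources are infinitely replicable.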
MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying
Zhao, Qian, Sun, Zhuo, Guo, Bin, Yu, Zhiwen
Agent-assisted memory recall is a critical research problem in human-computer interaction. In conventional methods, the agent retrieves information from its equipped memory module to help the person recall incomplete or vague memories. The limited size of the memory module hinders the acquisition of complete memories and degrades recall performance in practice. Memory theories suggest that a person's relevant memories can be proactively activated through effective cues. Inspired by this, we propose a novel strategy-guided agent-assisted memory recall method, allowing the agent to transform an original query into a cue-rich one via a judiciously designed strategy to help the person recall memories. This raises two key challenges. (1) How to choose the appropriate recall strategy for diverse forgetting scenarios with distinct memory-recall characteristics? (2) How to obtain high-quality responses that leverage recall strategies, given only abstract and sparsely annotated strategy patterns? To address these challenges, we propose the Recall Router framework. Specifically, we design a 5W Recall Map to classify memory queries into five typical scenarios and define fifteen recall strategy patterns across the corresponding scenarios. We then propose a hierarchical recall tree combined with the Monte Carlo Tree Search algorithm to optimize strategy selection and the generation of strategy-guided responses. We construct an instruction-tuning dataset and fine-tune multiple open-source large language models (LLMs) to develop MemoCue, an agent that excels at providing memory-inspired responses. Experiments on three representative datasets show that MemoCue surpasses LLM-based methods by 17.74% in recall inspiration. Further human evaluation highlights its advantages in memory-recall applications.
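For context on the tree-search component: selection in Monte Carlo Tree Search conventionally uses the UCT rule (a generic formula, not a detail the abstract confirms),

    $$a^{*} = \arg\max_{a}\left(Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}}\right),$$

which trades off the estimated quality $Q(s,a)$ of a strategy branch against how rarely it has been tried, letting a recall tree balance proven strategy patterns against under-explored ones.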
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
Tang, Fei, Gu, Zhangxuan, Lu, Zhengxi, Liu, Xuyang, Shen, Shuheng, Meng, Changhua, Wang, Wen, Zhang, Wenqi, Shen, Yongliang, Lu, Weiming, Xiao, Jun, Zhuang, Yueting
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior, which naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification into dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across the ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$ substantially outperforms the state-of-the-art method UI-TARS-72B, with the largest improvement, 24.7%, on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.
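Following the abstract's description, a minimal sketch of a Gaussian point reward with dimension-adaptive variance; the scale factor and the omission of the coverage term are simplifications, and the paper's exact formulation may differ:

    import math

    def gaussian_point_reward(px, py, cx, cy, w, h, k=0.25):
        """Reward a predicted click (px, py) against a GUI element with center
        (cx, cy) and size (w, h). The standard deviation scales with element
        dimensions (adaptive variance); k is an illustrative constant, not
        the paper's calibrated value."""
        sx, sy = k * w, k * h
        return math.exp(-((px - cx) ** 2 / (2 * sx ** 2)
                          + (py - cy) ** 2 / (2 * sy ** 2)))

Unlike a binary hit-or-miss reward, this score decays smoothly with distance from the element centroid, so near-misses still receive signal proportional to how close they came.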