Reward Mechanisms
Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial?
The past decade has witnessed the flourishing of a new profession of media content creators, who rely on revenue streams from online content recommendation platforms. The reward mechanisms employed by these platforms create a competitive environment among creators that affects their production choices and, consequently, content distribution and system welfare. It is thus crucial to design the platform's reward mechanism so as to steer the creators' competition toward a desirable welfare outcome in the long run. This work makes two major contributions in this regard: first, we uncover a fundamental limitation of a class of widely adopted mechanisms, coined \emph{Merit-based Monotone Mechanisms}, by showing that they inevitably lead to a constant-fraction loss of the optimal welfare. To circumvent this limitation, we introduce \emph{Backward Rewarding Mechanisms} (BRMs) and show that the competition game resulting from BRMs possesses a potential game structure. BRMs thus naturally steer strategic creators' collective behavior toward optimizing the potential function, which can be designed to match any given welfare metric. In addition, the class of BRMs can be parameterized so that the platform can directly optimize welfare within the feasible mechanism space even when the welfare metric is not explicitly defined.
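For readers who want the formal anchor behind this claim: a game is an exact potential game (in the sense of Monderer and Shapley) when there exists a function $\Phi$ such that every unilateral deviation changes the deviator's utility by exactly the change in $\Phi$,

    $$u_i(s_i', s_{-i}) - u_i(s_i, s_{-i}) = \Phi(s_i', s_{-i}) - \Phi(s_i, s_{-i}) \quad \text{for all } i,\ s_i',\ s_i,\ s_{-i}.$$

Under a BRM, the abstract's claim is that $\Phi$ can be chosen to coincide with the platform's welfare metric, so creators' best-response dynamics ascend the welfare objective.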
Experience-Driven Exploration for Efficient API-Free AI Agents
Tang, Chenwei, Xing, Jingyu, Liu, Xinyu, Wang, Zizhou, Du, Jiawei, Zhen, Liangli, Lv, Jiancheng
Most existing software lacks accessible Application Programming Interfaces (APIs), requiring agents to operate solely through pixel-based Graphical User Interfaces (GUIs). In this API-free setting, large language model (LLM)-based agents face severe efficiency bottlenecks: limited to local visual experience, they make myopic decisions and rely on inefficient trial and error, hindering both skill acquisition and long-term planning. To address these challenges, we propose KG-Agent, an experience-driven learning framework that structures an agent's raw pixel-level interactions into a persistent State-Action Knowledge Graph (SA-KG). KG-Agent overcomes inefficient exploration by linking functionally similar but visually distinct GUI states, forming a rich neighborhood of experience that enables the agent to generalize from a diverse set of historical strategies. To support long-horizon reasoning, we design a hybrid intrinsic reward mechanism based on the graph topology, combining a state-value reward for exploiting known high-value pathways with a novelty reward that encourages targeted exploration. This approach decouples strategic planning from pure discovery, allowing the agent to properly value setup actions whose payoff is delayed. We evaluate KG-Agent in two complex, open-ended GUI-based decision-making environments (Civilization V and Slay the Spire), demonstrating significant improvements in exploration efficiency and strategic depth over state-of-the-art methods.
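The abstract does not specify the reward's functional form; the following is a minimal sketch of what a graph-based hybrid intrinsic reward could look like, where the node attributes, weights, and count-based novelty term are illustrative assumptions rather than the authors' implementation:

    import math
    import networkx as nx  # illustrative choice of graph library

    def hybrid_intrinsic_reward(sa_kg: nx.DiGraph, state, w_value=1.0, w_novelty=0.5):
        """Combine an exploitation signal (estimated state value) with an
        exploration signal (count-based novelty) over a state-action graph.
        Node fields 'value' and 'visits' are assumed here, not the paper's
        actual SA-KG schema."""
        node = sa_kg.nodes[state]
        value_reward = node.get("value", 0.0)                        # known high-value pathways
        novelty_reward = 1.0 / math.sqrt(node.get("visits", 0) + 1)  # rarely visited -> larger bonus
        return w_value * value_reward + w_novelty * novelty_reward

The additive split mirrors the decoupling the abstract describes: one term exploits known pathways, the other rewards targeted discovery.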
(P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration
Lin, Yi, Zhao, Lujin, Shi, Yijie
Recent studies have shown that carefully designed workflows coordinating large language models (LLMs) significantly enhance task-solving capabilities compared to using a single model. While an increasing number of works focus on autonomous workflow construction, most existing approaches rely solely on historical experience, leading to limitations in efficiency and adaptability. We argue that while historical experience is valuable, workflow construction should also respond flexibly to the unique characteristics of each task. To this end, we propose an a priori dynamic framework for automated workflow construction. Our framework first leverages Q-table learning to optimize the decision space, guiding agent decisions and enabling effective use of historical experience. At the same time, agents evaluate the current task progress and make a priori decisions about which agent should execute next, allowing the system to proactively select a more suitable workflow structure for each given task. Additionally, we incorporate mechanisms such as cold-start initialization, early stopping, and pruning to further improve system efficiency. Experimental evaluations on four benchmark datasets demonstrate the feasibility and effectiveness of our approach. Compared to state-of-the-art baselines, our method achieves an average improvement of 4.05% while reducing workflow construction and inference costs to only 30.68%-48.31% of those required by existing methods.
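As a rough illustration of the Q-table component described above (the agent roles, reward signal, and hyperparameters below are placeholders, not the paper's configuration):

    import random
    from collections import defaultdict

    AGENTS = ["planner", "coder", "reviewer", "finish"]  # placeholder roles

    Q = defaultdict(float)  # Q[(current_agent, candidate_next)] -> estimated value

    def choose_next_agent(current, epsilon=0.1):
        # epsilon-greedy selection over the decision space of next agents
        if random.random() < epsilon:
            return random.choice(AGENTS)
        return max(AGENTS, key=lambda a: Q[(current, a)])

    def update_q(current, chosen, reward, alpha=0.3, gamma=0.9):
        # one-step Q-learning backup on agent-to-agent transitions
        best_next = max(Q[(chosen, a)] for a in AGENTS)
        Q[(current, chosen)] += alpha * (reward + gamma * best_next - Q[(current, chosen)])

In the framework described above, such a table would encode historical experience, while each agent's a priori judgment of task progress refines the greedy choice for the task at hand.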
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
Ye, Junjie, Jiang, Changhao, Du, Zhengyin, Xu, Yufei, Yao, Xuesong, Xi, Zhiheng, Fan, Xiaoran, Zhang, Qi, Gui, Tao, Huang, Xuanjing, Chen, Jiecao
Effective tool use is essential for large language models (LLMs) to interact meaningfully with their environment. However, progress is limited by the lack of efficient reinforcement learning (RL) frameworks specifically designed for tool use, due to challenges in constructing stable training environments and designing verifiable reward mechanisms. To address this, we propose an automated environment construction pipeline, incorporating scenario decomposition, document generation, function integration, complexity scaling, and localized deployment. This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools. Additionally, we introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution. When combined with trajectory data collected from the constructed environments, this mechanism integrates seamlessly with standard RL algorithms to facilitate feedback-driven model training. Experiments on LLMs of varying scales demonstrate that our approach significantly enhances the models' tool-use performance without degrading their general capabilities, regardless of inference mode or training algorithm. Our analysis suggests that these gains result from improved context understanding and reasoning, driven by updates to the models' lower-layer MLP parameters.
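To make the two-part reward concrete, here is a hedged sketch; the equal weighting, exact-match criterion for tool calls, and verifier interface are assumptions for illustration, not the authors' scoring rule:

    def verifiable_reward(predicted_calls, reference_calls, task_checks,
                          w_prec=0.5, w_comp=0.5):
        """Score a trajectory on (a) precision of tool use and (b) completeness
        of task execution. predicted_calls/reference_calls are lists of
        (tool_name, args) tuples; task_checks are booleans emitted by
        environment-side verifiers. All structures are illustrative."""
        if predicted_calls:
            correct = sum(1 for call in predicted_calls if call in reference_calls)
            precision = correct / len(predicted_calls)
        else:
            precision = 0.0
        completeness = sum(task_checks) / len(task_checks) if task_checks else 0.0
        return w_prec * precision + w_comp * completeness

Because both terms are computed from environment state rather than a learned judge, the reward stays verifiable and plugs directly into standard RL algorithms.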
Entry Barriers in Content Markets
Zhu, Haiqing, Xie, Lexing, Cheung, Yun Kuen
The prevalence of low-quality content on online platforms is often attributed to the absence of meaningful entry requirements. This motivates us to investigate whether implicit or explicit entry barriers, alongside appropriate reward mechanisms, can enhance content quality. We present the first game-theoretic analysis of two distinct types of entry barriers in online content platforms. The first, a structural barrier, emerges from the collective behaviour of incumbent content providers that disadvantages new entrants. We show that both rank-order and proportional-share reward mechanisms induce such a structural barrier at Nash equilibrium. The second, a strategic barrier, involves the platform proactively imposing entry fees to discourage participation from low-quality contributors. We consider a scheme in which the platform redirects some or all of the entry fees into the reward pool. We formally demonstrate that this approach can improve overall content quality. Our findings establish a theoretical foundation for designing reward mechanisms coupled with entry fees to promote higher-quality content and support healthier online ecosystems.
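A toy model of the proportional-share scheme with redirected entry fees (the fee level, quality scores, and redirect fraction are illustrative parameters, not values from the paper):

    def proportional_share_payoffs(qualities, base_pool, entry_fee, redirect_frac=1.0):
        """Each entrant pays entry_fee; a fraction redirect_frac of the total
        fees is added to the reward pool, which is then split among entrants
        in proportion to their content quality."""
        n = len(qualities)
        pool = base_pool + redirect_frac * entry_fee * n
        total_quality = sum(qualities)
        if total_quality == 0:
            return [-entry_fee] * n
        return [pool * q / total_quality - entry_fee for q in qualities]

The strategic barrier is visible here: a contributor enters only if their expected quality share of the pool exceeds the fee, which screens out low-quality entrants while the redirected fees enlarge the reward for those who remain.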
A Mechanism for Mutual Fairness in Cooperative Games with Replicable Resources -- Extended Version
Filter, Björn, Möller, Ralf, Özçep, Özgür Lütfü
The latest developments in AI focus on agentic systems where artificial and human agents cooperate to realize global goals. An example is collaborative learning, which aims to train a global model based on data from individual agents. A major challenge in designing such systems is to guarantee safety and alignment with human values, particularly a fair distribution of rewards upon achieving the global goal. Cooperative game theory offers useful abstractions of cooperating agents via value functions, which assign value to each coalition, and via reward functions. With these, the idea of fair allocation can be formalized by specifying fairness axioms and designing concrete mechanisms. Classical cooperative game theory, exemplified by the Shapley value, does not fully capture scenarios like collaborative learning, as it assumes nonreplicable resources, whereas data and models can be replicated. Infinite replicability requires a generalized notion of fairness, formalized through new axioms and mechanisms. These must address imbalances in reciprocal benefits among participants, which can lead to strategic exploitation and unfair allocations. The main contribution of this paper is a mechanism and a proof that it fulfills the property of mutual fairness, formalized by the Balanced Reciprocity Axiom. It ensures that, for every pair of players, each benefits equally from the participation of the other.
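Reading the final sentence literally, one natural formalization (an assumption on our part, echoing the classical balanced-contributions property rather than the paper's exact axiom) is: for every pair of players $i, j$, a reward mechanism $\varphi$ on a game $v$ satisfies

    $$\varphi_i(v) - \varphi_i(v_{-j}) = \varphi_j(v) - \varphi_j(v_{-i}),$$

where $v_{-k}$ denotes the game with player $k$ removed; each player gains exactly as much from the other's participation as vice versa, even when contributed resources are infinitely replicable.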
MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying
Zhao, Qian, Sun, Zhuo, Guo, Bin, Yu, Zhiwen
Agent-assisted memory recall is a critical research problem in human-computer interaction. In conventional methods, the agent retrieves information from its equipped memory module to help the person recall incomplete or vague memories. The limited size of the memory module hinders the acquisition of complete memories and degrades recall performance in practice. Memory theories suggest that a person's relevant memories can be proactively activated through effective cues. Inspired by this, we propose a novel strategy-guided agent-assisted memory recall method, allowing the agent to transform an original query into a cue-rich one via a judiciously designed strategy to help the person recall memories. This raises two key challenges. (1) How to choose the appropriate recall strategy for diverse forgetting scenarios with distinct memory-recall characteristics? (2) How to obtain high-quality responses that leverage recall strategies, given only abstract and sparsely annotated strategy patterns? To address these challenges, we propose the Recall Router framework. Specifically, we design a 5W Recall Map to classify memory queries into five typical scenarios and define fifteen recall strategy patterns across the corresponding scenarios. We then propose a hierarchical recall tree combined with the Monte Carlo Tree Search algorithm to optimize strategy selection and the generation of strategy-guided responses. We construct an instruction-tuning dataset and fine-tune multiple open-source large language models (LLMs) to develop MemoCue, an agent that excels at providing memory-inspired responses. Experiments on three representative datasets show that MemoCue surpasses LLM-based methods by 17.74% in recall inspiration. Further human evaluation highlights its advantages in memory-recall applications.
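For context on the tree-search component: selection in Monte Carlo Tree Search conventionally uses the UCT rule (a generic formula, not a detail the abstract confirms),

    $$a^{*} = \arg\max_{a}\left(Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}}\right),$$

which trades off the estimated quality $Q(s,a)$ of a strategy branch against how rarely it has been tried, letting a recall tree balance proven strategy patterns against under-explored ones.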
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
Tang, Fei, Gu, Zhangxuan, Lu, Zhengxi, Liu, Xuyang, Shen, Shuheng, Meng, Changhua, Wang, Wen, Zhang, Wenqi, Shen, Yongliang, Lu, Weiming, Xiao, Jun, Zhuang, Yueting
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior, which naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification into dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across the ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$ substantially outperforms the state-of-the-art method UI-TARS-72B, with the largest improvement, 24.7%, on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.
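Following the abstract's description, a minimal sketch of a Gaussian point reward with dimension-adaptive variance; the scale factor and the omission of the coverage term are simplifications, and the paper's exact formulation may differ:

    import math

    def gaussian_point_reward(px, py, cx, cy, w, h, k=0.25):
        """Reward a predicted click (px, py) against a GUI element with center
        (cx, cy) and size (w, h). The standard deviation scales with element
        dimensions (adaptive variance); k is an illustrative constant, not
        the paper's calibrated value."""
        sx, sy = k * w, k * h
        return math.exp(-((px - cx) ** 2 / (2 * sx ** 2)
                          + (py - cy) ** 2 / (2 * sy ** 2)))

Unlike a binary hit-or-miss reward, this score decays smoothly with distance from the element centroid, so near-misses still receive signal proportional to how close they came.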