Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards
Dogan, Ilgin, Shen, Zuo-Jun Max, Aswani, Anil
–arXiv.org Artificial Intelligence
Repeated principal-agent theory is a well-established paradigm that studies sequential interactions between two self-interested decision-makers. In particular, it offers a framework to analyze the problem of a primary party (i.e., the principal) who seeks to optimize the overall performance of a system by repeatedly delegating some operational control to another strategic party (i.e., the agent) with a private decision-making process. This privacy imposes an information asymmetry between the principal and the agent that can appear either as an adverse selection setting, in which the agent's true preferences or rewards are hidden from the principal, or as a moral hazard setting, in which the actions chosen by the agent are hidden from the principal (Bolton and Dewatripont 2004). In either case, the principal's problem can be defined along two main dimensions: (i) learning the agent's private information by training a consistent estimator, and (ii) designing an incentive mechanism that steers the agent's algorithm in the principal's favor. In this paper, we study these two research problems for an unexplored adverse selection setting by marrying classical principal-agent theory with statistics and reinforcement learning.

In a repeated principal-agent game, the main theoretical challenge stems from the dynamic and sequential interactions between the two strategic decision-makers. In each play of the game, the principal first offers a menu of incentives to the agent, and the agent then chooses from a finite set of actions, which in turn determines the rewards collected by both players. In other words, there is a two-sided sequential externality in this setting: the agent's imperfect knowledge imposes additional costs on the principal, and the principal's incentives create a more challenging decision-making environment for the imperfect-knowledge agent. This paper considers that both the principal and the agent observe stochastic rewards with expectations unknown to both, and that each party aims to maximize its own cumulative expected reward.
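To make the interaction protocol concrete, the following is a minimal simulation sketch of one such repeated game, not the paper's estimator or incentive mechanism. All names and modeling choices here are illustrative assumptions: the principal's net payoff is taken to be its reward minus the incentive paid, and the agent is assumed to best-respond greedily to the posted incentives using sample-mean estimates of its own unknown mean rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
true_agent_means = rng.uniform(0.0, 1.0, n_actions)      # unknown to the agent
true_principal_means = rng.uniform(0.0, 1.0, n_actions)  # unknown to the principal

agent_estimates = np.zeros(n_actions)  # agent's running sample-mean reward estimates
agent_counts = np.zeros(n_actions)

def play_round(incentives):
    """One play: the agent best-responds to posted incentives using its own estimates."""
    a = int(np.argmax(agent_estimates + incentives))            # agent's chosen action
    agent_reward = true_agent_means[a] + rng.normal(0.0, 0.1)   # noisy observed rewards
    principal_reward = true_principal_means[a] + rng.normal(0.0, 0.1)
    # Agent updates its sample-mean estimate for the chosen action.
    agent_counts[a] += 1
    agent_estimates[a] += (agent_reward - agent_estimates[a]) / agent_counts[a]
    # Assumed principal payoff: own reward minus the incentive it pays out.
    return a, principal_reward - incentives[a]

# Usage: the principal posts a fixed, illustrative menu of incentives each round.
incentives = np.full(n_actions, 0.05)
for t in range(10):
    action, principal_net = play_round(incentives)
```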
Aug-13-2023