Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards
Dogan, Ilgin, Shen, Zuo-Jun Max, Aswani, Anil
–arXiv.org Artificial Intelligence
Repeated principal-agent theory is a well-established paradigm that studies sequential interactions between two self-interested decision-makers. In particular, it offers a framework to analyze the problem of a primary party (i.e., the principal) who seeks to optimize the overall performance of a system by repeatedly delegating some operational control to another strategic party (i.e., the agent) with a private decision-making process. This privacy imposes an information asymmetry between the principal and the agent that can appear either as an adverse selection setting, in which the agent's true preferences or rewards are hidden from the principal, or as a moral hazard setting, in which the actions chosen by the agent are hidden from the principal (Bolton and Dewatripont 2004). In either case, the principal's problem can be defined along two main dimensions: (i) learning the agent's private information by training a consistent estimator, and (ii) designing an incentive mechanism that steers the agent's algorithm in the principal's favor. In this paper, we study these two research problems for an unexplored adverse selection setting by marrying classical principal-agent theory with statistics and reinforcement learning.

In a repeated principal-agent game, the main theoretical challenge stems from the dynamic and sequential interactions between the two strategic decision-makers. In each play of the game, the principal first offers a menu of incentives to the agent, and the agent then chooses from a finite set of actions, which in turn determines the rewards collected by both players. In other words, there is a two-sided sequential externality in this setting: the agent's imperfect knowledge imposes additional costs on the principal, and the principal's incentives create a more challenging decision-making environment for the imperfect-knowledge agent. This paper considers that both the principal and the agent observe stochastic rewards with expectations unknown to both, and that each party aims to maximize its own cumulative expected reward.
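To make the interaction protocol concrete, the following is a minimal simulation sketch of one such repeated game, not the paper's estimator or incentive mechanism. All names and modeling choices here are illustrative assumptions: the principal's net payoff is taken to be its reward minus the incentive paid, and the agent is assumed to best-respond greedily to the posted incentives using sample-mean estimates of its own unknown mean rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
true_agent_means = rng.uniform(0.0, 1.0, n_actions)      # unknown to the agent
true_principal_means = rng.uniform(0.0, 1.0, n_actions)  # unknown to the principal

agent_estimates = np.zeros(n_actions)  # agent's running sample-mean reward estimates
agent_counts = np.zeros(n_actions)

def play_round(incentives):
    """One play: the agent best-responds to posted incentives using its own estimates."""
    a = int(np.argmax(agent_estimates + incentives))            # agent's chosen action
    agent_reward = true_agent_means[a] + rng.normal(0.0, 0.1)   # noisy observed rewards
    principal_reward = true_principal_means[a] + rng.normal(0.0, 0.1)
    # Agent updates its sample-mean estimate for the chosen action.
    agent_counts[a] += 1
    agent_estimates[a] += (agent_reward - agent_estimates[a]) / agent_counts[a]
    # Assumed principal payoff: own reward minus the incentive it pays out.
    return a, principal_reward - incentives[a]

# Usage: the principal posts a fixed, illustrative menu of incentives each round.
incentives = np.full(n_actions, 0.05)
for t in range(10):
    action, principal_net = play_round(incentives)
```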
Aug-13-2023