Ma, Wenjun (South China Normal University) | Jiang, Yuncheng (South China Normal University) | Liu, Weiru (University of Bristol) | Luo, Xudong (Guangxi Normal University) | McAreavey, Kevin (University of Bristol)
Some well-known paradoxes in decision making (e.g., the Allais paradox, the St. Peterburg paradox, the Ellsberg paradox, and the Machina paradox) reveal that choices conventional expected utility theory predicts could be inconsistent with empirical observations. So, solutions to these paradoxes can help us better understand humans decision making accurately. This is also highly related to the prediction power of a decision-making model in real-world applications. Thus, various models have been proposed to address these paradoxes. However, most of them can only solve parts of the paradoxes, and for doing so some of them have to rely on the parameter tuning without proper justifications for such bounds of parameters. To this end, this paper proposes a new descriptive decision-making model, expected utility with relative loss reduction, which can exhibit the same qualitative behaviours as those observed in experiments of these paradoxes without any additional parameter setting. In particular, we introduce the concept of relative loss reduction to reflect people's tendency to prefer ensuring a sufficient minimum loss to just a maximum expected utility in decision-making under risk or ambiguity.
We consider the problem of belief aggregation: given a group of individual agents with probabilistic beliefs over a set of uncertain events, formulate a sensible consensus or aggregate probability distribution over these events. Researchers have proposed many aggregation methods, although on the question of which is best the general consensus is that there is no consensus. We develop a market-based approach to this problem, where agents bet on uncertain events by buying or selling securities contingent on their outcomes. Each agent acts in the market so as to maximize expected utility at given securities prices, limited in its activity only by its own risk aversion. The equilibrium prices of goods in this market represent aggregate beliefs. For agents with constant risk aversion, we demonstrate that the aggregate probability exhibits several desirable properties, and is related to independently motivated techniques. We argue that the market-based approach provides a plausible mechanism for belief aggregation in multiagent systems, as it directly addresses self-motivated agent incentives for participation and for truthfulness, and can provide a decision-theoretic foundation for the "expert weights" often employed in centralized pooling techniques.
Most humans have an innate aversion to harming one another, even a total stranger. But where does that reluctance come from? After all, shouldn't the pitiless, dog-eat-dog forces of natural selection have hard-wired us to have the opposite tendency? A recent study on harm aversion among rats suggests that a basic unwillingness to hurt a member of one's own species emerged among mammals more than 80 million years ago. In a study published last week, experimenters found that some rats had a tendency to avoid pushing a lever that delivered a sucrose pellet if doing so caused a rat in a neighboring cage to receive an electric shock.
Autonomous systems can substantially enhance a human's efficiency and effectiveness in complex environments. Machines, however, are often unable to observe the preferences of the humans that they serve. Despite the fact that the human's and machine's objectives are aligned, asymmetric information, along with heterogeneous sensitivities to risk by the human and machine, make their joint optimization process a game with strategic interactions. We propose a framework based on risk-sensitive dynamic games; the human seeks to optimize her risk-sensitive criterion according to her true preferences, while the machine seeks to adaptively learn the human's preferences and at the same time provide a good service to the human. We develop a class of performance measures for the proposed framework based on the concept of regret. We then evaluate their dependence on the risk-sensitivity and the degree of uncertainty. We present applications of our framework to self-driving taxis, and robo-financial advising.
Hughes, Edward, Leibo, Joel Z., Phillips, Matthew G., Tuyls, Karl, Duéñez-Guzmán, Edgar A., Castañeda, Antonio García, Dunning, Iain, Zhu, Tina, McKee, Kevin R., Koster, Raphael, Roff, Heather, Graepel, Thore
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.