This work extends and studies previously proposed quantum-like Bayesian networks to handle decision-making scenarios by incorporating the notion of maximum expected utility in influence diagrams. The general idea is to exploit the quantum interference terms produced by the quantum-like Bayesian network to influence the probabilities used to compute the expected utility of an action. In this way, we are not proposing a new type of expected utility hypothesis; on the contrary, we keep expected utility under its classical definition. We only incorporate it as an extension of a probabilistic graphical model, in a compact graphical representation called an influence diagram, in which the utility function depends on the probabilistic influences of the quantum-like Bayesian network. Our findings suggest that the proposed quantum-like influence diagram can indeed exploit the quantum interference effects of quantum-like Bayesian networks to maximise the utility of cooperative behaviour at the expense of the fully rational defect behaviour in the Prisoner's Dilemma game.
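The mechanism can be illustrated with a small Python sketch, under several assumptions that are ours and not the paper's: a hypothetical binary hidden node H describing the opponent ("reciprocator" or "always defects"), hand-picked conditional probabilities of the opponent's action given H and the decision maker's own action, textbook Prisoner's Dilemma payoffs, and one hand-picked interference phase per outcome. The quantum-like belief adds the interference term 2*sqrt(a*b)*cos(theta) to the two classical path terms a and b and then renormalizes; expected utility itself stays classical. Setting theta = pi/2 removes the interference and recovers the classical network.

```python
import math

# Textbook Prisoner's Dilemma payoffs for the decision maker (T=5, R=3, P=1, S=0); illustrative values.
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}

# Hypothetical hidden node H: h0 = "opponent reciprocates", h1 = "opponent always defects".
PRIOR_H = (0.5, 0.5)

# Hypothetical beliefs P(opponent action o | h, own action a), stored as own action -> o -> (P(o|h0,a), P(o|h1,a)).
COND = {
    "C": {"C": (0.8, 0.1), "D": (0.2, 0.9)},
    "D": {"C": (0.2, 0.1), "D": (0.8, 0.9)},
}


def quantum_like_belief(action, theta):
    """Quantum-like total probability over H, with one interference phase per outcome.

    Each unnormalized weight is |sqrt(a) + sqrt(b) * exp(i*theta)|^2 = a + b + 2*sqrt(a*b)*cos(theta),
    where a and b are the two classical path terms P(h)P(o|h,a); weights are then renormalized.
    """
    raw = {}
    for o, (p0, p1) in COND[action].items():
        a, b = PRIOR_H[0] * p0, PRIOR_H[1] * p1
        raw[o] = a + b + 2.0 * math.sqrt(a * b) * math.cos(theta[action][o])
    total = sum(raw.values())
    return {o: w / total for o, w in raw.items()}


def expected_utility(action, theta):
    # Classical expected utility, evaluated with the (possibly interfering) belief.
    belief = quantum_like_belief(action, theta)
    return sum(p * PAYOFF[(action, o)] for o, p in belief.items())


# cos(pi/2) = 0, so these phases reproduce the classical Bayesian network.
CLASSICAL = {a: {"C": math.pi / 2, "D": math.pi / 2} for a in "CD"}
# Hand-picked phases (hypothetical): constructive interference on "opponent cooperates" after own C.
INTERFERING = {"C": {"C": 0.0, "D": math.pi}, "D": {"C": math.pi / 2, "D": math.pi / 2}}

for name, theta in (("classical", CLASSICAL), ("quantum-like", INTERFERING)):
    eu = {a: expected_utility(a, theta) for a in "CD"}
    print(name, {a: round(v, 3) for a, v in eu.items()}, "->", max(eu, key=eu.get))
```

With these numbers the classical network ranks defection first (EU(C) = 1.35, EU(D) = 1.6), while the hand-picked phases push EU(C) above EU(D). The phases here are chosen only to show that the ranking can flip; they are not fitted as in the original work.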
Algorithmic collusion is an emerging concept in the current age of artificial intelligence. Whether algorithmic collusion is a credible threat remains debated. In this paper, we propose an algorithm that can extort its human rival into colluding in a Cournot duopoly market. In experiments, we show that the algorithm successfully extorts its human rival and earns higher profits in the long run, while the human rival fully colludes with the algorithm. As a result, social welfare declines rapidly and persistently. Both in theory and in experiments, our work confirms that algorithmic collusion can be a credible threat. In practice, we hope that the frameworks, the algorithm design, and the experimental environment presented in this work can serve as an incubator or test bed for researchers and policymakers addressing emerging algorithmic collusion.
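The abstract does not specify the extortion algorithm, but the welfare claim rests on the standard Cournot benchmark. The Python sketch below assumes a linear inverse demand P(Q) = a - bQ and a common constant marginal cost c (the parameter names and values are illustrative assumptions) and compares the Nash quantities, found by iterating best responses, with the symmetric joint-profit-maximizing (collusive) quantity (a - c)/(4b). It is not the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class CournotMarket:
    # Assumed linear inverse demand P(Q) = a - b*Q and identical constant marginal cost c.
    a: float
    b: float
    c: float

    def price(self, q1, q2):
        return max(self.a - self.b * (q1 + q2), 0.0)

    def profit(self, q_own, q_other):
        return (self.price(q_own, q_other) - self.c) * q_own

    def best_response(self, q_other):
        # Maximizes (a - b*(q + q_other) - c) * q over q >= 0.
        return max((self.a - self.c - self.b * q_other) / (2 * self.b), 0.0)

    def welfare(self, q1, q2):
        # Consumer surplus b*Q^2/2 (valid while the price is non-negative) plus both firms' profits.
        total_q = q1 + q2
        consumer_surplus = 0.5 * self.b * total_q ** 2
        return consumer_surplus + self.profit(q1, q2) + self.profit(q2, q1)


def nash_by_best_response(m, iters=100):
    # Simultaneous best-response iteration; the map contracts by a factor 1/2 per step.
    q1 = q2 = 0.0
    for _ in range(iters):
        q1, q2 = m.best_response(q2), m.best_response(q1)
    return q1, q2


def collusive_quantity(m):
    # Each firm produces half of the monopoly quantity (a - c)/(2b).
    return (m.a - m.c) / (4 * m.b)


m = CournotMarket(a=100.0, b=1.0, c=10.0)
qn = nash_by_best_response(m)
qc = collusive_quantity(m)
print("Nash:", qn, m.profit(*qn), m.welfare(*qn))
print("Collusion:", qc, m.profit(qc, qc), m.welfare(qc, qc))
```

With a = 100, b = 1 and c = 10, each firm produces 30 at the Nash equilibrium (profit 900, total welfare 3600) and 22.5 under collusion (profit 1012.5, total welfare 3037.5), so collusion raises each firm's profit while lowering social welfare.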
Standard results and algorithms for repeated games assume that defections are instantly observable. In reality, it may take some time for the knowledge that a defection has occurred to propagate through the social network. How does this affect the structure of equilibria and the algorithms for computing them? In this paper, we consider games with cooperation and defection. We prove that there exists a unique maximal set of forever-cooperating agents in equilibrium and give an efficient algorithm for computing it. We then evaluate this algorithm on random graphs and find experimentally that there appears to be a phase transition between cooperation everywhere and defection everywhere, depending on the value of cooperation and the discount factor. Finally, we provide a condition under which the equilibrium found is credible, in the sense that agents are in fact motivated to punish deviating agents. We find that this condition always holds in our experiments, provided the graphs are sufficiently large.
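A unique maximal cooperating set typically arises from a monotonicity argument: if an agent's incentive to keep cooperating can only grow as more of its neighbors cooperate, then the union of two self-consistent sets is again self-consistent, and the largest such set can be found by pruning from the full agent set. The Python sketch below illustrates that pruning under a deliberately simplified, hypothetical incentive constraint (a per-neighbor cooperation value `value`, a discount factor `discount`, and a one-shot `temptation`). It ignores the propagation delay studied in the paper and is not the paper's algorithm.

```python
from collections import deque


def maximal_cooperating_set(adj, value, discount, temptation):
    """Greatest set S in which every member satisfies a (hypothetical) monotone incentive constraint.

    adj maps each agent to the set of its neighbors. Agent i stays cooperative iff
    value * |N(i) & S| / (1 - discount) >= temptation, which can only improve as S grows.
    """
    coop = set(adj)

    def satisfied(i):
        k = sum(1 for j in adj[i] if j in coop)
        return value * k / (1 - discount) >= temptation

    queue = deque(i for i in adj if not satisfied(i))
    while queue:
        i = queue.popleft()
        # Skip agents already removed or no longer violating the constraint.
        if i not in coop or satisfied(i):
            continue
        coop.remove(i)
        # Removing i can only break the constraint of i's cooperating neighbors.
        for j in adj[i]:
            if j in coop and not satisfied(j):
                queue.append(j)
    return coop


# A triangle 0-1-2 with a pendant agent 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(maximal_cooperating_set(graph, value=1.0, discount=0.5, temptation=3.0))
```

With these parameters an agent needs at least two cooperating neighbors, so the pendant agent drops out and the triangle {0, 1, 2} remains. Each removal is processed once and triggers only a scan of the removed agent's neighbors, which keeps the pruning efficient on sparse graphs.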
In this paper, we address the question of assigning social norms to agents: should we attempt to prescribe fixed social norms to agents that will act in complex, dynamic environments, or should we allow the agents to adapt to new situations as they arise and choose their norms accordingly? We argue that adaptation is preferable to prescription: agents should be allowed to revise their norms on the fly. We construct a system in which the performance of multiple agents operating in the same environment can be assessed, and we present and discuss experimental results for alternative norm selection strategies.
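To make the contrast between prescription and adaptation concrete, here is a toy Python sketch that is not the paper's system: a hypothetical environment in which the appropriate norm switches from "A" to "B" at step `shift_at`, one agent whose norm is prescribed, and one agent that revises its norm on the fly by switching after any step in which its current norm earns nothing. The reward scheme and the switching rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass

NORMS = ("A", "B")


@dataclass
class Agent:
    name: str
    norm: str
    adaptive: bool
    reward: int = 0


def appropriate_norm(t, shift_at):
    # Hypothetical dynamic environment: norm "A" pays off before shift_at, norm "B" afterwards.
    return "A" if t < shift_at else "B"


def run(agents, steps, shift_at):
    # All agents act in the same environment; return each agent's cumulative reward.
    for t in range(steps):
        target = appropriate_norm(t, shift_at)
        for ag in agents:
            if ag.norm == target:
                ag.reward += 1
            elif ag.adaptive:
                # Revise the norm on the fly: switch to the other norm after an unrewarded step.
                ag.norm = NORMS[1 - NORMS.index(ag.norm)]
    return {ag.name: ag.reward for ag in agents}


agents = [Agent("prescribed", "A", adaptive=False), Agent("adaptive", "A", adaptive=True)]
print(run(agents, steps=10, shift_at=5))
```

Over 10 steps with the shift at step 5, the prescribed agent earns 5 and the adaptive agent earns 9: both are rewarded until the shift, after which only the adaptive agent recovers, losing a single step while it switches.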
We consider a repeated Prisoner's Dilemma game in which two independent learning agents play against each other. We assume that the players can observe each other's actions but not the payoff received by the other player. The multiagent learning literature has provided mechanisms that allow agents to converge to a Nash equilibrium. In this paper, we define a special class of learner, the conditional joint action learner (CJAL), which learns the conditional probability of the other agent's action given its own action and uses it to decide its next action. We prove that, when a CJAL plays against itself and the payoff structure of the Prisoner's Dilemma game satisfies certain conditions, these agents can, using a limited exploration technique, learn to converge to the Pareto-optimal outcome of mutual cooperation, which dominates the Nash equilibrium, while maintaining individual rationality. We analytically derive the conditions under which this phenomenon can occur and present experimental results that support our claim.
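The CJAL decision rule described above can be sketched in Python as follows. The specifics are our assumptions, not the paper's: textbook payoffs (T=5, R=3, P=1, S=0), add-one smoothing of the joint-action counts, and a limited exploration phase of uniformly random play for the first `explore_steps` rounds. After exploration, the learner picks the action a maximizing the sum over o of P(o | a) * U(a, o), where P(o | a) is its empirical estimate of the opponent's action given its own.

```python
import random

ACTIONS = ("C", "D")
# Textbook Prisoner's Dilemma payoffs, indexed by (own action, opponent action); illustrative values.
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}


class CJAL:
    def __init__(self, explore_steps, rng):
        # Joint-action counts with an add-one prior (assumption) so every estimate is defined.
        self.counts = {(a, o): 1 for a in ACTIONS for o in ACTIONS}
        self.explore_steps = explore_steps
        self.t = 0
        self.rng = rng

    def cond_prob(self, mine, theirs):
        # Empirical estimate of P(opponent plays `theirs` | I play `mine`).
        total = sum(self.counts[(mine, o)] for o in ACTIONS)
        return self.counts[(mine, theirs)] / total

    def expected_utility(self, mine):
        return sum(self.cond_prob(mine, o) * PAYOFF[(mine, o)] for o in ACTIONS)

    def act(self):
        # Limited exploration: uniform random play for the first explore_steps rounds, then greedy.
        if self.t < self.explore_steps:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=self.expected_utility)

    def update(self, mine, theirs):
        self.counts[(mine, theirs)] += 1
        self.t += 1


def self_play(rounds, explore_steps, seed=0):
    rng = random.Random(seed)
    p1, p2 = CJAL(explore_steps, rng), CJAL(explore_steps, rng)
    history = []
    for _ in range(rounds):
        a1, a2 = p1.act(), p2.act()
        p1.update(a1, a2)
        p2.update(a2, a1)
        history.append((a1, a2))
    return history


print(self_play(rounds=200, explore_steps=20, seed=0)[-5:])
```

Whether self-play settles on mutual cooperation here depends on the payoff values, the seed, and the exploration schedule; this sketch shows the conditional-probability decision rule and does not reproduce the convergence conditions derived in the paper.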