to

### Human-Agent Cooperation in Bridge Bidding

We introduce a human-compatible reinforcement-learning approach to a cooperative game, making use of a third-party hand-coded human-compatible bot to generate initial training data and to perform initial evaluation. Our learning approach consists of imitation learning, search, and policy iteration. Our trained agents achieve a new state-of-the-art for bridge bidding in three settings: an agent playing in partnership with a copy of itself; an agent partnering a pre-existing bot; and an agent partnering a human player.

### Joint Policy Search for Multi-agent Collaboration with Imperfect Information

To learn good joint policies for multi-agent collaboration with imperfect information remains a fundamental challenge. While for two-player zero-sum games, coordinate-ascent approaches (optimizing one agent's policy at a time, e.g., self-play) work with guarantees, in multi-agent cooperative setting they often converge to sub-optimal Nash equilibrium. On the other hand, directly modeling joint policy changes in imperfect information game is nontrivial due to complicated interplay of policies (e.g., upstream updates affect downstream state reachability). In this paper, we show global changes of game values can be decomposed to policy changes localized at each information set, with a novel term named policy-change density. Based on this, we propose Joint Policy Search(JPS) that iteratively improves joint policies of collaborative agents in imperfect information games, without re-evaluating the entire game. On multi-agent collaborative tabular games, JPS is proven to never worsen performance and can improve solutions provided by unilateral approaches (e.g, CFR), outperforming algorithms designed for collaborative policy learning (e.g. BAD). Furthermore, for real-world games, JPS has an online form that naturally links with gradient updates. We test it to Contract Bridge, a 4-player imperfect-information game where a team of $2$ collaborates to compete against the other. In its bidding phase, players bid in turn to find a good contract through a limited information channel. Based on a strong baseline agent that bids competitive bridge purely through domain-agnostic self-play, JPS improves collaboration of team players and outperforms WBridge5, a championship-winning software, by $+0.63$ IMPs (International Matching Points) per board over 1k games, substantially better than previous SoTA ($+0.41$ IMPs/b) under Double-Dummy evaluation.

### Construction and Elicitation of a Black Box Model in the Game of Bridge

Our goal is to model expert decision processes in Bridge. To do so, we propose a methodology involving human experts, black box decision programs, and relational supervised machine learning systems. The aim is to obtain a global model for this decision process, that is both expressive and has high predictive performance. Following the success of supervised methods of the deep network family, and a growing pressure from society imposing that automated decision processes be made more transparent, a growing number of AI researchers are (re)exploring techniques to interpret, justify, or explain "black box" classifiers (referred to as the Black Box Outcome Explanation Problem [Guidotti et al., 2019]). It is a question of building, a posteriori, explicit models in symbolic languages, most often in the form of rules or deci-Daniel Braun, Colin Deheeger, Jean Pierre Desmoulins, Jean Baptiste Fantun, Swann Legras, Alexis Rimbaud, Céline Rouveirol, Henry Soldano and Véronique Ventos NukkAI, Paris, France Henry Soldano and Céline Rouveirol Université Sorbonne Paris-Nord, L.I.P.N UMR-CNRS 7030 Villetaneuse, France

### StarAI: Reducing incompleteness in the game of Bridge using PLP

Bridge is a trick-taking card game requiring the ability to evaluate probabilities since it is a game of incomplete information where each player only sees its cards. In order to choose a strategy, a player needs to gather information about the hidden cards in the other players' hand. We present a methodology allowing us to model a part of card playing in Bridge using Probabilistic Logic Programming.

### The {\alpha}{\mu} Search Algorithm for the Game of Bridge

{\alpha}{\mu} is an anytime heuristic search algorithm for incomplete information games that assumes perfect information for the opponents. {\alpha}{\mu} addresses the strategy fusion and non-locality problems encountered by Perfect Information Monte Carlo sampling. In this paper {\alpha}{\mu} is applied to the game of Bridge.

### Policy Based Inference in Trick-Taking Card Games

Trick-taking card games feature a large amount of private information that slowly gets revealed through a long sequence of actions. This makes the number of histories exponentially large in the action sequence length, as well as creating extremely large information sets. As a result, these games become too large to solve. To deal with these issues many algorithms employ inference, the estimation of the probability of states within an information set. In this paper, we demonstrate a Policy Based Inference (PI) algorithm that uses player modelling to infer the probability we are in a given state. We perform experiments in the German trick-taking card game Skat, in which we show that this method vastly improves the inference as compared to previous work, and increases the performance of the state-of-the-art Skat AI system Kermit when it is employed into its determinized search algorithm.

### Improving Search with Supervised Learning in Trick-Based Card Games

In trick-taking card games, a two-step process of state sampling and evaluation is widely used to approximate move values. While the evaluation component is vital, the accuracy of move value estimates is also fundamentally linked to how well the sampling distribution corresponds the true distribution. Despite this, recent work in trick-taking card game AI has mainly focused on improving evaluation algorithms with limited work on improving sampling. In this paper, we focus on the effect of sampling on the strength of a player and propose a novel method of sampling more realistic states given move history. In particular, we use predictions about locations of individual cards made by a deep neural network --- trained on data from human gameplay - in order to sample likely worlds for evaluation. This technique, used in conjunction with Perfect Information Monte Carlo (PIMC) search, provides a substantial increase in cardplay strength in the popular trick-taking card game of Skat.

### Competitive Bridge Bidding with Deep Neural Networks

The game of bridge consists of two stages: bidding and playing. While playing is proved to be relatively easy for computer programs, bidding is very challenging. During the bidding stage, each player knowing only his/her own cards needs to exchange information with his/her partner and interfere with opponents at the same time. Existing methods for solving perfect-information games cannot be directly applied to bidding. Most bridge programs are based on human-designed rules, which, however, cannot cover all situations and are usually ambiguous and even conflicting with each other. In this paper, we, for the first time, propose a competitive bidding system based on deep learning techniques, which exhibits two novelties. First, we design a compact representation to encode the private and public information available to a player for bidding. Second, based on the analysis of the impact of other players' unknown cards on one's final rewards, we design two neural networks to deal with imperfect information, the first one inferring the cards of the partner and the second one taking the outputs of the first one as part of its input to select a bid. Experimental results show that our bidding system outperforms the top rule-based program.

### Learning Multi-agent Implicit Communication Through Actions: A Case Study in Contract Bridge, a Collaborative Imperfect-Information Game

In situations where explicit communication is limited, a human collaborator is typically able to learn to: (i) infer the meaning behind their partner's actions and (ii) balance between taking actions that are exploitative given their current understanding of the state vs. those that can convey private information about the state to their partner. The first component of this learning process has been well-studied in multi-agent systems, whereas the second --- which is equally crucial for a successful collaboration --- has not. In this work, we complete the learning process and introduce our novel algorithm, Policy-Belief-Iteration ("P-BIT"), which mimics both components mentioned above. A belief module models the other agent's private information by observing their actions, whilst a policy module makes use of the inferred private information to return a distribution over actions. They are mutually reinforced with an EM-like algorithm. We use a novel auxiliary reward to encourage information exchange by actions. We evaluate our approach on the non-competitive bidding problem from contract bridge and show that by self-play agents are able to effectively collaborate with implicit communication, and P-BIT outperforms several meaningful baselines that have been considered.

### Computer Bridge

A computer program that uses AI planning techniques is now the world champion computer program in the game of Contract Bridge. As reported in The New York Times and The Washington Post, this program--a new version of Great Game Products' The classical approach used in AI programs for games of strategy is to do a game tree search using the well-known minimax formula (eq. 1) The minimax computation is basically a bruteforce search: If implemented as formulated here, it would examine every node in the game tree. In practical implementations of minimax game tree searching, a number of techniques are used to improve the efficiency of this computation: putting a bound on the depth of the search, using alpha-beta pruning, doing transposition-table lookup, and so on. However, even with enhancements such as these, minimax computations often involve examining huge numbers of nodes in the game tree. Because a Bridge hand is typically played in just a few minutes, there is not enough time for a game tree search to search enough of this tree to make good decisions.