LLMs as Agentic Cooperative Players in Multiplayer UNO

Martinez, Yago Romano, Roberts, Jesse

arXiv.org Artificial Intelligence 

Third, the current game state data: the number of players, the last played card, the hand contents, the next player, recent moves, and the legal actions. Finally, the LLM was asked to choose the best action according to the specified prompting method.

The game state information was extracted from RLCard and reformatted for readability. While RLCard encodes cards using shorthand (e.g., "r-5" for red 5), we expanded these into full descriptions to improve the model's comprehension. An example of the complete prompt format is shown in Figure 3.

To drive the model's action selection, we applied two prompting strategies inspired by Moore et al. [17]: cloze prompting and counterfactual prompting. These methods determine how the model interprets the prompt and evaluates its legal actions during gameplay.

Cloze Prompting: In this method, legal actions were labeled with sequential letters (A, B, C, etc.), and the LLM was instructed to choose the letter corresponding to the best move. The output was restricted to a single token, and the highest-probability token among the allowable action labels was selected as the action.
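The cloze selection step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `token_logprobs` stands in for whatever API returns next-token log-probabilities for the prompt, and the letter-labeling scheme follows the description in the text.

```python
import math

def label_actions(legal_actions):
    """Label legal actions with sequential letters: {'A': action0, 'B': action1, ...}."""
    return {chr(ord('A') + i): action for i, action in enumerate(legal_actions)}

def choose_action_cloze(token_logprobs, legal_actions):
    """Pick the legal action whose label letter received the highest log-probability.

    token_logprobs: dict mapping candidate output tokens to log-probabilities
    (assumed to come from a single-token completion of the prompt).
    Tokens that do not correspond to a legal action label are ignored.
    """
    labels = label_actions(legal_actions)
    best_letter = max(labels, key=lambda letter: token_logprobs.get(letter, -math.inf))
    return labels[best_letter]
```

Restricting the argmax to the label letters implements the "highest-probability token from the set of allowable actions" rule: even if the model assigns more mass to an off-vocabulary token, only legal-action labels can be selected.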
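The card-name expansion mentioned earlier (shorthand such as "r-5" becoming "red 5") can be sketched as a small lookup. The exact set of RLCard trait codes used here is an assumption based on the paper's example, not a documented card list.

```python
# Assumed RLCard-style UNO shorthand: "<color>-<trait>", e.g. "r-5", "g-skip".
COLORS = {'r': 'red', 'g': 'green', 'b': 'blue', 'y': 'yellow'}
TRAITS = {'skip': 'skip', 'reverse': 'reverse', 'draw_2': 'draw two',
          'wild': 'wild', 'wild_draw_4': 'wild draw four'}

def expand_card(code):
    """Expand a shorthand card code into a full description for the prompt."""
    color, trait = code.split('-', 1)
    color_name = COLORS.get(color, color)
    trait_name = TRAITS.get(trait, trait)  # digit ranks pass through unchanged
    return f"{color_name} {trait_name}"
```

Expanding every card this way before building the prompt keeps the game state in plain English, which is the readability reformatting the section describes.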