opponent
- North America > United States > Illinois > Cook County > Evanston (0.04)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- (2 more...)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Computer Games (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- (3 more...)
- North America > United States > Texas (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (5 more...)
- Europe > United Kingdom (0.04)
- Africa > Côte d'Ivoire (0.04)
- North America (0.04)
- Africa > Nigeria > Lagos State > Lagos (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)
7 Checklist
For all authors... (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Y es] We release the code and the models If you used crowdsourcing or conducted research with human subjects... (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [Y es] We included the instructions given to participants in appendix F. In this appendix, we describe the neural network architecture used for our agents.Figure 2: Transformer encoder (left) used in both policy proposal network (center) and value network (right). Our model architecture is shown in Figure 2. It is essentially identical to the architecture in [11], except that it replaces the specialized graph-convolution-based encoder with a much simpler transformer encoder, removes all dropout layers, and uses separate policy and value networks. Aside from the encoder, the other aspects of the architecture are the same, notably the LSTM policy decoder, which decodes orders through sequential attention over each successive location in the encoder output to produce an action. The input to our new encoder is also identical to that of [11], consisting of the same representation of the current board state, previous board state, and a recent order embedding. Rather than processing various parts of this input in two parallel trunks before combining them into a shared encoder trunk, we take the simpler approach of concatenating all features together at the start, resulting in 146 feature channels across each of 81 board locations (75 region + 6 coasts). We pass this through a linear layer, add pointwise a learnable per-position per-channel bias, and then pass this to a standard transformer encoder architecture.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Games
Training agents in multi-agent games presents significant challenges due to their intricate nature. These challenges are exacerbated by dynamics influenced not only by the environment but also by strategies of opponents. Existing methods often struggle with slow convergence and instability.To address these challenges, we harness the potential of imitation learning (IL) to comprehend and anticipate actions of the opponents, aiming to mitigate uncertainties with respect to the game dynamics.Our key contributions include:(i) a new multi-agent IL model for predicting next moves of the opponents - our model works with hidden actions of opponents and local observations;(ii) a new multi-agent reinforcement learning (MARL) algorithm that combines our IL model and policy training into one single training process;and (iii) extensive experiments in three challenging game environments, including an advanced version of the Star-Craft multi-agent challenge (i.e., SMACv2).Experimental results show that our approach achieves superior performance compared to state-of-the-art MARL algorithms.