Appendix A Additional Related Work

Neural Information Processing Systems 

Utilizing global information to reduce the complexity of imperfect-information games has also been investigated in some works. In their implementation, the value network of the agent can observe the full information about the game state, including those that are hidden from the policy. They argue that such a training style improves training performance. Moreover, in Suphx [15], a strong Mahjong AI system, they used a similar method namely oracle guiding. Particularly, in the beginning of the training stage, all global information is utilized; then, as the training goes, the additional information would be dropped out slowly to none, and only the information that the agent is allowed to observe is reserved in the subsequent training stage.