Reviews: M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

Neural Information Processing Systems 

This work tackles the problem of knowledge base completion (KBC) by walking through a knowledge graph. The graph walk is treated as an MDP with a known state transition model. MCTS is used to plan a trajectory through the graph, and these trajectories are used to train a value function with Q-learning. The Q-function is used to form a policy prior for the MCTS. A KBC-specific RNN structure is also proposed.