A Theoretical appendix

Neural Information Processing Systems 

A.1 Proof of Proposition 1

Recall Proposition 1:

Proposition. Let R be a positive reward function on X.

R(x) substituted for F(x) by the reward matching assumption (8).

The trajectory balance constraint (13) can be generalized to partial (not complete) trajectories, i.e.,

The trajectory balance constraint (13) is the special case of this for full trajectories, while the detailed balance constraint (7) is the special case of trajectories with only one edge.

That is, the path that goes "backward, then forward" from

The special case of 'one step back, two steps forward' paths was used for a graph

We train all models with a learning rate of 0.001 (

Generating the test set.
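The generalization of trajectory balance to partial trajectories described in A.1 can be checked numerically on a small example. The sketch below assumes the constraint takes the usual form F(s_m) * prod_i P_F(s_{i+1} | s_i) = F(s_n) * prod_i P_B(s_i | s_{i+1}) for a partial trajectory s_m -> ... -> s_n; this notation is an assumption, since the equation itself is not reproduced in the text above. The toy graph, the edge flows, and all function names are hypothetical and chosen only for illustration.

```python
from math import isclose, prod

# Hypothetical toy DAG. Edge flows are chosen so that inflow equals outflow
# at every interior state (a, b, c); s0 is the source, x1 and x2 are terminal.
EDGE_FLOW = {
    ("s0", "a"): 3.0,
    ("s0", "b"): 2.0,
    ("a", "c"): 1.0,
    ("a", "x1"): 2.0,
    ("b", "c"): 2.0,
    ("c", "x1"): 1.0,
    ("c", "x2"): 2.0,
}


def inflow(state):
    """Total flow entering `state`."""
    return sum(f for (_, v), f in EDGE_FLOW.items() if v == state)


def outflow(state):
    """Total flow leaving `state`."""
    return sum(f for (u, _), f in EDGE_FLOW.items() if u == state)


def state_flow(state):
    """F(s): outflow for non-terminal states, inflow for terminal ones."""
    out = outflow(state)
    return out if out > 0 else inflow(state)


def p_forward(u, v):
    """P_F(v | u) induced by the edge flows."""
    return EDGE_FLOW[(u, v)] / outflow(u)


def p_backward(u, v):
    """P_B(u | v) induced by the edge flows."""
    return EDGE_FLOW[(u, v)] / inflow(v)


def balance_sides(path):
    """Return both sides of the assumed partial-trajectory constraint."""
    edges = list(zip(path, path[1:]))
    lhs = state_flow(path[0]) * prod(p_forward(u, v) for u, v in edges)
    rhs = state_flow(path[-1]) * prod(p_backward(u, v) for u, v in edges)
    return lhs, rhs


PATHS = {
    "single edge (detailed balance)": ["a", "x1"],
    "partial trajectory": ["a", "c", "x2"],
    "full trajectory (trajectory balance)": ["s0", "b", "c", "x1"],
}

for name, path in PATHS.items():
    lhs, rhs = balance_sides(path)
    assert isclose(lhs, rhs), name
```

For the full trajectory, F(s0) plays the role of the total flow and F(x1) the role of the reward at the terminal state, so the check reduces to the trajectory balance form; for the single edge, both sides equal the flow on that edge, which is the detailed balance form.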