A The Overall Workflow of EXPAND

Neural Information Processing Systems 

Algorithm 1: Train-Interaction Loop. Result: trained Eff. DQN.

To verify that 5 is sufficient, we also experimented with the number of augmentations required in each state to get the best performance. The network architectures are shown in Figure 1. The Eff. DQN is then jointly trained with the standard DQN loss, the feedback loss (advantage loss), and the [...]. Note that the weight of the explanation loss is set to 0.1, as suggested in previous works. [...] Gaussian filters, which can be more efficient with respect to wall-clock time. We evaluated EXPAND against the baselines using an oracle.
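The joint training objective mentioned above can be pictured as a weighted sum of per-term losses. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `td_loss` and `joint_loss`, the squared-error form of the DQN term, and the weights `w_feedback` and `w_third` are all hypothetical. The third term is truncated in the text, so it is treated here as an opaque placeholder value; the default of 0.1 is only an illustrative choice borrowed from the explanation-loss weight quoted above.

```python
def td_loss(q_sa, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step TD error for a single transition (assumed DQN loss form)."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


def joint_loss(dqn_loss, feedback_loss, third_loss, w_feedback=1.0, w_third=0.1):
    """Weighted sum of the three loss terms (weights are hypothetical)."""
    # feedback_loss stands in for the advantage loss computed from human feedback;
    # third_loss is a placeholder for the term that is elided in the text.
    return dqn_loss + w_feedback * feedback_loss + w_third * third_loss


if __name__ == "__main__":
    # Toy numbers, chosen only to show how the pieces fit together.
    l_dqn = td_loss(q_sa=1.0, reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
    print(joint_loss(l_dqn, feedback_loss=2.0, third_loss=10.0))
```

In an actual training loop each term would be a differentiable tensor computed over a minibatch, and the weighted sum would be passed to the optimizer in a single backward pass.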
