A The Overall Workflow of EXPAND
Neural Information Processing Systems
Algorithm 1: Train - Interaction Loop
Result: Trained Eff.

In EXPAND, we augment each human-evaluated state to 5 states. To verify that 5 is sufficient, we also varied the number of augmentations applied to each state. Figure 7 compares performance when the number of augmentations is varied among {1, 5, 12} for Pixel Taxi and Pong using a synthetic oracle. The plots suggest that increasing the number of augmentations yields only slight performance gains, so setting the number of perturbations to 5 for EXPAND is apt.

Figure 7: Learning curves of the variants of EXPAND with …
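As an illustration of the augmentation step, the following is a minimal sketch of how one human-evaluated state might be expanded into several perturbed copies. It is not the authors' implementation. The function name `augment_state`, the boolean `relevant_mask` marking regions to leave untouched, the additive Gaussian noise, and the `noise_std` value are all assumptions made for the example; the text above only states that each state is augmented to 5 states by perturbation.

```python
import numpy as np


def augment_state(state, relevant_mask, num_augmentations=5, noise_std=0.1, rng=None):
    """Return `num_augmentations` perturbed copies of `state`.

    Hypothetical helper: pixels where `relevant_mask` is True are kept
    unchanged, and all other pixels receive additive Gaussian noise.
    The noise model and `noise_std` are assumptions for illustration.
    `state` is expected to hold values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.asarray(state, dtype=np.float32)
    mask = np.asarray(relevant_mask, dtype=bool)

    augmented = []
    for _ in range(num_augmentations):
        # Draw fresh noise for every copy so the copies differ from each other.
        noise = rng.normal(0.0, noise_std, size=state.shape).astype(np.float32)
        # Perturb only the pixels outside the masked (relevant) region.
        perturbed = np.where(mask, state, state + noise)
        # Keep pixel values inside the valid [0, 1] range.
        augmented.append(np.clip(perturbed, 0.0, 1.0))

    # Shape: (num_augmentations, *state.shape)
    return np.stack(augmented)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((84, 84)).astype(np.float32)  # stand-in for a game frame
    mask = np.zeros((84, 84), dtype=bool)
    mask[30:50, 30:50] = True                        # stand-in for a relevant region
    for k in (1, 5, 12):                             # the settings compared in Figure 7
        print(k, augment_state(frame, mask, num_augmentations=k, rng=rng).shape)
```

Under these assumptions, changing `num_augmentations` is the only modification needed to reproduce the {1, 5, 12} settings compared above. Each additional copy adds one more sample to every update that uses the evaluated state, so the cost grows linearly with this number.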
Feb-10-2025, 09:57:16 GMT