Collective Behavior Clone with Visual Attention via Neural Interaction Graph Prediction