Appendices

A Benchmark Details

Dataset Primitive Task Set