An End-to-End Human Simulator for Task-Oriented Multimodal Human-Robot Collaboration