Robust Understanding of Human-Robot Social Interactions through Multimodal Distillation