Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification