A Touch, Vision, and Language Dataset for Multimodal Alignment
