Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis

Open in new window