A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences
