 Inductive Learning









cf67355a3333e6e143439161adc2d82e-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their valuable suggestions. Our responses to individual reviewers' concerns are as follows. The scope of the two papers is different. The use of the AU relationship is different. The performance of our semi-supervised learning method is lower than [1] on BP4D.



Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience

arXiv.org Artificial Intelligence

Group conversations over videoconferencing are a complex social behavior. However, the subjective moments of negative experience, where the conversation loses fluidity or enjoyment, remain understudied. These moments are infrequent in naturalistic data, so training a supervised learning (SL) model requires costly manual data annotation. We applied semi-supervised learning (SSL) to leverage targeted labeled and unlabeled clips for training multimodal (audio, facial, text) deep features to predict non-fluid or unenjoyable moments in holdout videoconference sessions. The modality-fused co-training SSL achieved an ROC-AUC of 0.9 and an F1 score of 0.6, outperforming SL models by up to 4% with the same amount of labeled data. Remarkably, the best SSL model with just 8% labeled data matched 96% of the SL model's full-data performance. These results demonstrate an annotation-efficient framework for modeling videoconference experience.
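
The abstract does not spell out how its co-training procedure works, so the following is only a minimal sketch of generic two-view co-training, not the authors' method. It assumes two feature views per clip (standing in for modalities such as audio and facial features) and uses a toy nearest-centroid classifier in place of deep features. The names `centroid_fit`, `centroid_predict`, and `co_train`, the parameters `rounds` and `per_round`, and the data are all hypothetical.

```python
def centroid_fit(X, y):
    """Return the per-class mean vector (a nearest-centroid classifier)."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def centroid_predict(cents, x):
    """Return (label, margin): margin is the distance gap between the
    two nearest centroids and serves as a crude confidence score."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view_a, view_b, labels, rounds=5, per_round=4):
    """Generic co-training: labels[i] is 0/1 if annotated, None otherwise.
    Each view in turn fits on the shared labeled pool and pseudo-labels
    its most confident unlabeled clips, which the other view then reuses."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            lab = [i for i, t in enumerate(labels) if t is not None]
            unl = [i for i, t in enumerate(labels) if t is None]
            if not unl:
                return labels
            cents = centroid_fit([view[i] for i in lab], [labels[i] for i in lab])
            scored = [(centroid_predict(cents, view[i]), i) for i in unl]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                labels[i] = label
    return labels


# Toy data (hypothetical): 20 clips, two well-separated classes per view.
truth = [0] * 10 + [1] * 10
view_a = [[0.1 * k, 0.0] for k in range(10)] + [[5.0 + 0.1 * k, 1.0] for k in range(10)]
view_b = [[1.0, 0.2 * k] for k in range(10)] + [[-1.0, 8.0 + 0.2 * k] for k in range(10)]

# Only one clip per class starts with an annotation.
labels = [None] * 20
labels[0], labels[10] = 0, 1

predicted = co_train(view_a, view_b, labels)
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"pseudo-label accuracy: {accuracy:.2f}")
```

In this sketch the two views share a single labeled pool, so a confident pseudo-label from one view immediately becomes training data for the other. A real system would more likely hold out a validation set and threshold on calibrated probabilities rather than a raw distance margin.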