

 Unsupervised or Indirectly Supervised Learning



Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience

arXiv.org Artificial Intelligence

Group conversations over videoconferencing are a complex social behavior. However, the subjective moments of negative experience, where the conversation loses fluidity or enjoyment, remain understudied. These moments are infrequent in naturalistic data, so training a supervised learning (SL) model requires costly manual data annotation. We applied semi-supervised learning (SSL) to leverage targeted labeled and unlabeled clips for training multimodal (audio, facial, text) deep features to predict non-fluid or unenjoyable moments in holdout videoconference sessions. The modality-fused co-training SSL achieved an ROC-AUC of 0.9 and an F1 score of 0.6, outperforming SL models by up to 4% with the same amount of labeled data. Remarkably, the best SSL model with just 8% labeled data matched 96% of the SL model's full-data performance. This demonstrates an annotation-efficient framework for modeling videoconference experience.
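As a rough illustration of the general co-training idea mentioned in the abstract, the toy sketch below trains one simple classifier per "view" (standing in for two modalities) and lets each view pseudo-label confident unlabeled samples for the other. It is not the paper's modality-fused model: the nearest-centroid classifier, the one-number-per-view features, and the names `fit_centroids`, `predict`, `co_train`, and `min_margin` are all assumptions made for this example.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Toy nearest-centroid model for one view: mean feature value per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def predict(centroids, x):
    """Return (label, margin); margin is the distance gap between the two
    closest centroids. Assumes at least two classes are present."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
    return ranked[0], margin


def fit_view(examples, v):
    """Fit the toy model on view v of examples shaped ((view_0, view_1), label)."""
    return fit_centroids([x[v] for x, _ in examples], [y for _, y in examples])


def co_train(labeled, unlabeled, rounds=3, min_margin=1.0):
    """Generic co-training loop (hypothetical sketch, not the paper's method).

    Each view's model pseudo-labels unlabeled samples it is confident about
    and adds them to the training set of the *other* view.
    """
    train = {0: list(labeled), 1: list(labeled)}
    pool = list(unlabeled)
    for _ in range(rounds):
        models = {v: fit_view(train[v], v) for v in (0, 1)}
        still_unlabeled = []
        for x in pool:
            confident = False
            for v in (0, 1):
                label, margin = predict(models[v], x[v])
                if margin >= min_margin:
                    train[1 - v].append((x, label))  # teach the other view
                    confident = True
            if not confident:
                still_unlabeled.append(x)
        pool = still_unlabeled
    return {v: fit_view(train[v], v) for v in (0, 1)}, pool


# Usage: two labeled clips and three unlabeled ones (made-up numbers).
labeled = [((0.0, 4.0), 0), ((10.0, 6.0), 1)]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
models, leftover = co_train(labeled, unlabeled)
```

In this toy setup, samples that neither view labels with enough margin simply stay in the unlabeled pool, which is one common way to limit the spread of wrong pseudo-labels.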





Supplementary Material for Paper "Universal Semi-Supervised Learning"

Neural Information Processing Systems

Moreover, we will conduct additional experiments to further evaluate our method in Section C. Furthermore, we provide the standard deviation results that correspond to the main paper in Section D. Finally, we will discuss the limitations and social impact of our method in Section E.

… VisDA2017 datasets, we set the batch size to 64. Other implementation details are presented below.

It contains 3 domains: "Amazon" (A), "DSLR" (D), and "Webcam" (W), and each domain is composed of 31 classes.

Shared
  Learning rate decay factor: 0.2
  Training iteration at which learning rate decay starts: 400,000
  Training iteration at which consistency coefficient ramp-up starts: 200,000
Supervised
  Initial learning rate: 0.003
Π-Model [6, 10]
  Initial learning rate: 3 10

… CAFA framework, which includes class-sharing data detection and feature adaptation. Here we use PI as the backbone method.
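For readers unfamiliar with the Π-Model used as the backbone here, the sketch below shows the general shape of its objective: a supervised cross-entropy term plus a weighted consistency term that penalizes disagreement between two stochastic forward passes on the same input. The linear "model", the Gaussian input noise, and the names `noisy_logits`, `pi_model_loss`, and `consistency_weight` are assumptions for illustration only; this is not the supplementary material's implementation and does not reproduce its hyperparameter schedule.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_logits(x, weights, rng, noise_std=0.1):
    """Hypothetical stochastic model: linear logits on a noise-perturbed input."""
    x_aug = [xi + rng.gauss(0.0, noise_std) for xi in x]
    return [sum(w * xi for w, xi in zip(row, x_aug)) for row in weights]


def pi_model_loss(x, label, weights, rng, consistency_weight):
    """Pi-Model style loss for one sample.

    label is a class index for labeled data or None for unlabeled data.
    The consistency term is the mean squared difference between the class
    probabilities of two stochastic passes over the same input.
    """
    p1 = softmax(noisy_logits(x, weights, rng))
    p2 = softmax(noisy_logits(x, weights, rng))
    consistency = sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)
    supervised = -math.log(p1[label]) if label is not None else 0.0
    return supervised + consistency_weight * consistency


# Usage with made-up numbers: a 2-class identity "model" and one sample.
rng = random.Random(0)
weights = [[1.0, 0.0], [0.0, 1.0]]
labeled_loss = pi_model_loss([1.0, 0.0], 0, weights, rng, consistency_weight=1.0)
unlabeled_loss = pi_model_loss([1.0, 0.0], None, weights, rng, consistency_weight=1.0)
```

Unlabeled samples contribute only through the consistency term, which is why the weight on that term is typically ramped up during training rather than applied at full strength from the start.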



A Semi-Supervised Learning Approach and A New Dataset

Neural Information Processing Systems

While considerable recent effort has gone into generalizing pose estimation to novel object instances within the same category, namely category-level 6D pose estimation, it remains restricted to constrained environments given the limited amount of annotated data.