CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations Supplementary Document
Neural Information Processing Systems
However, the datasets they worked on, such as MELD [1], IEMOCAP [2], UR-FUNNY [3], and MUStARD [4], are annotated with only one or two types of affection, and they do not record the inter-relatedness between tasks. Without explicit annotation of cross-task correlations, the potential of multi-modal multi-affection joint detection cannot be fully explored, nor can the understanding of complex human affections be deepened. We fill the gap by constructing a large-scale benchmark multi-modal multi-affection conversational dataset.
Building such a dataset requires tackling three main challenges: (1) multi-affection joint judgment: the subjectivity and creativity of human language make it hard to judge different affections accurately at the same time; (2) multi-affection correlation: different affections can be indistinguishable in certain circumstances, and it is difficult to measure their relatedness accurately; (3) context effect: an utterance may express different affections in different conversational contexts.
Each utterance is annotated with sentiment (including pride and romantic love), emotion, sarcasm, and humor labels, accompanied by sentiment-emotion and sarcasm-humor inter-relatedness measures. Because external knowledge implicitly influences the speaker's affective state, the speaker's background (i.e., name, profession, sex, personality) and the topic of each conversation are also provided, as illustrated by the example in Figure 1. Each utterance contains textual, visual, and acoustic information, stored in .CSV, .MP4, and .WAV files, respectively.
We have also collected real dialogue samples from the 10086 customer service of China Mobile Communication Group Tianjin Co., but because of user privacy protection and company regulations, we cannot publicly disclose these samples.
The statistics of the TV shows are shown in Table 1. Surviving table rows: Metropolitan opera (Romance, Idol): "都挺好" (All Is Well); Crime thriller (Crime): "心理罪" (Guilty of Mind).
Our collection includes six comedy shows, four metropolitan opera shows, five dramas, and four thrillers, a well-proportioned mix (6:4:5:4). Moreover, these TV shows cover various styles, e.g., costume, idol, romance, war, family, history, and crime, which provide numerous conversation topics. Both choices help us collect sentiment, emotion, sarcasm, humor, pride, and love labels that are as balanced as possible. The speaker's information is also collected.
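To make the annotation scheme concrete, the following is a minimal sketch of how one annotated conversation might be represented in memory. It is not the released data format: every class and field name here is hypothetical, the label types (strings for sentiment and emotion, booleans for sarcasm and humor, floats for the two inter-relatedness measures) are assumptions, and the `context_window` helper is only one illustrative way to expose preceding utterances, motivated by the context effect described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    # Speaker background fields named in the text; types are assumed.
    name: str
    profession: str
    sex: str
    personality: str


@dataclass
class Utterance:
    text: str        # textual modality (stored in a .CSV file)
    video_path: str  # visual modality (.MP4); hypothetical path field
    audio_path: str  # acoustic modality (.WAV); hypothetical path field
    speaker: Speaker
    sentiment: str   # assumed string label
    emotion: str     # assumed string label
    sarcasm: bool    # assumed binary label
    humor: bool      # assumed binary label
    sentiment_emotion_relatedness: float  # assumed numeric measure
    sarcasm_humor_relatedness: float      # assumed numeric measure


@dataclass
class Conversation:
    topic: str
    utterances: List[Utterance] = field(default_factory=list)


def context_window(conv: Conversation, index: int, size: int = 2) -> List[Utterance]:
    """Return up to `size` utterances that precede utterance `index`."""
    start = max(0, index - size)
    return conv.utterances[start:index]
```

A model that judges utterance `i` could then be given `context_window(conv, i)` together with the utterance itself, so that the same sentence can receive different labels in different conversations.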
Mar-21-2025, 11:43:13 GMT