Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks

Li, Haoqi, Baucom, Brian, Narayanan, Shrikanth, Georgiou, Panayiotis

arXiv.org Artificial Intelligence 

Human behavior refers to the way humans act and interact in response to a stimulus, internal or external. Understanding human behavior through observational study is one of the core methodologies in fields such as psychology and sociology (Margolin, Oliver, Gordis, O'hearn, Medina, Ghosh and Morland, 1998). Human behaviors encompass rich information: from emotional expression, processing, and regulation to the intricate dynamics of interactions, including the context and knowledge of interlocutors and their thinking and problem-solving intent (Li, Baucom and Georgiou, 2020). Furthermore, the behavioral constructs of interest often depend on the domain of interaction (Narayanan and Georgiou, 2013). Hence, characterizing human behavior usually requires domain-specific knowledge and adequate windows of observation. This is exemplified across psychological health science and practice (Bone, Lee, Chaspari, Gibson and Narayanan, 2017), in domains such as couple therapy (Christensen, Atkins, Berns, Wheeler, Baucom and Simpson, 2004), suicide cognition evaluation (Bryan, Rudd, Wertenberger, Etienne, Ray-Sannerud, Morrow, Peterson and Young-McCaughon, 2014) and addiction counseling (Xiao, Imel, Georgiou, Atkins and Narayanan, 2015), where a variety of domain-specific behavior constructs are defined and derived (e.g., blame and affect patterns exhibited by partners, suicidal ideation of an individual at risk, and empathy expressed by a therapist in the respective aforementioned domains) to support a specific subsequent plan of action.
Human speech offers rich information about the mental state and traits of the talkers. Vocal cues, including speech and spoken language as well as nonverbal vocalizations and disfluency patterns, have been shown to be informative about human behavior, e.g., in marital interaction (Baucom, Atkins, Simpson and Christensen, 2009) and in motivational interviewing (Amrhein, Miller, Yahne, Palmer and Fulcher, 2003; Imel, Barco, Brown, Baucom, Baer, Kircher and Atkins, 2014; Miller, Benefield and Tonigan, 1993). Many automatic computational approaches that support the measurement, analysis, and modeling of human behaviors from speech have been investigated in affective computing (Lee and Narayanan, 2005), social signal processing (Vinciarelli, Pantic and Bourlard, 2009) and behavioral signal processing (BSP) (Narayanan and Georgiou, 2013).
