Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video
Groh, Matthew, Sankaranarayanan, Aruna, Singh, Nikhil, Kim, Dong Young, Lippman, Andrew, Picard, Rosalind
Recent advances in technology for hyper-realistic visual effects provoke the concern that deepfake videos of political speeches will soon be visually indistinguishable from authentic video recordings. The conventional wisdom in communication theory predicts that people will fall for fake news more often when the same version of a story is presented as a video versus text. We conduct 4 pre-registered randomized experiments with 2,015 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, and media modalities. We find that base rates of misinformation minimally influence discernment and that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice-actor audio. Moreover, we find that audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said (the audio-visual cues) than on what is said (the speech content).
Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations
Lee, Dong Won, Kim, Yubin, Picard, Rosalind, Breazeal, Cynthia, Park, Hae Won
As we move closer to real-world AI systems, AI agents must be able to deal with multiparty (group) conversations. Recognizing and interpreting multiparty behaviors is challenging, as the system must recognize individual behavioral cues, deal with the complexity of multiple streams of data from multiple people, and recognize the subtle contingent social exchanges that take place amongst group members. To tackle this challenge, we propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling. The core component of our proposed approach is the Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people. We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark, where it outperforms state-of-the-art approaches in average F-1 scores by 5.2% and individual class F-1 scores by up to 10.0%. Through qualitative analysis, we show that our Crossperson Attention module is able to discover contingent behavior.
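To make the Crossperson Attention idea concrete, the following is a minimal sketch of pairwise cross-person attention, assuming per-person feature sequences of shape (batch, time, dim); the module and variable names are illustrative and this is not the authors' implementation.

```python
# Minimal sketch of pairwise cross-person attention (illustrative; not the authors' code).
import torch
import torch.nn as nn

class CrossPersonAttention(nn.Module):
    """Lets person i attend over person j's behavior sequence."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x_i, x_j):
        # x_i, x_j: (batch, time, dim) per-person feature sequences.
        # Queries come from person i; keys/values from person j, so the output
        # encodes how person j's behavior relates to person i's context over time.
        out, _ = self.attn(query=x_i, key=x_j, value=x_j)
        return out

# Usage: aggregate pairwise outputs over all partners of person i (e.g. by averaging).
x_i = torch.randn(2, 50, 64)   # person i: batch of 2, 50 time steps, 64-dim features
x_j = torch.randn(2, 50, 64)   # person j
contingent_ij = CrossPersonAttention(dim=64)(x_i, x_j)  # (2, 50, 64)
```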
Mixed Effects Random Forests for Personalised Predictions of Clinical Depression Severity
Lewis, Robert A., Ghandeharioun, Asma, Fedor, Szymon, Pedrelli, Paola, Picard, Rosalind, Mischoulon, David
This work demonstrates how mixed effects random forests enable accurate predictions of depression severity using multimodal physiological and digital activity data collected from an 8-week study involving 31 patients with major depressive disorder. We show that mixed effects random forests outperform standard random forests and personal average baselines when predicting clinical Hamilton Depression Rating Scale scores (HDRS_17). Compared to the latter baseline, accuracy is significantly improved for each patient by an average of 0.199-0.276 in terms of mean absolute error (p<0.05). This is noteworthy as these simple baselines frequently outperform machine learning methods in mental health prediction tasks. We suggest that this improved performance results from the ability of the mixed effects random forest to personalise model parameters to individuals in the dataset. However, we find that these improvements pertain exclusively to scenarios where labelled patient data are available to the model at training time. Investigating methods that improve accuracy when generalising to new patients is left as important future work.
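The following is a minimal sketch of the mixed-effects idea, assuming a single random intercept per patient fitted in an EM-style alternation with a standard random forest; it is illustrative only and omits the variance components of a full mixed effects random forest.

```python
# Simplified mixed effects random forest with per-patient random intercepts
# (EM-style alternation; illustrative, not the authors' exact model).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_merf(X, y, patient_ids, n_iters=10):
    # X: (n, d) features; y: (n,) HDRS scores; patient_ids: (n,) array of patient identifiers.
    b = {pid: 0.0 for pid in np.unique(patient_ids)}   # random intercept per patient
    forest = RandomForestRegressor(n_estimators=200, random_state=0)
    for _ in range(n_iters):
        # 1) Fit the fixed-effect forest on y minus the current random effects.
        offsets = np.array([b[pid] for pid in patient_ids])
        forest.fit(X, y - offsets)
        # 2) Re-estimate each patient's intercept from the forest's residuals.
        resid = y - forest.predict(X)
        for pid in b:
            b[pid] = resid[patient_ids == pid].mean()
    return forest, b

# Prediction for a patient seen in training adds that patient's intercept to the forest
# output; a new patient falls back to the population forest alone, consistent with the
# finding that the gains require labelled data from that patient at training time.
```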
Comparing Human and Machine Deepfake Detection with Affective and Holistic Processing
Groh, Matthew, Epstein, Ziv, Firestone, Chaz, Picard, Rosalind
The recent emergence of deepfake videos leads to an important societal question: how can we know if a video that we watch is real or fake? In three online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary participants against the leading computer vision deepfake detection model and find them similarly accurate while making different kinds of mistakes. Together, participants with access to the model's prediction are more accurate than either alone, but inaccurate model predictions often decrease participants' accuracy. We embed randomized experiments and find that incidental anger decreases participants' performance and that obstructing holistic visual processing of faces also hinders participants' performance while mostly not affecting the model's. These results suggest that considering emotional influences and harnessing the specialized, holistic visual processing of ordinary people could be promising defenses against machine-manipulated media.
Hierarchical Reinforcement Learning for Open-Domain Dialog
Saleh, Abdelrhman, Jaques, Natasha, Ghandeharioun, Asma, Shen, Judy Hanwen, Picard, Rosalind
Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning, VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements -- in terms of both human evaluation and automatic metrics -- over state-of-the-art dialog models, including Transformers.
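As a rough illustration of an utterance-level policy-gradient update, the sketch below applies a REINFORCE-style loss to per-utterance log-probabilities and rewards; the tensors are placeholders and this is not the authors' VHRL implementation.

```python
# Illustrative utterance-level REINFORCE update; tensors are stand-ins for model outputs.
import torch

def utterance_policy_gradient_loss(log_probs, rewards, baseline=None):
    # log_probs: (batch, n_utterances) log-probability of each generated utterance
    # under the utterance-level policy; rewards: matching conversation-level rewards
    # (e.g. low toxicity, low repetition) credited to each utterance.
    advantages = rewards - (baseline if baseline is not None else rewards.mean())
    # Gradient ascent on expected reward == descent on -E[log pi * advantage].
    return -(log_probs * advantages.detach()).mean()

log_probs = torch.randn(4, 3, requires_grad=True)  # placeholder for policy outputs
rewards = torch.rand(4, 3)                         # placeholder rewards
loss = utterance_policy_gradient_loss(log_probs, rewards)
loss.backward()
```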
Detection of Real-world Driving-induced Affective State Using Physiological Signals and Multi-view Multi-task Machine Learning
Lopez-Martinez, Daniel, El-Haouij, Neska, Picard, Rosalind
Affective states have a critical role in driving performance and safety. They can degrade driver situation awareness and negatively impact cognitive processes, severely diminishing road safety. Therefore, detecting and assessing drivers' affective states is crucial to help improve the driving experience and increase safety, comfort, and well-being. Recent advances in affective computing have enabled the detection of such states. This may lead to empathic automotive user interfaces that account for the driver's emotional state and influence the driver in order to improve safety. In this work, we propose a multi-view multi-task machine learning method for the detection of drivers' affective states using physiological signals. The proposed approach is able to account for inter-drive variability in physiological responses while enabling interpretability of the learned models, a factor that is especially important in systems deployed in the real world. We evaluate the models on three different datasets containing real-world driving experiences. Our results indicate that accounting for drive-specific differences significantly improves model performance.
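For intuition, the sketch below shows one simple way to account for drive-specific differences: a shared representation with one prediction head per drive (task). The paper's multi-view multi-task formulation differs; this is only an assumption-laden illustration.

```python
# Toy multi-task model: shared physiological representation, one head per drive/task.
# Illustrative only; the paper's multi-view kernel-based formulation is different.
import torch
import torch.nn as nn

class MultiTaskAffectModel(nn.Module):
    def __init__(self, n_features, n_tasks, n_classes=2, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, n_classes) for _ in range(n_tasks)])

    def forward(self, x, task_id):
        # Drive-specific decision layer on top of features shared across drives.
        return self.heads[task_id](self.shared(x))

model = MultiTaskAffectModel(n_features=12, n_tasks=3)
logits = model(torch.randn(8, 12), task_id=1)  # predictions for drive/task 1
```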
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Jaques, Natasha, Ghandeharioun, Asma, Shen, Judy Hanwen, Ferguson, Craig, Lapedriza, Agata, Jones, Noah, Gu, Shixiang, Picard, Rosalind
Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. These are critical shortcomings for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment -- e.g. systems that learn from human interaction. Thus, we develop a novel class of off-policy batch RL algorithms, which are able to effectively learn offline, without exploring, from a fixed batch of human interaction data. We leverage models pre-trained on data as a strong prior, and use KL-control to penalize divergence from this prior during RL training. We also use dropout-based uncertainty estimates to lower bound the target Q-values as a more efficient alternative to Double Q-Learning. The algorithms are tested on the problem of open-domain dialog generation -- a challenging reinforcement learning problem with a 20,000-dimensional action space. Using our Way Off-Policy algorithm, we can extract multiple different reward functions post-hoc from collected human interaction data, and learn effectively from all of these. We test the real-world generalization of these systems by deploying them live to converse with humans in an open-domain setting, and demonstrate that our algorithm achieves significant improvements over prior methods in off-policy batch RL.
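The sketch below illustrates two of the ingredients described above under stated assumptions: a KL penalty toward a pre-trained prior folded into the reward, and a pessimistic next-state value from Monte Carlo dropout (here the minimum over stochastic forward passes; a mean-minus-standard-deviation margin is another option). Function names and shapes are illustrative, not the paper's code.

```python
# Sketch of a KL-controlled reward and a dropout-based pessimistic target value.
import torch

def kl_penalized_reward(reward, logp_policy, logp_prior, alpha=0.1):
    # Penalize divergence from the pre-trained prior: r' = r - alpha * (log pi - log p_prior).
    return reward - alpha * (logp_policy - logp_prior)

def dropout_lower_bound_value(target_net, next_state, n_samples=10):
    # Keep dropout active and take the minimum over stochastic forward passes as a
    # pessimistic estimate of the next-state value (one way to realize a lower bound).
    target_net.train()
    with torch.no_grad():
        qs = torch.stack([target_net(next_state) for _ in range(n_samples)])
    return qs.min(dim=0).values.max(dim=-1).values  # min over samples, max over actions
```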
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Ghandeharioun, Asma, Shen, Judy Hanwen, Jaques, Natasha, Ferguson, Craig, Jones, Noah, Lapedriza, Agata, Picard, Rosalind
Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of single-turn responses, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario where the dialog system talks to itself and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric is capable of capturing the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r>.7, p<.05). To investigate the strengths of this novel metric and interactive evaluation in comparison to state-of-the-art metrics and one-turn evaluation, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation on the utterance level. Finally, we open-source the interactive evaluation platform we built and the dataset we collected to allow researchers to efficiently deploy and evaluate generative dialog models.
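A minimal sketch of the self-play scoring idea follows, assuming placeholder sentiment_fn and embed_fn callables for whichever sentiment classifier and sentence encoder one chooses; the proxy weighting here is arbitrary, not the paper's.

```python
# Sketch: score a self-play conversation with proxy metrics, then check agreement
# with human ratings. sentiment_fn / embed_fn are placeholders for a chosen sentiment
# classifier and sentence encoder; the proxy weights below are arbitrary.
import numpy as np
from scipy.stats import pearsonr

def self_play_score(utterances, sentiment_fn, embed_fn, w_sent=0.5, w_coh=0.5):
    sentiments = [sentiment_fn(u) for u in utterances]        # e.g. scores in [-1, 1]
    embs = [np.asarray(embed_fn(u)) for u in utterances]
    coherence = [
        float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        for a, b in zip(embs[:-1], embs[1:])                  # consecutive-turn similarity
    ]
    return w_sent * np.mean(sentiments) + w_coh * np.mean(coherence)

# Validation: correlate per-model self-play scores with interactive human ratings, e.g.
# r, p = pearsonr(self_play_scores, human_ratings)
```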
Deep Reinforcement Learning for Optimal Critical Care Pain Management with Morphine using Dueling Double-Deep Q Networks
Lopez-Martinez, Daniel, Eschenfeldt, Patrick, Ostvar, Sassan, Ingram, Myles, Hur, Chin, Picard, Rosalind
Opioids are the preferred medications for the treatment of pain in the intensive care unit. While undertreatment leads to unrelieved pain and poor clinical outcomes, excessive use of opioids puts patients at risk of experiencing multiple adverse effects. In this work, we present a sequential decision making framework for opioid dosing based on deep reinforcement learning. It provides real-time clinically interpretable dosing recommendations, personalized according to each patient's evolving pain and physiological condition. We focus on morphine, one of the most commonly prescribed opioids. To train and evaluate the model, we used retrospective data from the publicly available MIMIC-III database. Our results demonstrate that reinforcement learning may be used to aid decision making in the intensive care setting by providing personalized pain management interventions.
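For reference, the sketch below shows the two components named in the title, a dueling Q-network and a double-Q target, in generic form; state and action dimensions, hyperparameters, and the clinical reward are placeholders, not the paper's specification.

```python
# Generic dueling Q-network and double-Q target; dimensions and rewards are placeholders.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, n_state, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_state, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # action advantages A(s, a)

    def forward(self, s):
        h = self.body(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

def double_q_target(online, target, reward, next_state, done, gamma=0.99):
    # The online network selects the next action; the target network evaluates it.
    with torch.no_grad():
        best_a = online(next_state).argmax(dim=-1, keepdim=True)
        next_q = target(next_state).gather(-1, best_a).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q
```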
Multi-task multiple kernel machines for personalized pain recognition from functional near-infrared spectroscopy brain signals
Lopez-Martinez, Daniel, Peng, Ke, Steele, Sarah C., Lee, Arielle J., Borsook, David, Picard, Rosalind
Currently there is no validated objective measure of pain. Recent neuroimaging studies have explored the feasibility of using functional near-infrared spectroscopy (fNIRS) to measure alterations in brain function in evoked and ongoing pain. In this study, we applied multi-task machine learning methods to derive a practical algorithm for pain detection from fNIRS signals in healthy volunteers exposed to a painful stimulus. Specifically, we employed multi-task multiple kernel learning to account for the inter-subject variability in pain response. Our results support the use of fNIRS and machine learning techniques in developing objective pain detection systems, and also highlight the importance of adopting a personalized analysis in the process.
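A minimal multiple-kernel sketch follows, combining several RBF kernels with fixed weights and training an SVM on the combined precomputed kernel; the per-subject multi-task coupling from the paper is not reproduced, and the kernel parameters and weights are assumptions.

```python
# Sketch of the multiple-kernel flavor: combine RBF kernels with fixed weights and train
# an SVM on the combined precomputed kernel (kernel parameters and weights are assumptions).
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def combined_kernel(X, Y, gammas=(0.1, 1.0, 10.0), weights=(1/3, 1/3, 1/3)):
    # Convex combination of RBF kernels at several bandwidths.
    return sum(w * rbf_kernel(X, Y, gamma=g) for w, g in zip(weights, gammas))

# Usage with fNIRS feature matrices and pain/no-pain labels:
# clf = SVC(kernel="precomputed").fit(combined_kernel(X_train, X_train), y_train)
# preds = clf.predict(combined_kernel(X_test, X_train))
```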