Mihalcea, Rada
Deep Learning for Text Attribute Transfer: A Survey
Jin, Di, Jin, Zhijing, Mihalcea, Rada
Driven by increasingly large deep learning models, neural language generation (NLG) has enjoyed unprecedented improvement and can now generate a diversity of human-like texts on demand, which allows it to serve as a human writing assistant. Text attribute transfer is one of the most important NLG tasks; it aims to control certain attributes that people may expect the texts to possess, such as sentiment, tense, emotion, political position, etc. It has a long history in Natural Language Processing but has recently gained much more attention thanks to the promising performance brought by deep learning models. In this article, we present a systematic survey of these works on neural text attribute transfer. We collect all related academic works since the first appearance in 2017. We then select, summarize, discuss, and analyze around 65 representative works in a comprehensive way. Overall, we have covered the task formulation, existing datasets and metrics for model development and evaluation, and all methods developed over the last several years. We reveal that existing methods are based on a combination of several loss functions, each of which serves a certain goal. The unique perspective we provide could shed light on the design of new methods. We conclude our survey with a discussion of open issues that need to be resolved for better future development.
Compositional Demographic Word Embeddings
Welch, Charles, Kummerfeld, Jonathan K., Pérez-Rosas, Verónica, Mihalcea, Rada
Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to improve language model performance and other language processing tasks, they can only be computed for people with a large amount of longitudinal data, which is not the case for new users. We propose a new form of personalized word embeddings that use demographic-specific word representations derived compositionally from full or partial demographic information for a user (i.e., gender, age, location, religion). We show that the resulting demographic-aware word representations outperform generic word representations on two tasks for English: language modeling and word associations. We further explore the trade-off between the number of available attributes and their relative effectiveness and discuss the ethical implications of using them.
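The idea of composing a word representation from whichever demographic attributes are known can be illustrated with a minimal sketch. The abstract does not state the composition function, so the simple averaging below is an assumption, and every name and vector value (`GENERIC`, `DEMOGRAPHIC`, `compose_embedding`) is hypothetical.

```python
# Toy 2-dimensional vectors; all values and names are hypothetical.
GENERIC = {"football": [0.5, 0.5]}
DEMOGRAPHIC = {
    ("location", "US"): {"football": [1.0, 0.0]},
    ("location", "UK"): {"football": [0.0, 1.0]},
    ("age", "18-25"): {"football": [0.5, 1.0]},
}


def compose_embedding(word, attributes):
    """Average the attribute-specific vectors for the known attributes.

    Averaging is an assumption made for illustration only; with no usable
    attribute the function falls back to the generic vector.
    """
    vectors = [
        DEMOGRAPHIC[(name, value)][word]
        for name, value in attributes.items()
        if (name, value) in DEMOGRAPHIC and word in DEMOGRAPHIC[(name, value)]
    ]
    if not vectors:
        return GENERIC[word]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


full = compose_embedding("football", {"location": "UK", "age": "18-25"})
partial = compose_embedding("football", {"location": "US"})
unknown = compose_embedding("football", {})
```

With full information the two attribute vectors are averaged, with partial information only the known attribute contributes, and a new user with no attributes receives the generic representation.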
Towards Automatic Detection of Misinformation in Online Medical Videos
Hou, Rui, Pérez-Rosas, Verónica, Loeb, Stacy, Mihalcea, Rada
Recent years have witnessed a significant increase in the online sharing of medical information, with videos representing a large fraction of such online sources. Previous studies have however shown that more than half of the health-related videos on platforms such as YouTube contain misleading information and biases. Hence, it is crucial to build computational tools that can help evaluate the quality of these videos so that users can obtain accurate information to help inform their decisions. In this study, we focus on the automatic detection of misinformation in YouTube videos. We select prostate cancer videos as our entry point to tackle this problem. The contribution of this paper is twofold. First, we introduce a new dataset consisting of 250 videos related to prostate cancer manually annotated for misinformation. Second, we explore the use of linguistic, acoustic, and user engagement features for the development of classification models to identify misinformation. Using a series of ablation experiments, we show that we can build automatic models with accuracies of up to 74%, corresponding to a 76.5% precision and 73.2% recall for misinformative instances.
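As a small worked example, the reported precision and recall for the misinformative class can be combined into an F1 score. The F1 value is not reported in the abstract; it is derived here from the two stated figures using the standard harmonic-mean definition.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for misinformative instances.
precision = 0.765
recall = 0.732

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")
```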
Variational Fusion for Multimodal Sentiment Analysis
Majumder, Navonil, Poria, Soujanya, Krishnamurthy, Gangeshwar, Chhaya, Niyati, Mihalcea, Rada, Gelbukh, Alexander
This is important, as more and more enterprises tend to make business decisions based on the user sentiment behind their products as expressed through these videos. Multimodal fusion is considered a key step in multimodal sentiment analysis. Most recent work on multimodal fusion (Poria et al., 2017; Zadeh et al., 2018c) has focused on the strategy of obtaining a multimodal representation from the independent unimodal representations. Our approach takes this strategy one step further, by also requiring that the original unimodal representations be reconstructed from the unified multimodal representation. The motivation behind this is the intuition that different modalities are an expression of the state of the mind. Hence, if we assume that the fused representation is the mind-state/sentiment/emotion, then in our approach we are ensuring that the fused representation can be mapped back to the unimodal representations, which should improve the quality of the multimodal representation. In this paper, we empirically argue that this is the case by showing that this approach outperforms the state-of-the-art in multimodal fusion. We employ a variational autoencoder (VAE) (Kingma and Welling, 2014), where the encoder network generates a latent representation from the unimodal representations.
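The fusion-with-reconstruction idea can be sketched as a single forward pass of a generic VAE objective. This is a minimal illustration, not the authors' architecture: the single linear layers, the squared-error reconstruction term, the dimensions, and all names (`linear`, `init_layer`, `vae_fusion_step`) are assumptions.

```python
import math
import random


def linear(x, weights, bias):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def vae_fusion_step(text, audio, video, params, rng):
    """Encode concatenated unimodal vectors, sample a fused latent vector,
    and score how well the unimodal vectors can be reconstructed from it."""
    x = text + audio + video  # concatenate the unimodal representations
    mu = linear(x, *params["enc_mu"])
    logvar = linear(x, *params["enc_logvar"])
    # Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    x_hat = linear(z, *params["dec"])
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return z, reconstruction + kl


rng = random.Random(0)
latent_dim, input_dim = 2, 6  # three modalities, two features each (toy sizes)
params = {
    "enc_mu": init_layer(latent_dim, input_dim, rng),
    "enc_logvar": init_layer(latent_dim, input_dim, rng),
    "dec": init_layer(input_dim, latent_dim, rng),
}
fused, loss = vae_fusion_step([0.1, 0.2], [0.3, 0.4], [0.5, 0.6], params, rng)
```

Here `fused` plays the role of the unified multimodal representation, and minimising `loss` would push it to retain enough information to map back to each unimodal input.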
Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances
Poria, Soujanya, Majumder, Navonil, Mihalcea, Rada, Hovy, Eduard
Emotion is intrinsic to humans and consequently emotion understanding is a key part of human-like artificial intelligence (AI). Emotion recognition in conversation (ERC) is becoming increasingly popular as a new research frontier in natural language processing (NLP) due to its ability to mine opinions from the plethora of publicly available conversational data on platforms such as Facebook, YouTube, Reddit, Twitter, and others. Moreover, it has potential applications in health-care systems (as a tool for psychological analysis), education (understanding student frustration), and more. ERC is also extremely important for generating emotion-aware dialogues that require an understanding of the user's emotions. Catering to these needs calls for effective and scalable conversational emotion-recognition algorithms. However, it is a strenuous problem to solve because of several research challenges. In this paper, we discuss these challenges and shed light on the recent research in this field. We also describe the drawbacks of these approaches and discuss the reasons why they fail to successfully overcome the research challenges in ERC.
Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog
Welch, Charles, Pérez-Rosas, Verónica, Kummerfeld, Jonathan K., Mihalcea, Rada
We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows; and (2) a larger set of messages. We find that using all features leads to gains of 9-14% over using message text only.
What's Hot in Human Language Technology: Highlights from NAACL HLT 2015
Chai, Joyce Y. (Michigan State University) | Sarkar, Anoop (Simon Fraser University) | Mihalcea, Rada (University of Michigan)
The Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technology (NAACL HLT) is a premier conference reporting outstanding research on human language technology. Several discriminative models with latent variables were also explored to learn better alignment models in a wetlab experiment domain (Naim et al. 2015). As alignment is often the first step in many problems involving language and vision, these approaches and empirical results provide important ...
Left-Handed or Right-Handed? A Data-Driven Approach to Analysing Characteristics of Handedness Based on Language Use
Choe, Ho-gene (University of Michigan) | Mihalcea, Rada (University of Michigan)
Numerous studies have identified differences between left-handed and right-handed people, especially in the fields of psychology and neuroscience. Using a social media setting, this paper presents a data-driven approach to explore whether a person's handedness can be identified given his or her writing, and shows handedness characteristics that can be inferred from language.
Cultural Influences on the Measurement of Personal Values through Words
Wilson, Steven R. (University of Michigan) | Mihalcea, Rada (University of Michigan) | Boyd, Ryan L. (University of Texas, Austin) | Pennebaker, James W. (University of Texas, Austin)
Texts posted on the web by users from diverse cultures provide a nearly endless source of data that researchers can use to study human thoughts and language patterns. However, unless care is taken to avoid it, models may be developed in one cultural setting and deployed in another, leading to unforeseen consequences. We explore the effects of using models built from a corpus of texts from multiple cultures in order to learn about each represented people group separately. To do this, we employ a topic modeling approach to quantify open-ended writing responses describing personal values and everyday behaviors in two distinct cultures. We show that some topics are more prominent in one culture compared to the other, while other topics are mentioned to similar degrees. Furthermore, our results indicate that culture influences how value-behavior relationships are exhibited. While some relationships exist in both cultural groups, in most cases we see that the observed relations are dependent on the cultural background of the data set under examination.
Semantic Relatedness Using Salient Semantic Analysis
Hassan, Samer Hassan (University of North Texas) | Mihalcea, Rada (University of North Texas)
Semantic relatedness is the task of finding and quantifying the strength of the semantic connections that exist between textual units, be they word pairs, sentence pairs, or document pairs. For instance, one may want to determine how semantically related are car and automobile, or noon and string. To make such a judgment, we rely on our accumulated knowledge and experiences, and utilize our ability of conceptual thinking, abstraction, and generalization. Knowledge-based measures such as L&C (Leacock and Chodorow 1998), Lesk (Lesk 1986), Wu&Palmer (Wu and Palmer 1994), Resnik (Resnik 1995), J&C (Jiang and Conrath 1997), H&S (Hirst and St Onge 1998), and many others, employ information extracted from manually constructed lexical taxonomies like Wordnet (Fellbaum 1998), Roget (Jarmasz 2003), and Wiktionary (Zesch, Muller, and ...