Dinakar, Karthik
Lensing Machines: Representing Perspective in Latent Variable Models
Dinakar, Karthik, Lieberman, Henry
Many datasets represent a combination of several viewpoints - different ways of looking at the same data that lead to different generalizations. For example, a corpus with examples generated by different people may be a mixture of many perspectives and can itself be viewed from different perspectives by others. It isn't always possible to cleanly separate, in advance, the examples representing each viewpoint and train a separate model for each one. We introduce lensing, a mixed-initiative technique to (1) extract 'lenses', or mappings between machine-learned representations and perspectives of human experts, and to (2) generate 'lensed' models that afford multiple perspectives of the same dataset. We apply lensing to two classes of latent variable models, (a) a mixed-membership model and (b) a matrix factorization model, in the context of two mental health applications, and we capture and imbue the perspectives of clinical psychologists into these models. Our work shows the benefits of the machine learning practitioner formally incorporating the perspective of a knowledgeable domain expert into their models rather than estimating unlensed models themselves in isolation.
Stacked Generalization Learning to Analyze Teenage Distress
Dinakar, Karthik (Massachusetts Institute of Technology) | Weinstein, Emily (Harvard University) | Lieberman, Henry (Massachusetts Institute of Technology) | Selman, Robert Louis (Harvard University)
The internet has become a resource for adolescents who are distressed by social and emotional problems. Social network analysis can provide new opportunities for helping people seeking support online, but only if we understand the salient issues that are highly relevant to participants' personal circumstances. In this paper, we present a stacked generalization modeling approach to analyze an online community supporting adolescents under duress. While traditional predictive supervised methods rely on robust hand-crafted feature space engineering, mixed-initiative semi-supervised topic models are often better at extracting high-level themes that go beyond such feature spaces. We present a strategy that combines the strengths of both types of models, inspired by Prevention Science approaches, which deal with the identification and amelioration of risk factors that predict psychological, psychosocial, and psychiatric disorders within and across populations (in our case teenagers) rather than treating them post facto. In this study, prevention scientists used a social science thematic analytic approach to code stories according to a fine-grained analysis of salient social, developmental or psychological themes they deemed relevant, and these coded stories are then analyzed by a society of models. We show that a stacked generalization of such an ensemble fares better than individual binary predictive models.
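As a rough illustration of stacked generalization in general (not the authors' implementation), the sketch below trains a small logistic-regression meta-learner on the held-out scores of two base models. The base-model scores, the labels, and the function names `train_meta` and `predict_meta` are all hypothetical toy values chosen for illustration; the abstract does not specify the base models or the meta-learner.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta(features, labels, epochs=2000, lr=0.5):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    Each row of `features` holds the held-out scores that the base
    models assigned to one example (hypothetical values here).
    """
    n_feats = len(features[0])
    n = len(features)
    w = [0.0] * n_feats
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feats
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feats):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_meta(w, b, x):
    """Probability that the example is positive, given base-model scores."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy held-out scores from two assumed base models (e.g. a supervised
# classifier and a topic-model-based scorer), with made-up labels.
held_out_scores = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
                   [0.2, 0.3], [0.1, 0.4], [0.3, 0.1]]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_meta(held_out_scores, labels)
p_high = predict_meta(w, b, [0.85, 0.7])   # both base models score high
p_low = predict_meta(w, b, [0.15, 0.2])    # both base models score low
```

The key design point is that the meta-learner sees only base-model outputs produced on data those models were not trained on; otherwise it would learn to trust overfit scores.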
You Too?! Mixed-Initiative LDA Story Matching to Help Teens in Distress
Dinakar, Karthik (Massachusetts Institute of Technology) | Jones, Birago (Massachusetts Institute of Technology) | Lieberman, Henry (Massachusetts Institute of Technology) | Picard, Rosalind (Massachusetts Institute of Technology) | Rose, Carolyn (Carnegie Mellon University) | Thoman, Matthew (Northeastern University) | Reichart, Roi (Massachusetts Institute of Technology)
Adolescent cyber-bullying on social networks is a phenomenon that has received widespread attention. Recent work by sociologists has examined this phenomenon under the larger context of teenage drama and its manifestations on social networks. Tackling cyber-bullying involves two key components – automatic detection of possible cases, and interaction strategies that encourage reflection and emotional support. Key is showing distressed teenagers that they are not alone in their plight. Conventional topic spotting and document classification into labels like "dating" or "sports" are not enough to effectively match stories for this task. In this work, we examine a corpus of 5500 stories from distressed teenagers from a major youth social network. We combine Latent Dirichlet Allocation and human interpretation of its output using principles from sociolinguistics to extract high-level themes in the stories and use them to match new stories to similar ones. A user evaluation of the story matching shows that theme-based retrieval does a better job of finding relevant and effective stories for this application than conventional approaches.
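The matching step can be pictured with a minimal sketch, assuming each story has already been reduced to a vector of theme proportions (for example, by an LDA model whose topics were interpreted by people). The abstract does not say which similarity measure was used; cosine similarity, the story identifiers, the theme vectors, and the function name `match_stories` are assumptions made only for this illustration.

```python
import math

def cosine(p, q):
    """Cosine similarity between two theme-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def match_stories(query_themes, corpus_themes, k=2):
    """Return the ids of the k corpus stories most similar to the query."""
    scored = [(cosine(query_themes, themes), story_id)
              for story_id, themes in corpus_themes.items()]
    # Highest similarity first; ties broken by story id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [story_id for _, story_id in scored[:k]]

# Hypothetical theme proportions over three themes for three stored stories.
corpus_themes = {
    "story_a": [0.80, 0.15, 0.05],
    "story_b": [0.10, 0.70, 0.20],
    "story_c": [0.60, 0.30, 0.10],
}
new_story = [0.75, 0.20, 0.05]

top = match_stories(new_story, corpus_themes, k=2)
print(top)  # ['story_a', 'story_c']
```

Matching on theme proportions rather than a single label is what lets a new story be paired with stories that share a mix of themes, which a one-label classification such as "dating" or "sports" cannot express.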
Modeling the Detection of Textual Cyberbullying
Dinakar, Karthik (Massachusetts Institute of Technology) | Reichart, Roi (Hebrew University of Jerusalem) | Lieberman, Henry (Massachusetts Institute of Technology)
The scourge of cyberbullying has assumed alarming proportions with an ever-increasing number of adolescents admitting to having dealt with it either as a victim or as a bystander. Anonymity and the lack of meaningful supervision in the electronic medium are two factors that have exacerbated this social menace. Comments or posts involving sensitive topics that are personal to an individual are more likely to be internalized by a victim, often resulting in tragic outcomes. We decompose the overall detection problem into the detection of sensitive topics, which lends itself to text classification sub-problems. We experiment with a corpus of 4500 YouTube comments, applying a range of binary and multiclass classifiers. We find that binary classifiers for individual labels outperform multiclass classifiers. Our findings show that the detection of textual cyberbullying can be tackled by building individual topic-sensitive classifiers.