Teevan, Jaime
Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models
Lin, Ying-Chun, Neville, Jennifer, Stokes, Jack W., Yang, Longqi, Safavi, Tara, Wan, Mengting, Counts, Scott, Suri, Siddharth, Andersen, Reid, Xu, Xiaofeng, Gupta, Deepak, Jauhar, Sujay Kumar, Song, Xia, Buscher, Georg, Tiwary, Saurabh, Hecht, Brent, Teevan, Jaime
Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction through diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from users' natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), is not only more accurate but also more interpretable, as it scores user satisfaction via learned rubrics with a detailed breakdown.
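To illustrate what "scoring via rubrics with a detailed breakdown" can look like, here is a minimal Python sketch. It is not SPUR: the rubric items, their weights, and the keyword detectors are all hypothetical. In the abstract's method the rubric is learned from labeled examples through iterative prompting and each criterion is judged by an LLM; here each judgment is replaced by a simple keyword check so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RubricItem:
    """One hypothetical rubric criterion with a signed weight."""
    description: str
    weight: float  # > 0 signals satisfaction, < 0 signals dissatisfaction
    # Stand-in for an LLM judgment of whether the criterion applies.
    detector: Callable[[List[str]], bool]


def score_conversation(user_turns: List[str],
                       rubric: List[RubricItem]) -> Tuple[float, Dict[str, float]]:
    """Return an overall score plus a per-criterion breakdown."""
    breakdown: Dict[str, float] = {}
    for item in rubric:
        # A criterion contributes its weight only if the detector fires.
        breakdown[item.description] = item.weight if item.detector(user_turns) else 0.0
    return sum(breakdown.values()), breakdown


# Hypothetical, hand-written rubric (SPUR learns its rubric instead).
RUBRIC = [
    RubricItem("User expresses gratitude", 2.0,
               lambda turns: any("thank" in t.lower() for t in turns)),
    RubricItem("User reports an incorrect answer", -3.0,
               lambda turns: any("wrong" in t.lower() for t in turns)),
]

if __name__ == "__main__":
    turns = ["How do I reset my password?",
             "That's wrong, the link is broken.",
             "Thanks, the second link worked."]
    total, parts = score_conversation(turns, RUBRIC)
    print(total, parts)
```

The breakdown dictionary is what makes such a score interpretable: a reader can see which criteria fired and how much each contributed, rather than receiving a single opaque number.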
Learning Causal Effects on Hypergraphs
Ma, Jing, Wan, Mengting, Yang, Longqi, Li, Jundong, Hecht, Brent, Teevan, Jaime
Hypergraphs provide an effective abstraction for modeling multi-way group interactions among nodes, where each hyperedge can connect any number of nodes. Unlike most existing studies, which leverage statistical dependencies, we study hypergraphs from the perspective of causality. Specifically, in this paper, we focus on the problem of individual treatment effect (ITE) estimation on hypergraphs, aiming to estimate how much an intervention (e.g., wearing a face covering) would causally affect an outcome (e.g., COVID-19 infection) of each individual node. Existing work on ITE estimation either assumes that the outcome of one individual is not influenced by the treatment assignments of other individuals (i.e., no interference), or assumes that interference exists only between pairs of connected individuals in an ordinary graph. We argue that these assumptions can be unrealistic on real-world hypergraphs, where higher-order interference arising from group interactions can affect the ultimate ITE estimates. In this work, we investigate higher-order interference modeling and propose a new causal learning framework powered by hypergraph neural networks. Extensive experiments on real-world hypergraphs verify the superiority of our framework over existing baselines.
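The distinction between "no interference" and interference can be written with standard potential-outcomes notation. The sketch below is a generic formulation, not necessarily the paper's own: $t_i \in \{0,1\}$ is node $i$'s treatment, $Y_i(\cdot)$ its potential outcome, $\mathbf{x}_i$ its features, $\mathcal{H}$ the hypergraph, and $\mathcal{N}(i)$ (an assumed symbol) the set of nodes that share at least one hyperedge with $i$.

```latex
% No interference: node i's outcome depends only on its own treatment
\tau_i = \mathbb{E}\left[ Y_i(1) - Y_i(0) \mid \mathbf{x}_i \right]

% Hypergraph interference (illustrative): the potential outcome also depends on
% the treatments \mathbf{t}_{\mathcal{N}(i)} of nodes sharing a hyperedge with i
\tau_i = \mathbb{E}\left[ Y_i\big(1, \mathbf{t}_{\mathcal{N}(i)}\big) - Y_i\big(0, \mathbf{t}_{\mathcal{N}(i)}\big) \mid \mathbf{x}_i, \mathcal{H} \right]
```

When every hyperedge contains exactly two nodes, $\mathcal{N}(i)$ reduces to node $i$'s neighbors in an ordinary graph, which recovers the pairwise-interference setting the abstract contrasts against.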
A Crowd of Your Own: Crowdsourcing for On-Demand Personalization
Organisciak, Peter (University of Illinois at Urbana-Champaign) | Teevan, Jaime (Microsoft Research) | Dumais, Susan (Microsoft Research) | Miller, Robert C. (MIT CSAIL) | Kalai, Adam Tauman (Microsoft Research)
Personalization is a way for computers to support people’s diverse interests and needs by providing content tailored to the individual. While strides have been made in algorithmic approaches to personalization, most require access to a significant amount of data. However, even when data is limited, online crowds can be used to infer an individual’s personal preferences. Aided by the diversity of tastes among online crowds and their ability to understand others, we show that crowdsourcing is an effective on-demand tool for personalization. Unlike typical crowdsourcing approaches that seek a ground truth, we present and evaluate two crowdsourcing approaches designed to capture personal preferences. The first, taste-matching, identifies workers with similar taste to the requester and uses their taste to infer the requester’s taste. The second, taste-grokking, asks workers to explicitly predict the requester’s taste based on training examples. These techniques are evaluated on two subjective tasks: personalized image recommendation and tailored textual summaries. Taste-matching and taste-grokking both show improvement over the use of generic workers, and have different benefits and drawbacks depending on the complexity of the task and the variability of the taste space.
Personalized Human Computation
Organisciak, Peter (University of Illinois at Urbana-Champaign) | Teevan, Jaime (Microsoft Research) | Dumais, Susan (Microsoft Research) | Miller, Robert C. (MIT CSAIL) | Kalai, Adam Tauman (Microsoft Research)
Significant effort in machine learning and information retrieval has been devoted to identifying personalized content such as recommendations and search results. Personalized human computation has the potential to go beyond existing techniques like collaborative filtering to provide personalized results on demand, over personal data, and for complex tasks. This work-in-progress compares two approaches to personalized human computation. In both, users annotate a small set of training examples, which are then used by the crowd to annotate unseen items. In the first approach, which we call taste-matching, crowd members are asked to annotate the same set of training examples, and the ratings of similar users on other items are then used to infer personalized ratings. In the second approach, taste-grokking, the crowd is presented with the training examples and asked to use them to predict the ratings of the target user on other items.
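Taste-matching, as described above, resembles user-based nearest-neighbor collaborative filtering over crowd workers. The Python sketch below shows that idea under stated assumptions: the similarity measure (mean absolute rating difference on the shared training items), the neighborhood size `k`, and the plain averaging of neighbors' ratings are illustrative choices, not details taken from the paper, and `taste_match` is a hypothetical name.

```python
from statistics import mean
from typing import Dict

Ratings = Dict[str, float]


def taste_match(requester_train: Ratings,
                worker_ratings: Dict[str, Ratings],
                k: int = 2) -> Ratings:
    """Predict the requester's ratings on unseen items from the k most similar workers.

    Assumes every worker rated all of the requester's training items
    (the shared training set) plus some additional, unseen items.
    """
    train_items = list(requester_train)

    def distance(ratings: Ratings) -> float:
        # Mean absolute rating difference on the shared training items;
        # a smaller distance means more similar taste.
        return mean(abs(ratings[i] - requester_train[i]) for i in train_items)

    # Keep the k workers whose taste is closest to the requester's.
    nearest = sorted(worker_ratings, key=lambda w: distance(worker_ratings[w]))[:k]

    # Average those workers' ratings on every item outside the training set.
    unseen = {i for w in nearest for i in worker_ratings[w]} - set(train_items)
    return {
        item: mean(worker_ratings[w][item] for w in nearest if item in worker_ratings[w])
        for item in sorted(unseen)
    }


if __name__ == "__main__":
    requester = {"a": 5, "b": 1}
    workers = {
        "w1": {"a": 5, "b": 1, "c": 4},
        "w2": {"a": 4, "b": 2, "c": 2},
        "w3": {"a": 1, "b": 5, "c": 1},
    }
    print(taste_match(requester, workers, k=2))
```

In this toy example, workers `w1` and `w2` rate the training items most like the requester, so the requester's rating for item `c` is predicted as the mean of their ratings (4 and 2), i.e. 3, while the dissimilar worker `w3` is ignored. Taste-grokking has no direct algorithmic counterpart here, since it relies on workers reasoning explicitly about the requester's examples.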
SearchBuddies: Bringing Search Engines into the Conversation
Hecht, Brent (Northwestern University) | Teevan, Jaime (Microsoft Research) | Morris, Meredith Ringel (Microsoft Research) | Liebling, Dan (Microsoft Research)
Although people receive trusted, personalized recommendations and auxiliary social benefits when they ask questions of their friends, using a search engine is often a more effective way to find an answer. Attempts to integrate social and algorithmic search have thus far focused on bringing social content into algorithmic search results. However, more of the benefits of social search can be preserved by reversing this approach and bringing algorithmic content into natural question-based conversations. To do this successfully, it is necessary to adapt search engine interaction to a social context. In this paper, we present SearchBuddies, a system that responds to Facebook status message questions with algorithmic search results. Via a three-month deployment of the system to 122 social network users, we explore how people responded to search content in a highly social environment. Our experience deploying SearchBuddies shows that a socially embedded search engine can successfully provide users with unique and highly relevant information in a social context and can be integrated into conversations around an information need. The deployment also illuminates specific challenges of embedding a search engine in a social environment and provides guidance toward solutions.
Culture Matters: A Survey Study of Social Q&A Behavior
Yang, Jiang (University of Michigan) | Morris, Meredith Ringel (Microsoft Research) | Teevan, Jaime (Microsoft Research) | Adamic, Lada A. (University of Michigan) | Ackerman, Mark S. (University of Michigan)
Online social networking tools are used around the world by people to ask questions of their friends, because friends provide direct, reliable, contextualized, and interactive responses. However, although the tools used in different cultures for question asking are often very similar, the way they are used can be very different, reflecting unique inherent cultural characteristics. We present the results of a survey designed to elicit cultural differences in people’s social question asking behaviors across the United States, the United Kingdom, China, and India. The survey received responses from 933 people distributed across the four countries who held similar job roles and were employed by a single organization. Responses included information about the questions they ask via social networking tools, and their motivations for asking and answering questions online. The results reveal culture as a consistently significant factor in predicting people’s social question and answer behavior. The prominent cultural differences we observe might be traced to people’s inherent cultural characteristics (e.g., their cognitive patterns and social orientation), and should be comprehensively considered in designing social search systems.
A Comparison of Information Seeking Using Search Engines and Social Networks
Morris, Meredith Ringel (Microsoft Research) | Teevan, Jaime (Microsoft Research) | Panovich, Katrina (Massachusetts Institute of Technology)
The Web has become an important information repository; often it is the first source a person turns to with an information need. One common way to search the Web is with a search engine. However, it is not always easy for people to find what they are looking for with keyword search, and at times the desired information may not be readily available online. An alternative, facilitated by the rise of social media, is to pose a question to one’s online social network. In this paper, we explore the pros and cons of using a social networking tool to fill an information need, as compared with a search engine. We describe a study in which 12 participants searched the Web while simultaneously posing a question on the same topic to their social network, and we compare the results they found by each method.