Teaching Classification Boundaries to Humans
Basu, Sumit (Microsoft Research) | Christensen, Janara (University of Washington)
Given a classification task, what is the best way to teach the resulting boundary to a human? While machine learning techniques can provide excellent methods for finding the boundary, including the selection of examples in an online setting, they tell us little about how we would teach a human the same task. We propose to investigate the problem of example selection and presentation in the context of teaching humans, and explore a variety of mechanisms in the interests of finding what may work best. In particular, we begin with the baseline of random presentation and then examine combinations of several mechanisms: the indication of an example's relative difficulty, the use of the shaping heuristic from the cognitive science literature (moving from easier examples to harder ones), and a novel kernel-based "coverage model" of the subject's mastery of the task. From our experiments on 54 human subjects learning and performing a pair of synthetic classification tasks via our teaching system, we found that we can achieve the greatest gains with a combination of shaping and the coverage model.
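A greedy teaching policy combining shaping with a kernel coverage model might be sketched as follows. This is a minimal illustrative toy, not the authors' system: the RBF kernel, the additive scoring rule, and all function names are assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel similarity between two feature vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def coverage(x, shown, gamma=1.0):
    """Coverage of example x given the examples already shown:
    kernel similarity to the closest shown example (0 if none shown)."""
    if not shown:
        return 0.0
    return max(rbf(x, s, gamma) for s in shown)

def shaping_with_coverage(examples, difficulty, n_teach, gamma=1.0):
    """Select a teaching sequence: prefer easy examples first (shaping),
    steering toward regions of the space the learner has not yet covered."""
    shown, order = [], []
    remaining = list(range(len(examples)))
    for _ in range(n_teach):
        # low difficulty is preferred (shaping); high coverage is penalized
        best = min(remaining,
                   key=lambda i: difficulty[i] + coverage(examples[i], shown, gamma))
        order.append(best)
        shown.append(examples[best])
        remaining.remove(best)
    return order
```

With this scoring, an easy example in an unseen region wins over an equally easy example near one already shown, which is the intuition behind combining shaping with a coverage model.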
Learning from the Wisdom of Crowds by Minimax Entropy
Zhou, Dengyong, Basu, Sumit, Mao, Yi, Platt, John C.
An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem.
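As a toy illustration of jointly inferring worker expertise and ground truth from noisy crowd labels, here is an EM-style weighted-vote aggregator for binary labels. This is a deliberately simpler stand-in, not the minimax entropy method of the paper; the function name, the uniform label prior, and the 0.7 initial accuracy are assumptions.

```python
import numpy as np

def aggregate_labels(votes, n_iters=20):
    """votes: (n_workers, n_items) array of 0/1 labels, no missing entries.
    Alternately estimate soft item truths and per-worker accuracies."""
    n_workers, n_items = votes.shape
    acc = np.full(n_workers, 0.7)  # initial guess at worker accuracy
    for _ in range(n_iters):
        # E-step: probability each item's true label is 1, given accuracies
        log1 = np.log(np.where(votes == 1, acc[:, None], 1 - acc[:, None])).sum(0)
        log0 = np.log(np.where(votes == 0, acc[:, None], 1 - acc[:, None])).sum(0)
        p1 = 1.0 / (1.0 + np.exp(log0 - log1))
        # M-step: re-estimate each worker's accuracy against the soft truth
        agree = votes * p1[None, :] + (1 - votes) * (1 - p1)[None, :]
        acc = np.clip(agree.mean(axis=1), 0.01, 0.99)
    return (p1 > 0.5).astype(int), acc
```

Even this simple scheme downweights an adversarial worker, whose votes then count as evidence for the opposite label; the minimax entropy approach additionally models item confusability, which this sketch ignores.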
Invited Talks
Basu, Sumit (Microsoft Research) | Jurney, Chris (Double Fine Productions) | Sottilare, Bob (US Army Simulation and Training Technology Center) | Young, R. Michael (North Carolina State University)
Chris Jurney (Lead Programmer, Double Fine Productions). Chris Jurney is a rock and roll experimental game programmer at Double Fine Productions, with 11 years' experience in games and simulation. He has shipped 4 titles in the games industry: Company of Heroes, Frontline: Fuel of War, Dawn of War 2, and Brutal Legend. Jurney frequently speaks on the topic of game AI, having presented at the Game Developers Conference (GDC), GDC China, Columbia University, the University of Pennsylvania, and the New Jersey and Philadelphia chapters of the International Game Developers Association (IGDA).

Sumit Basu (Microsoft Research). For those who can play an instrument or have a respectable singing voice, music can be a wonderful means of creative expression, social engagement, and fun. For many others, though, it can be frustrating and inaccessible: even if an inspired youth has great musical ideas, she may not have the knowledge or ability to get her latest song out from her head and into her MP3 player. In this talk, Basu will show three vignettes of how he and his colleagues have used interactive machine learning to extend the creative reach of aspiring musicians: a
User-Specific Learning for Recognizing a Singer's Intended Pitch
Guillory, Andrew (University of Washington) | Basu, Sumit (Microsoft Research) | Morris, Dan (Microsoft Research)
We consider the problem of automatic vocal melody transcription: translating an audio recording of a sung melody into a musical score. While previous work has focused on finding the closest notes to the singer's tracked pitch, we instead seek to recover the melody the singer intended to sing. Often, the melody a singer intended to sing differs from what they actually sang; our hypothesis is that this occurs in a singer-specific way. For example, a given singer may often be flat in certain parts of her range, or another may have difficulty with certain intervals. We thus pursue methods for singer-specific training which use learning to combine different methods for pitch prediction. In our experiments with human subjects, we show that via a short training procedure we can learn a singer-specific pitch predictor and significantly improve transcription of intended pitch over other methods. For an average user, our method gives a 20 to 30 percent reduction in pitch classification errors with respect to a baseline method which is comparable to commercial voice transcription tools. For some users, we achieve even more dramatic reductions. Our best results come from a combination of singer-specific learning with non-singer-specific feature selection. We also discuss the implications of our work for training more general control signals. We make our experimental data available to allow others to replicate or extend our results.
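The observation that a singer may be systematically flat or sharp in parts of her range suggests a very simple singer-specific corrector: learn a per-register pitch bias from training pairs of tracked and intended pitch, apply it, then round to the nearest semitone. This is a minimal sketch under that assumption; the uniform binning scheme and function names are illustrative, and it is far simpler than the learned combination of pitch predictors the paper describes.

```python
import numpy as np

def fit_register_bias(tracked, intended, n_bins=4):
    """Learn a per-register pitch correction (in semitones) from training
    pairs of tracked pitch and intended MIDI note."""
    edges = np.linspace(tracked.min(), tracked.max(), n_bins + 1)
    bias = np.zeros(n_bins)
    for b in range(n_bins):
        mask = (tracked >= edges[b]) & (tracked <= edges[b + 1])
        if mask.any():
            bias[b] = np.mean(intended[mask] - tracked[mask])
    return edges, bias

def predict_pitch(tracked, edges, bias):
    """Correct tracked pitch by the learned register bias, then round to
    the nearest semitone (the baseline simply rounds with no correction)."""
    b = np.clip(np.searchsorted(edges, tracked, side="right") - 1, 0, len(bias) - 1)
    return np.rint(tracked + bias[b]).astype(int)
```

For a singer who is 0.6 semitones flat in her low register, the plain rounding baseline misclassifies every low note, while the bias-corrected predictor recovers the intended pitch.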
Modeling Conversational Dynamics as a Mixed-Memory Markov Process
Choudhury, Tanzeem, Basu, Sumit
In this work, we quantitatively investigate the ways in which a given person influences the joint turn-taking behavior in a conversation. After collecting an auditory database of social interactions among a group of twenty-three people via wearable sensors (66 hours of data each over two weeks), we apply speech and conversation detection methods to the auditory streams. These methods automatically locate the conversations, determine their participants, and mark which participant was speaking when. We then model the joint turn-taking behavior as a Mixed-Memory Markov Model [1] that combines the statistics of the individual subjects' self-transitions and the partners' cross-transitions. The mixture parameters in this model describe how much each person's individual behavior contributes to the joint turn-taking behavior of the pair.
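The mixture described above can be sketched for two binary (speaking / not speaking) streams as P(s_t | s_{t-1}, o_{t-1}) = alpha * A_self[s_{t-1}, s_t] + (1 - alpha) * A_cross[o_{t-1}, s_t]. Below is a minimal illustration that fits alpha by a grid search over the data log-likelihood; the grid search, the add-one smoothing, and the function names are assumptions, standing in for whatever estimator the paper actually uses.

```python
import numpy as np

def transition_matrix(prev, nxt, n_states=2):
    """Row-normalized count matrix P(next | prev) with add-one smoothing."""
    counts = np.ones((n_states, n_states))
    for p, n in zip(prev, nxt):
        counts[p, n] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def fit_mixture_weight(s, o, n_states=2):
    """Fit alpha in P(s_t | s_{t-1}, o_{t-1}) =
    alpha * A_self[s_{t-1}, s_t] + (1 - alpha) * A_cross[o_{t-1}, s_t]
    by maximizing the data log-likelihood over a grid of alpha values."""
    A_self = transition_matrix(s[:-1], s[1:], n_states)
    A_cross = transition_matrix(o[:-1], s[1:], n_states)
    best_alpha, best_ll = 0.0, -np.inf
    for alpha in np.linspace(0, 1, 101):
        probs = alpha * A_self[s[:-1], s[1:]] + (1 - alpha) * A_cross[o[:-1], s[1:]]
        ll = np.log(probs).sum()
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha
```

A small alpha indicates the person's turn-taking is driven mostly by the partner (cross-transitions), a large alpha that it is driven mostly by their own previous state, which is exactly the per-person influence quantity the abstract describes.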
There is a long history of work in the social sciences aimed at understanding the interactions between individuals and the influences they have on each other's behavior. However, existing studies of social network interactions have either been restricted to online communities, where unambiguous measurements about how people interact can be obtained, or have been forced to rely on questionnaires or diaries to get data on face-to-face interactions. Survey-based methods are error-prone and impractical to scale up. Studies show that self-reports correspond poorly to communication behavior as recorded by independent observers [3]. In contrast, we have used wearable sensors and recent advances in speech processing techniques to automatically gather information about conversations: when they occurred, who was involved, and who was speaking when.