Harvard University
Extending Workers' Attention Span Through Dummy Events
Elmalech, Avshalom (Harvard University) | Sarne, David (Bar Ilan University) | David, Esther (Ashkelon Academic College) | Hajaj, Chen (Vanderbilt University)
This paper studies a new paradigm for improving the attention span of workers in tasks that rely heavily on the worker's attention to the occurrence of rare events. Such tasks are highly common, ranging from crime monitoring to controlling autonomous complex machines, and many of them are ideal for crowdsourcing. The underlying idea in our approach is to dynamically augment the task with dummy (artificial) events at different times throughout the task, rewarding the worker upon identifying and reporting them. This serves as an alternative to the traditional approach of relying exclusively on rewarding the worker for successfully identifying the event of interest itself. We propose three methods for timing the dummy events throughout the task. Two of these methods are static and determine the timing of the dummy events at random or uniformly throughout the task. The third method is dynamic and uses the identification (or misidentification) of dummy events as a signal of the worker's attention to the task, adjusting the rate of dummy event generation accordingly. We use extensive experimentation to compare the methods with the traditional approach of inducing attention through rewarding the identification of the event of interest, and with one another. The analysis of the results indicates that with the use of dummy events a substantially more favorable tradeoff between the probability of detecting the event of interest and the expected expense can be achieved, and that among the three proposed methods the one that decides on dummy events on the fly is (by far) the best.
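The three timing methods described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the multiplicative update in `next_gap`, its parameter values, and the choice to generate dummy events more often after a miss are all assumptions, since the abstract does not specify them.

```python
import random


def uniform_times(duration, k):
    """Static method: k dummy events evenly spaced over (0, duration)."""
    step = duration / (k + 1)
    return [step * (i + 1) for i in range(k)]


def random_times(duration, k, rng):
    """Static method: k dummy events at independently drawn uniform times, sorted."""
    return sorted(rng.uniform(0.0, duration) for _ in range(k))


def next_gap(gap, detected, shrink=0.5, grow=1.5, min_gap=5.0, max_gap=120.0):
    """Dynamic method (hypothetical rule): a missed dummy event is read as
    low attention and shortens the wait until the next one; a detected
    dummy event lengthens it. The result is clamped to [min_gap, max_gap]."""
    factor = grow if detected else shrink
    return min(max_gap, max(min_gap, gap * factor))


if __name__ == "__main__":
    rng = random.Random(0)
    print(uniform_times(100.0, 4))
    print(random_times(100.0, 4, rng))
    print(next_gap(40.0, detected=False), next_gap(40.0, detected=True))
```

The clamping bounds keep the assumed dynamic rule from flooding the worker with dummy events or letting them disappear entirely; any real deployment would need to tune these against the expected expense.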
Predicting Crowd Work Quality under Monetary Interventions
Yin, Ming (Harvard University) | Chen, Yiling (Harvard University)
Work quality in crowdsourcing task sessions can change over time due to both internal factors, such as learning and boredom, and external factors like the provision of monetary interventions. Prior studies on crowd work quality have focused on characterizing the temporal behavior pattern as a result of the internal factors. In this paper, we propose to explicitly take the impact of external factors into consideration for modeling crowd work quality. We present a series of seven models from three categories (supervised learning models, autoregressive models and Markov models) and conduct an empirical comparison on how well these models can predict crowd work quality under monetary interventions on three datasets that are collected from Amazon Mechanical Turk. Our results show that all these models outperform the baseline models that do not consider the impact of monetary interventions. Our empirical comparison further identifies the random forests model as an excellent model to use in practice as it consistently provides accurate predictions with high confidence across different datasets, and it also demonstrates robustness against limited training data and limited access to the ground truth.
Character-Aware Neural Language Models
Kim, Yoon (Harvard University) | Jernite, Yacine (New York University) | Sontag, David (New York University) | Rush, Alexander M. (Harvard University)
We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state of the art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
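For readers unfamiliar with the highway network mentioned in this abstract, a single highway layer can be sketched in plain Python as below. It uses the standard formulation y = t * g(W_h x + b_h) + (1 - t) * x with a sigmoid transform gate t. The choice of ReLU for g, the function name, and the list-of-lists weight layout are assumptions made for illustration, not details taken from the abstract.

```python
import math


def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer over a vector x (list of floats).

    w_h, w_t are square weight matrices (lists of rows); b_h, b_t are biases.
    The transform gate t decides, per dimension, how much of the transformed
    signal to use versus how much of the input to carry through unchanged.
    """
    def affine(w, b):
        return [sum(wij * xj for wij, xj in zip(row, x)) + bi
                for row, bi in zip(w, b)]

    h = [max(0.0, v) for v in affine(w_h, b_h)]                 # g = ReLU (assumed)
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(w_t, b_t)]  # sigmoid gate
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With zero weights and zero gate bias, the gate is 0.5 in every dimension.
    print(highway([1.0, -2.0], zeros, [3.0, 0.0], zeros, [0.0, 0.0]))
```

A strongly negative gate bias drives t toward 0, so the layer starts out close to the identity map, which is the usual motivation for this design.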
Exploiting an Oracle That Reports AUC Scores in Machine Learning Contests
Whitehill, Jacob (Harvard University)
In machine learning contests such as the ImageNet Large Scale Visual Recognition Challenge and the KDD Cup, contestants can submit candidate solutions and receive from an oracle (typically the organizers of the competition) the accuracy of their guesses compared to the ground-truth labels. One of the most commonly used accuracy metrics for binary classification tasks is the Area Under the Receiver Operating Characteristics Curve (AUC). In this paper we provide proofs-of-concept of how knowledge of the AUC of a set of guesses can be used, in two different kinds of attacks, to improve the accuracy of those guesses. On the other hand, we also demonstrate the intractability of one kind of AUC exploit by proving that the number of possible binary labelings of n examples for which a candidate solution obtains an AUC score of c grows exponentially in n, for every c in (0,1).
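To make the counting question in this abstract concrete, the sketch below computes AUC as the fraction of correctly ordered (positive, negative) pairs and then enumerates, by brute force, every binary labeling of a small set of guesses that yields a given AUC. It is not one of the paper's attacks; the function names and the tie-handling convention (a tie counts as half a win) are assumptions, and the enumeration is only feasible for very small n.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Returns None when the labeling lacks either class (AUC undefined).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def labelings_with_auc(scores, target, tol=1e-9):
    """Brute force over all 2**n labelings; keep those whose AUC equals target."""
    matches = []
    for labels in product([0, 1], repeat=len(scores)):
        value = auc(scores, labels)
        if value is not None and abs(value - target) <= tol:
            matches.append(labels)
    return matches


if __name__ == "__main__":
    guesses = [1.0, 2.0, 3.0, 4.0]
    for labeling in labelings_with_auc(guesses, 0.5):
        print(labeling)
```

Each labeling returned is a ground truth that the oracle's reported score cannot rule out, which is the quantity the abstract says grows exponentially in n.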
Embedding Ethical Principles in Collective Decision Support Systems
Greene, Joshua (Harvard University) | Rossi, Francesca (University of Padova and IBM T. J. Watson) | Tasioulas, John (King's College London) | Venable, Kristen Brent (Tulane University and IHMC) | Williams, Brian (Massachusetts Institute of Technology)
The future will see autonomous machines acting in the same environment as humans, in areas as diverse as driving, assistive technology, and health care. Think of self-driving cars, companion robots, and medical diagnosis support systems. We also believe that humans and machines will often need to work together and agree on common decisions. Thus hybrid collective decision making systems will be in great need. In this scenario, both machines and collective decision making systems should follow some form of moral values and ethical principles (appropriate to where they will act but always aligned to humans'), as well as safety constraints. In fact, humans would more readily accept and trust machines that behave as ethically as other humans in the same environment. Also, these principles would make it easier for machines to determine their actions and explain their behavior in terms understandable by humans. Moreover, machines and humans will often need to make decisions together, either through consensus or by reaching a compromise. This would be facilitated by shared moral values and ethical principles.
Large-Scale Collaborative Innovation: Challenges, Visions and Approaches
Siangliulue, Pao (Harvard University) | Chan, Joel (Carnegie Mellon University) | Arnold, Kenneth C. (Harvard University) | Huber, Bernd (Harvard University) | Dow, Steven P. (Carnegie Mellon University) | Gajos, Krzysztof Z. (Harvard University)
Emerging online innovation platforms have enabled large groups of people to collaborate and generate ideas together in ways that were not possible before. However, these platforms also introduce new challenges in helping their members to generate diverse and high-quality ideas. In this paper, we enumerate collaboration challenges in crowd innovation: finding inspiration for contributors from a large number of ideas, motivating the crowd to contribute to improving group understanding of the problem and solution space, and coordinating collective effort to reduce redundancy and increase the quality and breadth of generated ideas. We discuss possible solutions to these challenges and present our recent work that addresses some of them using techniques from human computation and machine learning.
Implementing the Wisdom of Waze
Vasserman, Shoshana (Harvard University) | Feldman, Michal (Tel-Aviv University) | Hassidim, Avinatan (Bar Ilan University, Google)
We study a setting of non-atomic routing in a network of m parallel links with asymmetry of information. While a central entity (such as a GPS navigation system), hereafter a mediator, knows the cost functions associated with the links, they are unknown to the individual agents controlling the flow. The mediator gives incentive compatible recommendations to agents, trying to minimize the total travel time. Can the mediator, without coercing agents to follow its recommendations, do better than when agents minimize their travel time selfishly? We study the mediation ratio: the ratio between the mediated equilibrium obtained from an incentive compatible mediation protocol and the social optimum. We find that mediation protocols can reduce the efficiency loss compared to the full revelation alternative, and compared to the non-mediated Nash equilibrium. In particular, in the case of two links with affine cost functions, the mediation ratio is at most 8/7, and remains strictly smaller than the price of anarchy of 4/3 for any fixed m. Yet, it approaches the price of anarchy as m grows. For general (monotone) cost functions, the mediation ratio is at most m, a significant improvement over the unbounded price of anarchy.
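The 4/3 price of anarchy that this abstract uses as a benchmark for affine costs can be reproduced on the classic two-link Pigou network, sketched below. This example is not taken from the paper and does not implement any mediation protocol; the network (one link with constant cost 1, one link with cost equal to its flow), the unit total flow, and the grid search are all assumptions chosen for illustration.

```python
def total_cost(x):
    """Total travel time when a fraction x of one unit of flow uses the
    link with cost c(x) = x and the remaining 1 - x uses the link with
    constant cost 1."""
    return x * x + (1.0 - x) * 1.0


# Selfish (Nash) outcome: the variable-cost link never costs more than 1,
# so every agent takes it and x = 1.
nash_cost = total_cost(1.0)

# Social optimum: minimize total travel time over a fine grid of splits.
best_x = min((i / 1000 for i in range(1001)), key=total_cost)
opt_cost = total_cost(best_x)

price_of_anarchy = nash_cost / opt_cost

if __name__ == "__main__":
    print(best_x, opt_cost, price_of_anarchy)
```

The optimum splits the flow evenly, giving total cost 3/4 against the selfish cost of 1, hence the ratio 4/3. The mediation ratio studied in the abstract is the analogous ratio with the mediated equilibrium in place of the selfish one.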
Bonus or Not? Learn to Reward in Crowdsourcing
Yin, Ming (Harvard University) | Chen, Yiling (Harvard University)
Recent work has shown that the quality of work produced in a crowdsourcing working session can be influenced by the presence of performance-contingent financial incentives, such as bonuses for exceptional performance, in the session. We take an algorithmic approach to decide when to offer bonuses in a working session to improve the overall utility that a requester derives from the session. Specifically, we propose and train an input-output hidden Markov model to learn the impact of bonuses on work quality and then use this model to dynamically decide whether to offer a bonus on each task in a working session to maximize a requester's utility. Experiments on Amazon Mechanical Turk show that our approach leads to higher utility for the requester than fixed and random bonus schemes do. Simulations on synthesized data sets further demonstrate the robustness of our approach in improving requester utility across different worker populations and worker behaviors.
Twitter Geolocation and Regional Classification via Sparse Coding
Cha, Miriam (Harvard University) | Gwon, Youngjune (Harvard University) | Kung, H. T. (Harvard University)
We present a data-driven approach for Twitter geolocation and regional classification. Our method is based on sparse coding and dictionary learning, an unsupervised method popular in computer vision and pattern recognition. Through a series of optimization steps that integrate information from both feature and raw spaces, and enhancements such as PCA whitening, feature augmentation, and voting-based grid selection, we lower geolocation errors and improve classification accuracy relative to previously known results on the GEOTEXT dataset.
Graph-Sparse LDA: A Topic Model with Structured Sparsity
Doshi-Velez, Finale (Harvard University) | Wallace, Byron C. (University of Texas at Austin) | Adams, Ryan (Harvard University)
Topic modeling is a powerful tool for uncovering latent structure in many domains, including medicine, finance, and vision. The goals for the model vary depending on the application: sometimes the discovered topics are used for prediction or another downstream task. In other cases, the content of the topic may be of intrinsic scientific interest. Unfortunately, even when one uses modern sparse techniques, discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that uses knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.