Poetry of the Crowd: A Human Computation Algorithm to Convert Prose into Rhyming Verse

AAAI Conferences

Poetry composition is a very complex task that requires a poet to satisfy multiple constraints concurrently. We believe that the task can be augmented by combining the creative abilities of humans with computational algorithms that efficiently constrain and permute available choices. We present a hybrid method for generating poetry from prose that combines crowdsourcing with natural language processing (NLP) machinery. We test the ability of crowd workers to accomplish the technically challenging and creative task of composing poems.
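As an illustration of the kind of NLP machinery such a pipeline might use (the specific heuristics below are illustrative, not taken from the paper), one can pre-filter prose words into rhyme groups with matching syllable estimates before presenting the constrained choices to crowd workers:

```python
# Illustrative sketch (not the paper's actual pipeline): a naive NLP filter that
# proposes rhyming line-ending candidates from prose, which crowd workers could
# then weave into verse. The rhyme and syllable checks are crude heuristics.
import re

def syllable_estimate(word):
    """Rough syllable count: number of vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def rhyme_key(word):
    """Crude rhyme key: the last vowel group and everything after it."""
    w = word.lower()
    m = re.search(r"[aeiouy]+[^aeiouy]*$", w)
    return m.group(0) if m else w

def rhyming_candidates(prose, target_syllables=None):
    """Group prose words by rhyme key; keep keys with at least two words."""
    words = set(re.findall(r"[a-zA-Z']+", prose))
    if target_syllables is not None:
        words = {w for w in words if syllable_estimate(w) == target_syllables}
    groups = {}
    for w in words:
        groups.setdefault(rhyme_key(w), set()).add(w)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

if __name__ == "__main__":
    prose = "The old king sat by the spring, wondering what song the day would bring."
    print(rhyming_candidates(prose))
```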


Preface

AAAI Conferences

Welcome to the Second AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2014), held November 2-4, 2014, in Pittsburgh, Pennsylvania. This conference is an opportunity to build on the success of the First AAAI Conference on Human Computation and Crowdsourcing, and to promote the best scholarship in this vibrant, fast-emerging, multidisciplinary area. The conference also comes on the heels of four HCOMP workshops, including two workshops hosted at the annual AAAI conference. The HCOMP conference is designed to be a venue for exchanging ideas and developments on principles, experiments, and implementations of systems that rely on programmatic access to human intellect to perform some aspect of computation, or where human perception, knowledge, reasoning, or coordinated activity contributes to the operation of larger systems and applications. Fields relevant to the discipline of human computation and crowdsourcing include human-computer interaction (HCI), computer-supported cooperative work (CSCW), cognitive psychology, organizational behavior, economics, information retrieval, databases, computer systems and programming languages, and optimization.


Predicting Own Action: Self-Fulfilling Prophecy Induced by Proper Scoring Rules

AAAI Conferences

This paper studies a mechanism to incentivize agents who predict their own future actions and truthfully declare their predictions. In a crowdsourcing setting (e.g., participatory sensing), obtaining an accurate prediction of the actions of workers/agents is valuable for a requester who is collecting real-world information from the crowd. If an agent predicts an external event that she cannot control herself (e.g., tomorrow's weather), any proper scoring rule gives her an incentive to predict accurately. In our problem setting, an agent needs to predict her own action (e.g., what time tomorrow she will take a photo of a specific place), which she can control to maximize her utility. Also, her (gross) utility can vary based on an external event. We first prove that a mechanism can satisfy our goal if and only if it utilizes a strictly proper scoring rule, assuming that an agent can find an optimal declaration that maximizes her expected utility. This declaration is self-fulfilling; if she acts to maximize her utility, the probability distribution of her action matches her declaration, assuming her prediction about the external event is correct. Furthermore, we develop a heuristic algorithm that efficiently finds a semi-optimal declaration, and show that this declaration is still self-fulfilling. We also examine our heuristic algorithm's performance and describe how an agent acts when she faces an unexpected scenario.
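The scoring-rule ingredient can be illustrated with the quadratic (Brier) rule, one standard strictly proper scoring rule. The sketch below only shows that truthful declaration maximizes expected score; it is not the paper's full action-prediction mechanism:

```python
# Minimal illustration of strict properness (quadratic/Brier rule): the agent's
# expected score is maximized only by declaring her true distribution.
import numpy as np

def brier_score(declared, outcome_index):
    """Quadratic scoring rule: higher is better."""
    indicator = np.zeros(len(declared))
    indicator[outcome_index] = 1.0
    return -np.sum((declared - indicator) ** 2)

def expected_score(declared, true_dist):
    """Expected score of a declaration under the agent's true belief."""
    return sum(p * brier_score(declared, i) for i, p in enumerate(true_dist))

true_dist = np.array([0.7, 0.2, 0.1])            # true belief over three possible actions
truthful = expected_score(true_dist, true_dist)
shaded = expected_score(np.array([0.5, 0.3, 0.2]), true_dist)
print(truthful, shaded)                           # truthful declaration scores strictly higher
```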


Crowdsourced Data Analytics: A Case Study of a Predictive Modeling Competition

AAAI Conferences

Predictive modeling competitions provide a new data mining approach that leverages crowds of data scientists to examine a wide variety of predictive models and build the best-performing model. In this paper, we report the results of a study conducted on CrowdSolving, a platform for predictive modeling competitions in Japan. We hosted a competition on a link prediction task and observed that (i) the prediction performance of the winner significantly outperformed that of a state-of-the-art method, (ii) the aggregated model constructed from all submitted models further improved the final performance, and (iii) the aggregated model built only from early submissions nevertheless overtook the final performance of the winner.
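A minimal sketch of the aggregation idea (the actual CrowdSolving submissions and aggregation procedure are not reproduced here) is to rank-average the scores that different participants assign to candidate links:

```python
# Sketch of the aggregation idea only: combine participants' predicted link scores
# by averaging their per-pair ranks, a simple ensemble that often beats the best
# single submitted model. The submission scores below are synthetic stand-ins.
import numpy as np

def rank_average(score_matrix):
    """score_matrix: shape (n_models, n_candidate_links).
    Returns an aggregated score per candidate link (higher = more likely)."""
    ranks = np.argsort(np.argsort(score_matrix, axis=1), axis=1)  # per-model ranks
    return ranks.mean(axis=0)

submissions = np.array([
    [0.9, 0.1, 0.4, 0.7],   # model A's scores for 4 candidate links
    [0.8, 0.3, 0.2, 0.9],   # model B
    [0.6, 0.2, 0.5, 0.8],   # model C
])
print(rank_average(submissions))  # aggregated ranking of the candidate links
```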


Keynote Speakers

AAAI Conferences

Kristen Grauman is an associate professor in the Department of Computer Science at the University of Texas at Austin. Her research in computer vision and machine learning focuses on visual search and object recognition. Before joining the University of Texas at Austin in 2007, she received her Ph.D. in the Electrical Engineering and Computer Science department at the Massachusetts Institute of Technology, in the Computer Science and Artificial Intelligence Laboratory. She is an Alfred P. Sloan Research Fellow and Microsoft Research New Faculty Fellow, a recipient of NSF CAREER and ONR Young Investigator awards, the Regents' Outstanding Teaching Award from the University of Texas System in 2012, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence, and a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013. She and her collaborators were recognized with the CVPR Best Student Paper Award in 2008 for their work on hashing algorithms for large-scale image retrieval, and the Marr Best Paper Prize at ICCV in 2011 for their work on modeling relative visual attributes.


Optimal Worker Quality and Answer Estimates in Crowd-Powered Filtering and Rating

AAAI Conferences

We consider the problem of optimally filtering (or rating) a set of items based on predicates (or scoring) requiring human evaluation. Filtering and rating are ubiquitous problems across crowdsourcing applications. We consider the setting where we are given a set of items and a set of worker responses for each item: yes/no in the case of filtering and an integer value in the case of rating. We assume that items have a true inherent value that is unknown, and workers draw their responses from a common, but hidden, error distribution. Our goal is to simultaneously assign a ground truth to the item set and estimate the worker error distribution. Previous work in this area has focused on heuristics such as Expectation Maximization (EM), which guarantee only a local optimum, whereas we develop a general framework that finds a maximum likelihood solution. Our approach extends to a number of variations on the filtering and rating problems.
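To give the flavor of a global likelihood search rather than EM, the simplified sketch below handles binary filtering with a single shared false-positive/false-negative rate, scanning a coarse grid of error rates and choosing the most likely labels for each. This is an illustration of the general idea, not the paper's exact algorithm:

```python
# Simplified illustration of maximizing the likelihood globally instead of running EM:
# binary filtering, all workers sharing one false-positive rate (fp) and one
# false-negative rate (fn). We scan (fp, fn) on a coarse grid and, for each,
# choose the item labels that maximize the likelihood.
import itertools
import numpy as np

def log_lik_item(yes, no, label, fp, fn):
    """Log-likelihood of observing `yes` yes-votes and `no` no-votes for an item."""
    p_yes = 1 - fn if label == 1 else fp     # P(worker says yes | true label)
    return yes * np.log(p_yes + 1e-12) + no * np.log(1 - p_yes + 1e-12)

def best_labels(counts, grid=np.linspace(0.05, 0.45, 9)):
    """counts: list of (yes_votes, no_votes) per item.
    Returns (best log-likelihood, labels, (fp, fn))."""
    best = (-np.inf, None, None)
    for fp, fn in itertools.product(grid, grid):
        labels, total = [], 0.0
        for yes, no in counts:
            ll = {lab: log_lik_item(yes, no, lab, fp, fn) for lab in (0, 1)}
            lab = max(ll, key=ll.get)
            labels.append(lab)
            total += ll[lab]
        if total > best[0]:
            best = (total, labels, (fp, fn))
    return best

print(best_labels([(5, 1), (1, 4), (3, 3)]))
```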


Adapting Collaborative Filtering to Personalized Audio Production

AAAI Conferences

Recommending media objects to users typically requires users to rate existing media objects so that the system can learn their preferences. The number of ratings required to produce good suggestions can be reduced through collaborative filtering. Collaborative filtering is more difficult when prior users have not rated the same set of media objects as the current user or each other. In this work, we describe an approach to applying prior user data in a way that does not require users to rate the same media objects and that does not require imputation (estimation) of prior user ratings of objects they have not rated. This approach is applied to the problem of finding good equalizer settings for music audio and is shown to greatly reduce the number of ratings the current user must make to find a good equalization setting.
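One hedged sketch of this idea (not the paper's exact method): summarize each prior user's ratings as a preference function over the shared equalizer-parameter space, weight prior users by how well that function matches the current user's few ratings, and let the weighted mixture score candidate settings, with no imputation of unrated items:

```python
# Illustrative sketch only. EQ settings are vectors of band gains, so prior users'
# preference functions can be evaluated at the current user's rated settings even
# though the underlying songs and rated settings differ.
import numpy as np

def predict(prior_settings, prior_ratings, setting, bandwidth=1.0):
    """A prior user's preference function over the shared EQ-parameter space:
    kernel-weighted average of that user's own ratings (no imputation)."""
    d2 = np.sum((prior_settings - setting) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return np.sum(w * prior_ratings) / np.sum(w)

def recommend(prior_users, current_settings, current_ratings, candidates):
    """Weight prior users by agreement with the current user's few ratings,
    then score candidate settings by the weighted mixture."""
    weights = []
    for s, r in prior_users:
        err = np.mean([(predict(s, r, x) - y) ** 2
                       for x, y in zip(current_settings, current_ratings)])
        weights.append(np.exp(-err))
    def score(x):
        return sum(w * predict(s, r, x) for w, (s, r) in zip(weights, prior_users)) / sum(weights)
    return max(candidates, key=lambda c: score(np.asarray(c)))

# Toy data: 2-band equalizer settings (bass gain, treble gain) rated 1-5.
prior_users = [
    (np.array([[6.0, 0.0], [0.0, 0.0]]), np.array([4.5, 3.0])),   # prior user who likes bass
    (np.array([[0.0, 6.0], [0.0, 0.0]]), np.array([4.5, 3.0])),   # prior user who likes treble
]
current_settings = [np.array([5.0, 1.0])]   # one setting the current user has tried
current_ratings = [4.0]                      # ...and liked
print(recommend(prior_users, current_settings, current_ratings,
                [(6.0, 0.0), (0.0, 6.0)]))
```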


An ensemble-based system for automatic screening of diabetic retinopathy

arXiv.org Machine Learning

In this paper, an ensemble-based method for the screening of diabetic retinopathy (DR) is proposed. This approach is based on features extracted from the output of several retinal image processing algorithms, such as image-level (quality assessment, pre-screening, AM/FM), lesion-specific (microaneurysms, exudates) and anatomical (macula, optic disc) components. The actual decision about the presence of the disease is then made by an ensemble of machine learning classifiers. We have tested our approach on the publicly available Messidor database, where 90% sensitivity, 91% specificity, 90% accuracy, and 0.989 AUC are achieved in a disease/no-disease setting. These results are highly competitive in this field and suggest that retinal image processing is a valid approach for automatic DR screening.
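A sketch of the ensemble decision stage only, assuming the retinal image processing components have already produced a feature vector per image; the particular classifier mix below is illustrative, not necessarily the one used in the paper:

```python
# Decision-stage sketch: combine several classifiers over the extracted features
# by soft voting. The feature vectors and labels here are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                   # stand-in for per-image features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)    # stand-in DR / no-DR labels

ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    voting="soft",                               # average predicted probabilities
)
print(cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())
```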


A random forest system combination approach for error detection in digital dictionaries

arXiv.org Machine Learning

When digitizing a print bilingual dictionary, whether via optical character recognition or manual entry, it is inevitable that errors are introduced into the electronic version that is created. We investigate automating the process of detecting errors in an XML representation of a digitized print dictionary using a hybrid approach that combines rule-based, feature-based, and language model-based methods. We investigate combining methods and show that using random forests is a promising approach. We find that, in isolation, unsupervised methods rival the performance of supervised methods. Random forests typically require training data, so we investigate how to apply them to combine individual base methods, which are themselves unsupervised, without requiring large amounts of training data. Experiments reveal empirically that a relatively small amount of data is sufficient and can potentially be further reduced through specific selection criteria.
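A sketch of the system-combination step: each entry's rule-based, feature-based, and language-model scores become the features of a random forest trained on a small labeled sample. The base-method outputs below are synthetic stand-ins:

```python
# Combine the outputs of individual (largely unsupervised) error detectors with a
# random forest trained on a small labeled subset of dictionary entries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_entries = 1000
# Columns: rule-based flag, feature-based score, language-model score (stand-ins).
base_outputs = np.column_stack([
    rng.integers(0, 2, n_entries),
    rng.random(n_entries),
    rng.random(n_entries),
])
is_error = (0.6 * base_outputs[:, 0] + base_outputs[:, 1]
            + rng.normal(0, 0.3, n_entries) > 1.1).astype(int)

# "Relatively small amount of training data": hold out most entries for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    base_outputs, is_error, train_size=100, random_state=0)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print(f1_score(y_test, forest.predict(X_test)))
```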


Robust sketching for multiple square-root LASSO problems

arXiv.org Machine Learning

Many learning tasks, such as cross-validation, parameter search, or leave-one-out analysis, involve multiple instances of similar problems, each instance sharing a large part of learning data with the others. We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. Our approach allows a dramatic reduction in computational effort, in effect reducing the number of observations from $m$ (the number of observations to start with) to $k$ (the number of singular values retained in the low-rank model), while not sacrificing---sometimes even improving---the statistical performance. Theoretical analysis, as well as numerical experiments on both synthetic and real data, illustrate the efficiency of the method in large scale applications.
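A simplified sketch of the core computation (omitting the robustness term the paper adds to account for the low-rank approximation error): compress the $m \times n$ data matrix with a rank-$k$ truncated SVD, then solve a $k$-observation square-root LASSO, here via cvxpy:

```python
# Sketch under simplifying assumptions: the data matrix is replaced by its rank-k
# truncated SVD, reducing m observations to k before solving the square-root LASSO.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n, k, lam = 500, 50, 10, 0.1
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(m, n))
x_true = rng.normal(size=n) * (rng.random(n) < 0.2)      # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=m)

# Rank-k sketch: A ~= U_k diag(s_k) Vt_k, so the fit term only needs k rows.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_sketch = np.diag(s[:k]) @ Vt[:k]                        # k x n
b_sketch = U[:, :k].T @ b                                 # k-vector

x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm(A_sketch @ x - b_sketch, 2)
                                 + lam * cp.norm(x, 1)))
problem.solve()
print(np.count_nonzero(np.abs(x.value) > 1e-4), "nonzero coefficients")
```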