Instructional Material
STEP: A Scalable Testing and Evaluation Platform
Christoforaki, Maria (New York University) | Ipeirotis, Panagiotis (New York University)
The emergence of online crowdsourcing sites, online work platforms, and even Massive Open Online Courses (MOOCs) has created an increasing need for reliably evaluating the skills of the participating users in a scalable way. Many platforms already allow users to take online tests and verify their skills, but the existing approaches face many problems. First of all, cheating is very common in online testing without supervision, as the test questions often "leak" and become easily available online together with the answers. Second, technical skills, such as programming, require the tests to be frequently updated in order to reflect the current state of the art. Third, there is very limited evaluation of the tests themselves, and of how effectively they measure the skill that the users are tested for. In this paper, we present a Scalable Testing and Evaluation Platform (STEP) that allows continuous generation and evaluation of test questions. STEP leverages already available content on Question Answering sites such as StackOverflow and re-purposes these questions to generate tests. The system utilizes a crowdsourcing component for the editing of the questions, while it uses automated techniques for identifying promising QA threads that can be successfully re-purposed for testing. This continuous question generation decreases the impact of cheating and also creates questions that are closer to the real problems that the skill holder is expected to solve in real life. STEP also leverages Item Response Theory to evaluate the quality of the questions. We also use external signals about the quality of the workers. These identify the questions that have the strongest predictive ability in distinguishing workers that have the potential to succeed in the online job marketplaces. Existing approaches, in contrast, use only internal consistency metrics to evaluate the questions.
Finally, our system employs an automatic "leakage detector" that queries the Internet to identify leaked versions of our questions. We then mark these questions as "practice only," effectively removing them from the pool of questions used for evaluation. Our experimental evaluation shows that our system generates questions of comparable or higher quality compared to existing tests, with a cost of approximately 3-5 dollars per question, which is lower than the cost of licensing questions from existing test banks.
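As a rough illustration of how Item Response Theory can rate question quality, the sketch below uses the two-parameter logistic (2PL) model. The abstract does not say which IRT model STEP uses, so the 2PL choice, the function names, and the item parameters are all assumptions made for illustration only.

```python
import math

def p_correct(theta, a, b):
    """2PL model: probability that a test taker with ability `theta`
    answers an item with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability `theta`.
    Higher values mean the item separates test takers near `theta` better."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items: same difficulty, different discrimination.
ITEMS = {"sharp": (2.0, 0.0), "flat": (0.5, 0.0)}

for name, (a, b) in ITEMS.items():
    print(name, item_information(0.0, a, b))
```

Under these assumptions, the item with the larger discrimination parameter carries more information at its own difficulty level, which is one way a platform could rank questions by how well they distinguish test takers.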
Preface
Bigham, Jeffrey P. (Carnegie Mellon University) | Parkes, David C. (Harvard University)
Welcome to the Second AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2014) held November 2-4, 2014, in Pittsburgh, Pennsylvania. This conference is an opportunity to build on the success of the First AAAI Human Computation and Crowdsourcing conference, and to promote the best scholarship in this vibrant and fast emerging, multidisciplinary area. The conference also comes on the heels of four HCOMP workshops, including two workshops hosted at the annual AAAI conference. The HCOMP conference is designed to be a venue for exchanging ideas and developments on principles, experiments, and implementations of systems that rely on programmatic access to human intellect to perform some aspect of computation, or where human perception, knowledge, reasoning, or coordinated activity contributes to the operation of larger systems and applications. Topics relevant to the discipline of human computation and crowdsourcing include human-computer interaction (HCI), computer-supported collaborative work (CSCW), cognitive psychology, organizational behavior, economics, information retrieval, databases, computer systems and programming languages, and optimization.
A Comparison of learning algorithms on the Arcade Learning Environment
Defazio, Aaron, Graepel, Thore
Reinforcement learning agents have traditionally been evaluated on small toy problems. With advances in computing power and the advent of the Arcade Learning Environment, it is now possible to evaluate algorithms on diverse and difficult problems within a consistent framework. We discuss some challenges posed by the Arcade Learning Environment that do not manifest in simpler environments. We then provide a comparison of model-free, linear learning algorithms on this challenging problem set.
Learning-Assisted Automated Reasoning with Flyspeck
Kaliszyk, Cezary, Urban, Josef
The considerable mathematical knowledge encoded by the Flyspeck project is combined with external automated theorem provers (ATPs) and machine-learning premise selection methods trained on the proofs, producing an AI system capable of answering a wide range of mathematical queries automatically. The performance of this architecture is evaluated in a bootstrapping scenario emulating the development of Flyspeck from axioms to the last theorem, each time using only the previous theorems and proofs. It is shown that 39% of the 14185 theorems could be proved in a push-button mode (without any high-level advice and user interaction) in 30 seconds of real time on a fourteen-CPU workstation. The necessary work involves: (i) an implementation of sound translations of the HOL Light logic to ATP formalisms: untyped first-order, polymorphic typed first-order, and typed higher-order, (ii) export of the dependency information from HOL Light and ATP proofs for the machine learners, and (iii) choice of suitable representations and methods for learning from previous proofs, and their integration as advisors with HOL Light. This work is described and discussed here, and an initial analysis of the body of proofs that were found fully automatically is provided.
AAAI News
... Intelligence (AAAI-15) and the Twenty-Seventh Conference on Innovative Applications of Artificial Intelligence (IAAI-15) will be held January 25-29 at the Hyatt Regency Austin in Austin, Texas, USA. AAAI is working closely with the local AI community to create opportunities for attendees to experience AI in Texas! Attendees can also enjoy nearly 200 music venues that feature everything from rock and blues to country and jazz every night of the week. Austin cuisine has expanded from barbecue and Tex-Mex to award-winning and inventive international cuisine, and blossomed beyond brick-and-mortar restaurants to a vibrant, citywide food truck movement.
Participants in the AAAI-15 Robotics Exhibition and the AAAI-15 Video Competition are encouraged to contribute to the Demonstration Program with their systems, ...
October 8 (Papers Due): The Senior Member Track provides an opportunity for established researchers in the AI community to give a broad talk on a well-developed body of research, an important new research area, or a promising new topic. This year, new "Blue Sky Ideas" track is seeking presentations aimed at presenting ideas and visions that can stimulate the research community to pursue new directions, such as new problems, ...
Leveraging AI Teaching in the Cloud for AI Teaching on Campus
Fisher, Douglas H. (Vanderbilt University)
The Educational Advances in Artificial Intelligence column discusses and shares innovative educational approaches that teach or leverage AI and its many subfields at all levels of education (K-12, undergraduate, and graduate levels). I credit these positive changes to the active in-class learning and a new enthusiasm for teaching, as well as the first-rate lectures by Stanford professors Jennifer Widom and Andrew Ng. ... showed that students liked this SPOC format, although there were suggestions for better in-class and MOOC-content coordination. Had I tweaked my course and continued along this path, I might have achieved phenomenal success, but sadly I left the SPOC format behind. I was pleased when students, enrolled in Introduction to Artificial Intelligence Class MOOC CS188x at the University of California, Berkeley, came to my channel for remediation, taking word back to the MOOC's discussion forum. I required students in my graduate ...
Statistical Estimation: From Denoising to Sparse Regression and Hidden Cliques
Tramel, Eric W., Kumar, Santhosh, Giurgiu, Andrei, Montanari, Andrea
These notes review six lectures given by Prof. Andrea Montanari on the topic of statistical estimation for linear models. The first two lectures cover the principles of signal recovery from linear measurements in terms of minimax risk. Subsequent lectures demonstrate the application of these principles to several practical problems in science and engineering. Specifically, these topics include denoising of error-laden signals, recovery of compressively sensed signals, reconstruction of low-rank matrices, and the discovery of hidden cliques within large networks. The lectures were given at the autumn school "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", which took place in Les Houches, France, from Monday, September 30th, 2013, to Friday, October 11th, 2013.
$OntoMath^{PRO}$ Ontology: A Linked Data Hub for Mathematics
Nevzorova, Olga, Zhiltsov, Nikita, Kirillovich, Alexander, Lipachev, Evgeny
In this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE.
Acquiring Commonsense Knowledge for Sentiment Analysis through Human Computation
Boia, Marina (École Polytechnique Fédérale de Lausanne) | Musat, Claudiu Cristian (École Polytechnique Fédérale de Lausanne) | Faltings, Boi (École Polytechnique Fédérale de Lausanne)
Many Artificial Intelligence tasks need large amounts of commonsense knowledge. Because obtaining this knowledge through machine learning would require a huge amount of data, a better alternative is to elicit it from people through human computation. We consider the sentiment classification task, where knowledge about the contexts that impact word polarities is crucial, but hard to acquire from data. We describe a novel task design that allows us to crowdsource this knowledge through Amazon Mechanical Turk with high quality. We show that the commonsense knowledge acquired in this way dramatically improves the performance of established sentiment classification methods.
Tree-Based On-Line Reinforcement Learning
Barreto, Andre M. S. (Brazilian National Laboratory for Scientific Computing (LNCC))
Fitted Q-iteration (FQI) stands out among reinforcement learning algorithms for its flexibility and ease of use. FQI can be combined with any regression method, and this choice determines the algorithm's statistical and computational properties. The combination of FQI with an ensemble of regression trees gives rise to an algorithm, FQIT, that is computationally efficient, scalable to high dimensional spaces, and robust to noise. Despite its nice properties and good performance in practice, FQIT also has some limitations: the fact that an ensemble of trees must be constructed (or updated) at each iteration confines the algorithm to the batch scenario. This paper aims to address this specific issue. Based on a strategy recently proposed in the literature, called the stochastic-factorization trick, we propose a modification of FQIT that makes it fully incremental, and thus suitable for on-line learning. We call the resulting method tree-based stochastic factorization (TBSF). We derive upper bounds for the difference between the value functions computed by FQIT and TBSF, and also show in which circumstances the approximations coincide. A series of computational experiments is presented to illustrate the properties of TBSF and to show its usefulness in practice, including a medical problem involving the treatment of patients infected with HIV.
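To make the batch limitation concrete, here is a minimal sketch of plain fitted Q-iteration. It is not the paper's FQIT or TBSF: a per-(state, action) averaging table stands in for the ensemble of regression trees, and the three-state chain, its transitions, and all names are hypothetical. The point it shows is that the regressor is refit from the whole batch on every pass.

```python
from collections import defaultdict

def fit_table(inputs, targets):
    """Stand-in regressor (assumption): average the targets seen for each
    (state, action) pair. FQIT would fit an ensemble of trees here."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(inputs, targets):
        sums[key] += y
        counts[key] += 1
    table = {k: sums[k] / counts[k] for k in sums}
    return lambda s, a: table.get((s, a), 0.0)

def fitted_q_iteration(batch, actions, gamma=0.9, n_iter=50):
    """batch: list of (state, action, reward, next_state, done) transitions."""
    q = lambda s, a: 0.0  # Q_0 is identically zero
    inputs = [(s, a) for s, a, _, _, _ in batch]
    for _ in range(n_iter):
        # Regression targets: r + gamma * max_a' Q_k(s', a'), or r if terminal.
        targets = [
            r if done else r + gamma * max(q(s2, a2) for a2 in actions)
            for _, _, r, s2, done in batch
        ]
        # The model is rebuilt from the full batch on every iteration.
        q = fit_table(inputs, targets)
    return q

# Hypothetical chain: moving right from state 1 reaches the goal (state 2).
ACTIONS = ("left", "right")
BATCH = [
    (0, "right", 0.0, 1, False),
    (0, "left", 0.0, 0, False),
    (1, "right", 1.0, 2, True),
    (1, "left", 0.0, 0, False),
]
q = fitted_q_iteration(BATCH, ACTIONS)
print(q(0, "right"), q(1, "right"))
```

Because every iteration refits the regressor on all stored transitions, new data cannot be folded in one sample at a time, which is the restriction the abstract says TBSF is designed to remove.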