Crowdclass: Designing Classification-Based Citizen Science Learning Modules

AAAI Conferences

In this paper, we introduce Crowdclass, a novel framework that integrates the learning of advanced scientific concepts with the crowdsourcing microtask of image classification. In Crowdclass, we design questions that serve as both a learning experience and a scientific classification. This differs from conventional citizen science platforms, which decompose high-level questions into a series of simple microtasks that require no scientific background knowledge to complete. We facilitate learning within the microtask by providing content appropriate to the participant's level of knowledge through scaffolded learning. We conduct a between-group study of 93 participants on Amazon Mechanical Turk comparing Crowdclass to the popular citizen science project Galaxy Zoo. We find that the scaffolded presentation of content enables learning of more challenging concepts. By understanding the relationship between user motivation, learning, and performance, we draw general design principles for learning-as-an-incentive interventions applicable to other crowdsourcing applications.

Derivative-Free Optimization via Classification

AAAI Conferences

Many randomized heuristic derivative-free optimization methods share a framework that iteratively learns a model of promising search areas and samples solutions from that model. This paper studies a particular setting of this framework, where the model is implemented by a classification model discriminating good solutions from bad ones. This setting allows a general theoretical characterization, in which critical factors for the optimization are identified. We also prove that optimization problems with local Lipschitz continuity can be solved in polynomial time by proper configurations of this framework. Following the critical factors, we propose the randomized coordinate shrinking classification algorithm to learn the model, forming the RACOS algorithm for optimization in continuous and discrete domains. Experiments on benchmark test functions as well as on machine learning tasks, including spectral clustering and classification with the Ramp loss, demonstrate the effectiveness of RACOS.
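The abstract's core loop can be illustrated with a toy sketch. This is not the paper's actual RACOS implementation: the "classifier" here is simplified to the axis-aligned bounding box of the best-scoring samples (a crude stand-in for the randomized coordinate-shrinking classifier), and all parameter names and values are hypothetical.

```python
import random

def racos_sketch(f, dim, bounds, budget=2000, pop=20, pos_frac=0.2, explore=0.1):
    """Minimal sketch of classification-based derivative-free minimization.

    Each generation labels the best fraction of the population 'positive',
    learns a region containing the positives, and samples the next
    population mostly from that region, with occasional uniform exploration.
    """
    lo, hi = bounds
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    best_x, best_y = None, float("inf")
    for _ in range(budget // pop):
        scored = sorted(X, key=f)
        n_pos = max(1, int(pos_frac * pop))
        positives = scored[:n_pos]
        y0 = f(scored[0])
        if y0 < best_y:
            best_x, best_y = scored[0][:], y0
        # "Classifier": axis-aligned bounding box of the positive samples.
        box_lo = [min(p[d] for p in positives) for d in range(dim)]
        box_hi = [max(p[d] for p in positives) for d in range(dim)]
        # Next population: keep the positives, sample the rest from the
        # learned region (or uniformly, with probability `explore`).
        X = positives + [
            [random.uniform(lo, hi) if random.random() < explore
             else random.uniform(box_lo[d], box_hi[d])
             for d in range(dim)]
            for _ in range(pop - n_pos)
        ]
    return best_x, best_y

# Minimize the sphere function on [-5, 5]^2.
x, y = racos_sketch(lambda v: sum(t * t for t in v), dim=2, bounds=(-5, 5))
```

The box-shrinking step is what makes this "optimization via classification": the learned region plays the role of the positive class, and sampling from it replaces gradient information.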

Introducing core concepts of recommendation systems


Discover how to use Python--and some essential machine learning concepts--to build programs that can make recommendations. In this course, Lillian explains the concepts behind how recommendation systems work by taking you through a series of examples and exercises. Once you're familiar with the underlying concepts, she shows how to apply statistical and machine learning methods to construct your own recommenders. She demonstrates how to build a popularity-based recommender using the Pandas library, how to recommend similar items based on correlation, and how to deploy various machine learning algorithms to make recommendations. At the end of the course, she shows how to evaluate which recommender performed best.
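The popularity-based and correlation-based recommenders mentioned above can be sketched in a few lines of Pandas. This is a generic illustration with made-up toy ratings, not the course's own dataset or code.

```python
import pandas as pd

# Hypothetical toy ratings data; a real dataset would be much larger.
ratings = pd.DataFrame({
    "user":   ["a", "a", "b", "b", "b", "c", "c", "c", "d", "d"],
    "item":   ["x", "y", "x", "y", "z", "x", "y", "z", "x", "z"],
    "rating": [  5,   3,   4,   2,   1,   2,   1,   4,   3,   5],
})

# Popularity-based recommender: rank items by how often they are rated.
popularity = (
    ratings.groupby("item")["rating"]
    .agg(count="count", mean_rating="mean")
    .sort_values("count", ascending=False)
)

# Correlation-based recommender: find items whose per-user rating
# patterns correlate with a target item's ratings.
matrix = ratings.pivot_table(index="user", columns="item", values="rating")
similar_to_x = matrix.corrwith(matrix["x"]).drop("x").sort_values(ascending=False)
```

`popularity` ranks items globally, while `similar_to_x` gives item-to-item similarities that can drive "users who liked x also liked ..." suggestions.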

Classification-Based Machine Learning for Finance


Finally, a comprehensive hands-on machine learning course with a specific focus on classification-based models for the investment community and passionate investors. In the past few years, there has been massive adoption and growth in the use of data science, artificial intelligence, and machine learning to find alpha. However, information on applying machine learning to investing is scarce. This course has been designed to address that. It is meant to spark your creative juices and get you started in this space.

A Bayesian Approach for Accurate Classification-Based Aggregates

Machine Learning

In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for highly accurate classification algorithms, in particular when dealing with class-imbalanced data. To correct this bias, the algorithm's classification error rates have to be estimated. In this estimation, two issues arise when applying existing bias correction methods. First, inaccuracies in estimating classification error rates have to be taken into account. Second, impermissible estimates, such as a negative estimate for a positive value, have to be dismissed. We show that both issues are relevant in applications where the true labels are known only for a small set of data points. We propose a novel bias correction method using Bayesian inference. The novelty of our method is that it imposes constraints on the model parameters. We show that our method solves the problem of biased classification-based aggregates as well as the two issues above, in the general setting of multi-class classification. In the empirical evaluation, using a binary classifier on a real-world dataset of company tax returns, we show that our method outperforms existing methods in terms of mean squared error.
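The bias the abstract describes can be seen in the simplest existing correction it builds on: adjusting predicted class counts by the estimated misclassification rates. The sketch below shows that frequentist correction, not the paper's Bayesian method; the confusion probabilities and counts are hypothetical. Note that this simple approach can return impermissible (e.g. negative) counts, which is exactly the issue the paper's constrained Bayesian inference addresses.

```python
import numpy as np

# Confusion probabilities estimated from a small labeled set:
# row i, column j = P(predicted class j | true class i). Hypothetical.
P = np.array([
    [0.9, 0.1],   # true class 0
    [0.3, 0.7],   # true class 1
])

# Observed counts of *predicted* classes on the full, unlabeled data.
pred_counts = np.array([660.0, 340.0])

# Naive "classify and count" treats predicted counts as true counts,
# which is biased. The misclassification-matrix correction instead
# solves the linear system  P.T @ true_counts = pred_counts.
true_counts = np.linalg.solve(P.T, pred_counts)
```

Here the naive counts (660, 340) correct to (600, 400); with noisier error-rate estimates, the same solve can yield negative "counts", motivating constraints on the model parameters.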