Goto

Collaborating Authors

 Education


Learning User Preferences to Incentivize Exploration in the Sharing Economy

AAAI Conferences

We study platforms in the sharing economy and discuss the need for incentivizing users to explore options that otherwise would not be chosen. For instance, rental platforms such as Airbnb typically rely on customer reviews to provide users with relevant information about different options. Yet, often a large fraction of options does not have any reviews available. Such options are frequently neglected as viable choices, and in turn are unlikely to be evaluated, creating a vicious cycle. Platforms can engage users to deviate from their preferred choice by offering monetary incentives for choosing a different option instead. To efficiently learn the optimal incentives to offer, we consider structural information in user preferences and introduce a novel algorithm---Coordinated Online Learning (CoOL)---for learning with structural information modeled as convex constraints. We provide formal guarantees on the performance of our algorithm and test the viability of our approach in a user study with data of apartments on Airbnb. Our findings suggest that our approach is well-suited to learn appropriate incentives and increase exploration on the investigated platform.


Behavior Is Everything: Towards Representing Concepts with Sensorimotor Contingencies

AAAI Conferences

AI has seen remarkable progress in recent years, due to a switch from hand-designed shallow representations, to learned deep representations. While these methods excel with plentiful training data, they are still far from the human ability to learn concepts from just a few examples by reusing previously learned conceptual knowledge in new contexts. We argue that this gap might come from a fundamental misalignment between human and typical AI representations: while the former are grounded in rich sensorimotor experience, the latter are typically passive and limited to a few modalities such as vision and text. We take a step towards closing this gap by proposing an interactive, behavior-based model that represents concepts using sensorimotor contingencies grounded in an agent's experience. On a novel conceptual learning and benchmark suite, we demonstrate that conceptually meaningful behaviors can be learned, given supervision via training curricula.


Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

AAAI Conferences

While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot oftraining data. One way to increase the speed at which agent sare able to learn to perform tasks is by leveraging the input of human trainers. Although such input can take many forms, real-time, scalar-valued feedback is especially useful in situations where it proves difficult or impossible for humans to provide expert demonstrations. Previous approaches have shown the usefulness of human input provided in this fashion (e.g., the TAMER framework), but they have thus far not considered high-dimensional state spaces or employed the use of deep learning. In this paper, we do both: we propose DeepTAMER, an extension of the TAMER framework that leverages the representational power of deep neural networks inorder to learn complex tasks in just a short amount of time with a human trainer. We demonstrate Deep TAMERโ€™s success by using it and just 15 minutes of human-provided feedback to train an agent that performs better than humans on the Atari game of Bowling - a task that has proven difficult for even state-of-the-art reinforcement learning methods.


Optimizing Interventions via Offline Policy Evaluation: Studies in Citizen Science

AAAI Conferences

Volunteers who help with online crowdsourcing such as citizen science tasks typically make only a few contributions before exiting. We propose a computational approach for increasing users' engagement in such settings that is based on optimizing policies for displaying motivational messages to users. The approach, which we refer to as Trajectory Corrected Intervention (TCI), reasons about the tradeoff between the long-term influence of engagement messages on participants' contributions and the potential risk of disrupting their current work. We combine model-based reinforcement learning with off-line policy evaluation to generate intervention policies, without relying on a fixed representation of the domain. TCI works iteratively to learn the best representation from a set of random intervention trials and to generate candidate intervention policies. It is able to refine selected policies off-line by exploiting the fact that users can only be interrupted once per session.We implemented TCI in the wild with Galaxy Zoo, one of the largest citizen science platforms on the web. We found that TCI was able to outperform the state-of-the-art intervention policy for this domain, and significantly increased the contributions of thousands of users. This work demonstrates the benefit of combining traditional AI planning with off-line policy methods to generate intelligent intervention strategies.


Emergence of Grounded Compositional Language in Multi-Agent Populations

AAAI Conferences

By capturing statistical patterns in large corpora, machine learning has enabled significant advances in natural language processing, including in machine translation, question answering, and sentiment analysis. However, for agents to intelligently interact with humans, simply capturing the statistical patterns is insufficient. In this paper we investigate if, and how, grounded compositional language can emerge as a means to achieve goals in multi-agent populations. Towards this end, we propose a multi-agent learning environment and learning methods that bring about emergence of a basic compositional language. This language is represented as streams of abstract discrete symbols uttered by agents over time, but nonetheless has a coherent structure that possesses a defined vocabulary and syntax. We also observe emergence of non-verbal communication such as pointing and guiding when language communication is unavailable.


Interactively Learning a Blend of Goal-Based and Procedural Tasks

AAAI Conferences

Agents that can learn new tasks through interactive instruction can utilize goal information to search for and learn flexible policies. This approach can be resilient to variations in initial conditions or issues that arise during execution. However, if a task is not easily formulated as achieving a goal or if the agent lacks sufficient domain knowledge for planning, other methods are required. We present a hybrid approach to interactive task learning that can learn both goal-oriented and procedural tasks, and mixtures of the two, from human natural language instruction. We describe this approach, go through two examples of learning tasks, and outline the space of tasks that the system can learn. We show that our approach can learn a variety of goal-oriented and procedural tasks from a single example and is robust to different amounts of domain knowledge.


An Interactive Multi-Label Consensus Labeling Model for Multiple Labeler Judgments

AAAI Conferences

Multi-label classification is crucial to several practical applications including document categorization, video tagging, targeted advertising etc. Training a multi-label classifier requires a large amount of labeled data which is often unavailable or scarce. Labeled data is then acquired by consulting multiple labelers---both human and machine. Inspired by ensemble methods, our premise is that labels inferred with high consensus among labelers, might be closer to the ground truth. We propose strategies based on interaction and active learning to obtain higher quality labels that potentially lead to greater consensus. We propose a novel formulation that aims to collectively optimize the cost of labeling, labeler reliability, label-label correlation and inter-labeler consensus. Evaluation on data labeled by multiple labelers (both human and machine) shows that our consensus output is closer to the ground truth when compared to the "majority" baseline. We present illustrative cases where it even improves over the existing ground truth. We also present active learning strategies to leverage our consensus model in interactive learning settings. Experiments on several real-world datasets (publicly available) demonstrate the efficacy of our approach in achieving promising classification results with fewer labeled data.


Coalition Manipulation of Gale-Shapley Algorithm

AAAI Conferences

It is well-known that the Gale-Shapley algorithm is not truthful for all agents. Previous studies in this category concentrate on manipulations using incomplete preference lists by a single woman and by the set of all women. Little is known about manipulations by a subset of women. In this paper, we consider manipulations by any subset of women with arbitrary preferences. We show that a strong Nash equilibrium of the induced manipulation game always exists among the manipulators and the equilibrium outcome is unique and Pareto-dominant. In addition, the set of matchings achievable by manipulations has a lattice structure. We also examine the super-strong Nash equilibrium in the end.


Optimal Spot-Checking for Improving Evaluation Accuracy of Peer Grading Systems

AAAI Conferences

Peer grading, allowing students/peers to evaluate others' assignments, offers a promising solution for scaling evaluation and learning to large-scale educational systems. A key challenge in peer grading is motivating peers to grade diligently. While existing spot-checking (SC) mechanisms can prevent peer collusion where peers coordinate to report the uninformative grade, they unrealistically assume that peers have the same grading reliability and cost. This paper studies the general Optimal Spot-Checking (OptSC) problem of determining the probability each assignment needs to be checked to maximize assignments' evaluation accuracy aggregated from peers, and takes into consideration 1) peers' heterogeneous characteristics, and 2) peers' strategic grading behaviors to maximize their own utility. We prove that the bilevel OptSC is NP-hard to solve. By exploiting peers' grading behaviors, we first formulate a single level relaxation to approximate OptSC. By further exploiting structural properties of the relaxed problem, we propose an efficient algorithm to that relaxation, which also gives a good approximation of the original OptSC. Extensive experiments on both synthetic and real datasets show significant advantages of the proposed algorithm over existing approaches.


Variational BOLT: Approximate Learning in Factorial Hidden Markov Models With Application to Energy Disaggregation

AAAI Conferences

The learning problem for Factorial Hidden Markov Models with discrete and multi-variate latent variables remains a challenge. Inference of the latent variables required for the E-step of Expectation Minimization algorithms is usually computationally intractable. In this paper we propose a variational learning algorithm mimicking the Baum-Welch algorithm. By approximating the filtering distribution with a variational distribution parameterized by a recurrent neural network, the computational complexity of the learning problem as a function of the number of hidden states can be reduced to quasilinear instead of quadratic time as required by traditional algorithms such as Baum-Welch whilst making minimal independence assumptions. We evaluate the performance of the resulting algorithm, which we call Variational BOLT, in the context of unsupervised end-to-end energy disaggregation. We conduct experiments on the publicly available REDD dataset and show competitive results when compared with a supervised inference approach and state-of-the-art results in an unsupervised setting.