 Markov Models


Discovering Subgoals in Complex Domains

AAAI Conferences

We present ongoing research to develop novel option discovery methods for complex domains that are represented as Object-Oriented Markov Decision Processes (OO-MDPs) (Diuk, Cohen, and Littman, 2008). We describe Portable Multi-policy Option Discovery for Automated Learning (P-MODAL), an initial framework that extends Pickett and Barto’s (2002) PolicyBlocks approach to OO-MDPs. We also discuss future work that will use additional representations and techniques to handle scalability and learning challenges.


Affordances as Transferable Knowledge for Planning Agents

AAAI Conferences

Robotic agents often map perceptual input to simplified representations that do not reflect the complexity and richness of the world. This simplification is due in large part to the limitations of planning algorithms, which fail in large stochastic state spaces on account of the well-known "curse of dimensionality." Existing approaches to this problem fail to prevent autonomous agents from considering many actions that would be obviously irrelevant to a human solving the same problem. We formalize the notion of affordances as knowledge added to a Markov Decision Process (MDP) that prunes actions in a state- and reward-general way. This pruning significantly reduces the number of state-action pairs the agent needs to evaluate in order to act near-optimally. We demonstrate our approach in the Minecraft domain as a model for robotic tasks, showing a significant increase in speed and a reduction in state-space exploration during planning. Further, we provide a learning framework that enables an agent to learn affordances through experience, opening the door for agents to learn to adapt and plan through new situations. We provide preliminary results indicating that the learning process effectively produces affordances that help solve an MDP faster, suggesting that affordances serve as an effective, transferable piece of knowledge for planning agents in large state spaces.
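To make the idea of action pruning concrete, the following is a minimal sketch and not the authors' formalism. It assumes a hypothetical five-cell corridor MDP, four made-up actions, and a hand-written pruning rule (`affordance_pruned`) standing in for a learned affordance. It only illustrates how restricting the action set reduces the number of state-action backups in value iteration without changing the resulting values in this toy case.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = str

# Hypothetical toy MDP: a corridor of cells 0..4 with the goal at cell 4.
STATES: List[State] = list(range(5))
ACTIONS: List[Action] = ["left", "right", "dig", "jump"]
GOAL: State = 4
GAMMA = 0.95


def step(state: State, action: Action) -> Tuple[State, float]:
    """Deterministic transition: only left/right move; each step costs 1."""
    if state == GOAL:
        return state, 0.0  # the goal is absorbing and free
    if action == "right":
        nxt = min(state + 1, GOAL)
    elif action == "left":
        nxt = max(state - 1, 0)
    else:
        nxt = state  # "dig" and "jump" are irrelevant in this corridor
    return nxt, -1.0


def all_actions(state: State) -> List[Action]:
    """Baseline: every action is considered in every state."""
    return ACTIONS


def affordance_pruned(state: State) -> List[Action]:
    """Assumed, hand-written pruning rule: keep only movement actions."""
    return ["left", "right"]


def value_iteration(
    allowed: Callable[[State], List[Action]], sweeps: int = 100
) -> Tuple[Dict[State, float], int]:
    """Synchronous value iteration; also counts state-action backups."""
    values: Dict[State, float] = {s: 0.0 for s in STATES}
    backups = 0
    for _ in range(sweeps):
        new_values: Dict[State, float] = {}
        for s in STATES:
            q_values = []
            for a in allowed(s):
                nxt, reward = step(s, a)
                q_values.append(reward + GAMMA * values[nxt])
                backups += 1
            new_values[s] = max(q_values)
        values = new_values
    return values, backups
```

Calling `value_iteration(all_actions)` and `value_iteration(affordance_pruned)` yields the same values here while the pruned run performs half as many backups, because the two removed actions never change the state.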


Learning Human Types from Demonstration

AAAI Conferences

Research on POMDP formulations for collaborative tasks in game AI applications (Nguyen et al. 2011; Macindoe, Kaelbling, and Lozano-Pérez 2012; Silver and Veness 2010) also assumed a known human model. Additionally, previous partially observable formalisms (Ong et al. 2010; Bandyopadhyay et al. 2013; Broz, Nourbakhsh, and Simmons 2011; Fern and Tadepalli 2010; Nguyen et al. 2011; Macindoe, Kaelbling, and Lozano-Pérez 2012) in assistive or collaborative tasks represented the preference or intention of the human for their own actions, rather than those of the robot, as the partially observable variable.

The development of new industrial robotic systems that operate in the same physical space as people highlights the emerging need for robots that can integrate seamlessly into human group dynamics by adapting to the personalized style of human teammates. This adaptation requires learning a statistical model of human behavior and integrating this model into the decision-making algorithm of the robot in a principled way. We present a framework for automatically learning human user models from joint-action demonstrations


Building Blocks of Social Intelligence: Enabling Autonomy for Socially Intelligent and Assistive Robots

AAAI Conferences

Vocalics is the study of the nonverbal aspects of speech, such as volume, pitch, and rate. Our contribution is a parametric vocalic behavior controller that autonomously adjusts the robot speaker volume based on models of how a human user will hear speech produced by the robot. These models vary with distance, orientation, and perceived environmental interference (Mead & Matarić 2014). Our future work will investigate adapting the pitch and rate of speech produced by a robot to improve user speech perception.

We present an overview of the control, recognition, decision-making, and learning techniques utilized by the Interaction Lab (robotics.usc.edu/interaction) at the University of Southern California (USC) to enable autonomy in sociable and socially assistive robots. These techniques are implemented with two software libraries: 1) the Social Behavior Library (SBL) provides autonomous social behavior


Modeling Human-Robot Interactions as Systems of Distributed Cognition

AAAI Conferences

Robots that are integrated into day-to-day settings as assistants, collaborators, and companions will engage in dynamic, physically-situated social interactions with their users. Enabling such interactions will require appropriate models and representations for interaction. In this paper, we argue that the dynamic, physically-situated interactions between humans and robots can be characterized as a system of distributed cognition, that this system can be represented using probabilistic graphical models (PGMs), and that the parameters of these models can be learned from human interactions. We illustrate the application of this perspective in our ongoing research on modeling dyadic referential communication.


Intention-Aware Multi-Human Tracking for Human-Robot Interaction via Particle Filtering over Sets

AAAI Conferences

To interact successfully with multiple humans in social situations, an intelligent robot should be able to track multiple humans and understand their motion intentions. We formalize this problem as a hidden Markov model and estimate the posterior densities using a particle-filtering-over-sets approach. Our approach avoids directly performing observation-to-target association by defining a set as a joint state. The human identification problem is then solved in an expectation-maximization fashion. We evaluate the effectiveness of our approach with both benchmark tests and real robot experiments.
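For readers unfamiliar with particle filtering in a hidden Markov model, here is a minimal sketch of a standard bootstrap particle filter for a single target moving in one dimension. It is not the set-based filter described above: there is only one target, so no joint set state and no identification step. The motion model, noise levels, and all function names are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence


def gaussian_likelihood(observation: float, predicted: float, sigma: float) -> float:
    """Unnormalized Gaussian likelihood of an observation given a position."""
    return math.exp(-0.5 * ((observation - predicted) / sigma) ** 2)


def particle_filter_step(
    particles: Sequence[float],
    observation: float,
    rng: random.Random,
    velocity: float = 1.0,       # assumed constant motion per time step
    process_sigma: float = 0.2,  # assumed motion noise
    obs_sigma: float = 0.5,      # assumed sensor noise
) -> List[float]:
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: push every particle through the (assumed) motion model.
    predicted = [p + velocity + rng.gauss(0.0, process_sigma) for p in particles]
    # Weight: score each particle by how well it explains the observation.
    weights = [gaussian_likelihood(observation, p, obs_sigma) for p in predicted]
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(observations: Sequence[float], n_particles: int = 500, seed: int = 0) -> List[float]:
    """Return the posterior-mean position estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates


# Hypothetical noisy position readings of a target moving about 1 unit per step.
OBSERVATIONS = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
```

`track(OBSERVATIONS)` returns one estimate per reading. Extending the state from a single position to a set of positions is what lets the approach above sidestep explicit observation-to-target association; that extension is not attempted here.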


Humanoid Robots and Spoken Dialog Systems for Brief Health Interventions

AAAI Conferences

We combined a spoken dialog system that we developed to deliver brief health interventions with the fully autonomous humanoid robot (NAO). The dialog system is based on a framework facilitating Markov decision processes (MDP). It is optimized using reinforcement learning (RL) algorithms with data we collected from real user interactions. The system begins to learn optimal dialog strategies for initiative selection and for the type of confirmations that it uses during the interaction. The health intervention, delivered by a 3D character instead of the NAO, has already been evaluated, with positive results in terms of task completion, ease of use, and future intention to use the system. The current spoken dialog system for the humanoid robot is a novelty and exists so far as a proof of concept.


A Markov Decision Process Framework for Predictable Job Completion Times on Crowdsourcing Platforms

AAAI Conferences

Task starvation leads to huge variation in the completion times of tasks posted to the crowd. The price offered for a given task, together with the dynamics of the crowd at the time of posting, affects its completion time. Large organizations and requesters who use the crowd at regular intervals to get their tasks done want predictable completion times. Such requesters therefore have to take the crowd dynamics at the time of posting into account and price their tasks accordingly. In this work, we study an instance of the pricing problem and propose a solution based on the framework of Markov Decision Processes (MDPs).
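One way to picture pricing as an MDP is a small finite-horizon model solved by backward induction. The sketch below is not the formulation from the paper. It assumes a state of (tasks left, time steps left), three made-up price levels, a made-up probability that a task is picked up at each price, and a fixed penalty for each task still unfinished at the deadline.

```python
from functools import lru_cache
from typing import Optional, Tuple

# All numbers below are hypothetical and chosen only for illustration.
PRICES = [1, 2, 3]                      # price offered per task
ACCEPT_PROB = {1: 0.2, 2: 0.5, 3: 0.8}  # chance a task completes in one step
PENALTY = 10.0                          # cost per task unfinished at the deadline


@lru_cache(maxsize=None)
def expected_cost(tasks_left: int, steps_left: int) -> Tuple[float, Optional[int]]:
    """Return (minimum expected cost, best price to post now).

    Assumes at most one task completes per time step.
    """
    if tasks_left == 0:
        return 0.0, None
    if steps_left == 0:
        return PENALTY * tasks_left, None
    best_cost, best_price = float("inf"), None
    for price in PRICES:
        p = ACCEPT_PROB[price]
        done_cost, _ = expected_cost(tasks_left - 1, steps_left - 1)
        wait_cost, _ = expected_cost(tasks_left, steps_left - 1)
        # Pay the price only if the task is actually completed this step.
        cost = p * (price + done_cost) + (1 - p) * wait_cost
        if cost < best_cost:
            best_cost, best_price = cost, price
    return best_cost, best_price
```

Under these made-up numbers, `expected_cost(1, 1)` selects the highest price and `expected_cost(1, 4)` selects the lowest, which matches the intuition that a requester can pay less when the deadline is far away.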


Predicting Next Label Quality: A Time-Series Model of Crowdwork

AAAI Conferences

While temporal behavioral patterns can be discerned in real crowd work, prior studies have typically modeled worker performance under a simplified i.i.d. assumption. To better model such temporal worker behavior, we propose a time-series label prediction model for crowd work. This latent variable model captures and summarizes past worker behavior, enabling us to better predict the quality of each worker's next label. Given the inherent uncertainty in prediction, we also investigate a decision reject option to balance the tradeoff between prediction accuracy and coverage. Results show that our model improves both the accuracy of label prediction on real crowd worker data and data quality overall.
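The accuracy versus coverage tradeoff of a reject option can be illustrated with a simple confidence threshold. This is a minimal sketch under assumed inputs, not the model from the paper: `probs` stands for some model's predicted probability that a worker's next label is correct, `outcomes` records whether it actually was, and the example numbers are invented.

```python
from typing import Optional, Sequence, Tuple


def evaluate_with_reject(
    probs: Sequence[float], outcomes: Sequence[bool], threshold: float
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy on the predictions that were not rejected).

    A prediction is kept only when its confidence max(p, 1 - p) reaches the
    threshold; otherwise the predictor abstains.
    """
    accepted = [(p, y) for p, y in zip(probs, outcomes) if max(p, 1 - p) >= threshold]
    coverage = len(accepted) / len(probs)
    if not accepted:
        return coverage, None
    # Predict "correct" when p >= 0.5 and compare with what happened.
    hits = sum((p >= 0.5) == y for p, y in accepted)
    return coverage, hits / len(accepted)


# Invented example data: predicted probabilities and actual label correctness.
PROBS = [0.95, 0.9, 0.6, 0.55, 0.2, 0.45, 0.1, 0.7]
OUTCOMES = [True, True, False, True, False, True, False, True]
```

On this invented data, a threshold of 0.5 keeps every prediction, while a threshold of 0.75 keeps half of them and all of the kept ones are right. Raising the threshold trades coverage for accuracy.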


Parallel Task Routing for Crowdsourcing

AAAI Conferences

An ideal crowdsourcing or citizen-science system would route tasks to the most appropriate workers, but the best assignment is unclear because workers have varying skill, tasks have varying difficulty, and assigning several workers to a single task may significantly improve output quality. This paper defines a space of task routing problems, proves that even the simplest is NP-hard, and develops several approximation algorithms for parallel routing problems. We show that an intuitive class of requesters' utility functions is submodular, which lets us provide iterative methods for dynamically allocating batches of tasks that make near-optimal use of available workers in each round. Experiments with live oDesk workers show that our task routing algorithm uses only 48% of the human labor required by the commonly used round-robin strategy. Further, we provide versions of our task routing algorithm that scale to large numbers of workers and questions and handle workers with variable response times while still providing significant benefit over common baselines.
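A greedy allocation over a submodular utility can be sketched as follows. This is an illustrative stand-in rather than the algorithm from the paper. The assumed utility is the expected number of tasks that receive at least one correct answer, the assumed constraint is one task per worker per round, and the worker names and skill values are invented.

```python
from typing import Dict, List

Skill = Dict[str, Dict[str, float]]  # worker -> task -> P(correct answer)


def round_utility(assignment: Dict[str, List[str]], skill: Skill) -> float:
    """Expected number of tasks with at least one correct answer (assumed utility)."""
    total = 0.0
    for task, workers in assignment.items():
        miss = 1.0  # probability that every assigned worker is wrong
        for w in workers:
            miss *= 1.0 - skill[w][task]
        total += 1.0 - miss
    return total


def greedy_route(skill: Skill, tasks: List[str]) -> Dict[str, List[str]]:
    """Repeatedly add the (worker, task) pair with the largest marginal gain."""
    assignment: Dict[str, List[str]] = {t: [] for t in tasks}
    free = set(skill)  # each worker takes at most one task this round
    while free:
        base = round_utility(assignment, skill)
        best_gain, best_pair = 0.0, None
        for w in sorted(free):
            for t in tasks:
                assignment[t].append(w)  # try the pair
                gain = round_utility(assignment, skill) - base
                assignment[t].pop()      # undo the trial
                if gain > best_gain:
                    best_gain, best_pair = gain, (w, t)
        if best_pair is None:
            break  # no remaining pair improves the utility
        w, t = best_pair
        assignment[t].append(w)
        free.remove(w)
    return assignment


# Invented skills for three workers on two questions.
SKILL: Skill = {
    "ann": {"q1": 0.9, "q2": 0.6},
    "bob": {"q1": 0.8, "q2": 0.7},
    "cat": {"q1": 0.6, "q2": 0.5},
}
TASKS = ["q1", "q2"]
```

With these invented skills, `greedy_route(SKILL, TASKS)` sends the strongest worker to `q1` alone and the other two to `q2`. Adding a second worker to an already well-covered task yields a smaller gain than covering a new one, which is the diminishing-returns property that makes greedy selection a natural fit.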