 Markov Models


Markov Games of Incomplete Information for Multi-Agent Reinforcement Learning

AAAI Conferences

Partially observable stochastic games (POSGs) are an attractive model for many multi-agent domains, but are computationally extremely difficult to solve. We present a new model, Markov games of incomplete information (MGII), which imposes a mild restriction on POSGs while overcoming their primary computational bottleneck. Finally, we show how to convert an MGII into a continuous but bounded fully observable stochastic game. MGIIs represent the most general tractable model for multi-agent reinforcement learning to date.


Human Intelligence Needs Artificial Intelligence

AAAI Conferences

Crowdsourcing platforms, such as Amazon Mechanical Turk, have enabled the construction of scalable applications for tasks ranging from product categorization and photo tagging to audio transcription and translation. These vertical applications are typically realized with complex, self-managing workflows that guarantee quality results. But constructing such workflows is challenging, with a huge number of alternative decisions for the designer to consider. We argue the thesis that "Artificial intelligence methods can greatly simplify the process of creating and managing complex crowdsourced workflows." We present the design of CLOWDER, which uses machine learning to continually refine models of worker performance and task difficulty. Using these models, CLOWDER uses decision-theoretic optimization to 1) choose between alternative workflows, 2) optimize parameters for a workflow, 3) create personalized interfaces for individual workers, and 4) dynamically control the workflow. Preliminary experience suggests that these optimized workflows are significantly more economical (and return higher quality output) than those generated by humans.


Human Activity Detection from RGBD Images

AAAI Conferences

Being able to detect and recognize human activities is important for making personal assistant robots useful in performing assistive tasks. The challenge is to develop a system that is low-cost, reliable in unstructured home settings, and also straightforward to use. In this paper, we use an RGBD sensor (Microsoft Kinect) as the input sensor, and present learning algorithms to infer the activities. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM). It considers a person's activity as composed of a set of sub-activities, and infers the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, and an office, and achieve an average performance of 84.3% when the person was seen before in the training set (and 64.2% when the person was not seen before).


InfoMax Control for Acoustic Exploration of Objects by a Mobile Robot

AAAI Conferences

Recently, information gain has been proposed as a candidate intrinsic motivation for lifelong learning agents that may not always have a specific task. In the InfoMax control framework, reinforcement learning is used to find a control policy for a POMDP in which movement and sensing actions are selected to reduce Shannon entropy as quickly as possible. In this study, we implement InfoMax control on a robot which can move between objects and perform sound-producing manipulations on them. We formulate a novel latent variable mixture model for acoustic similarities and learn InfoMax policies that allow the robot to rapidly reduce uncertainty about the categories of the objects in a room. We find that InfoMax with our improved acoustic model yields policies that achieve high classification accuracy. Interestingly, we also find that with an insufficient model, the InfoMax policy eventually learns to "bury its head in the sand" to avoid getting additional evidence that might increase uncertainty. We discuss the implications of this finding for InfoMax as a principle of intrinsic motivation in lifelong learning agents.
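
The quantity that InfoMax control tries to drive down, the Shannon entropy of a belief over object categories, can be illustrated with a small sketch. This is not the paper's acoustic model or its learned policy: the four-category prior, the likelihood values, and the function names `shannon_entropy` and `bayes_update` are all assumptions chosen for illustration.

```python
import math


def shannon_entropy(belief):
    """Shannon entropy, in bits, of a discrete belief distribution."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


def bayes_update(belief, likelihood):
    """Posterior over categories after one observation.

    likelihood[i] is the (assumed) probability of the observation
    given that the object belongs to category i.
    """
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical example: uniform prior over four object categories and
# one sound observation that favors the first category.
prior = [0.25, 0.25, 0.25, 0.25]
likelihood = [0.7, 0.1, 0.1, 0.1]

posterior = bayes_update(prior, likelihood)

# Information gain of this observation: the drop in entropy.
info_gain = shannon_entropy(prior) - shannon_entropy(posterior)
```

In this toy setting an action is informative to the extent that its expected `info_gain` is large; an observation whose likelihood is the same for every category would leave the belief, and therefore the entropy, unchanged.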


Visual Search and Multirobot Collaboration Based on Hierarchical Planning

AAAI Conferences

Mobile robots are increasingly being used in the real world due to the availability of high-fidelity sensors and sophisticated information processing algorithms. A key challenge to the widespread deployment of robots is the ability to accurately sense the environment and collaborate towards a common objective. Probabilistic sequential decision-making methods can be used to address this challenge because they encapsulate the partial observability and non-determinism of robot domains. However, such formulations soon become intractable for domains with complex state spaces that require real-time operation. Our prior work enabled a mobile robot to use hierarchical partially observable Markov decision processes (POMDPs) to automatically tailor visual sensing and information processing to the task at hand. This paper introduces adaptive observation functions and policy re-weighting in a three-layered POMDP hierarchy to enable reliable and efficient visual processing in dynamic domains. In addition, each robot merges its beliefs with those communicated by teammates, to enable a team of robots to collaborate robustly. All algorithms are evaluated in simulated domains and on physical robots tasked with locating target objects in indoor environments.


When Did You Start Doing that Thing that You Do? Interactive Activity Recognition and Prompting

AAAI Conferences

We present a model of interactive activity recognition and prompting for use in an assistive system for persons with cognitive disabilities. The system can determine the user's state by interpreting sensor data and/or by explicitly querying the user, and can prompt the user to begin or end tasks. The objective of the system is to help the user maintain a daily schedule of activities while minimizing interruptions from questions or prompts. The model is built upon an option-based hierarchical POMDP. Options can be programmed and customized to specify complex routines for prompting or questioning. Novel aspects of the model include (1) the introduction of adaptive options, which employ a lightweight user model and are able to provide near-optimal performance with little exploration; (2) a restricted-inquiry dual-control algorithm that can appeal for help from the user when sensor data is ambiguous; and (3) a combined filtering / most-likely-sequence algorithm for determining the beginning and ending time points of the user's activities. Experiments show that each of these features contributes to the robustness of the model.


Mobile, Collaborative, Context-Aware Systems

AAAI Conferences

We describe work on representing and using a rich notion of context that goes beyond current networking applications focusing mostly on location. Our context model includes location and surroundings, the presence of people and devices, inferred activities and the roles people fill in them. A key element of our work is the use of collaborative information sharing where devices share and integrate knowledge about their context. This introduces a requirement that users can set appropriate levels of privacy to protect the personal information being collected and the inferences that can be drawn from it. We use Semantic Web technologies to model context and to specify high-level, declarative policies specifying information sharing constraints. The policies involve attributes of the subject (i.e., information recipient), target (i.e., the information) and their dynamic context (e.g., are the parties co-present). We discuss our ongoing work on context representation and inference and present a model for protecting and controlling the sharing of private data in context-aware mobile applications.


Policy Gradient Planning for Environmental Decision Making with Existing Simulators

AAAI Conferences

In environmental and natural resource planning domains, actions are taken at a large number of locations over multiple time periods. These problems have enormous state and action spaces, spatial correlation between actions, uncertainty, and complex utility models. We present an approach for modeling these planning problems as factored Markov decision processes. The reward model can contain local and global components as well as spatial constraints between locations. The transition dynamics can be provided by existing simulators developed by domain experts. We propose a landscape policy defined as the equilibrium distribution of a Markov chain built from many locally parameterized policies. This policy is optimized using a policy gradient algorithm. Experiments using a forestry simulator demonstrate the algorithm's ability to devise policies for sustainable harvest planning of a forest.
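
As a reminder of what an equilibrium distribution of a Markov chain is, the sketch below computes one by power iteration for a small, hand-picked two-state chain. It is not the paper's landscape policy construction: the transition matrix `P`, the function name `stationary_distribution`, and the iteration settings are assumptions for illustration only.

```python
def stationary_distribution(P, iters=1000, tol=1e-12):
    """Approximate the equilibrium distribution of a Markov chain.

    P is a row-stochastic transition matrix given as a list of rows.
    Starting from the uniform distribution, pi is repeatedly replaced
    by pi @ P until successive iterates differ by less than tol.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state chain; each row sums to 1.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

pi = stationary_distribution(P)
```

For this chain the balance condition `pi[0] * 0.1 == pi[1] * 0.5` gives an equilibrium of 5/6 and 1/6, which the iteration approaches. Power iteration is used here only because it is short and dependency-free; it assumes the chain has a unique equilibrium distribution that the iterates converge to.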


Planning for Operational Control Systems with Predictable Exogenous Events

AAAI Conferences

Various operational control systems (OCS) are naturally modeled as Markov Decision Processes. OCS often enjoy access to predictions of future events that have substantial impact on their operations. For example, reliable forecasts of extreme weather conditions are widely available, and such events can affect typical request patterns for customer response management systems, the flight and service time of airplanes, or the supply and demand patterns for electricity. The space of exogenous events impacting OCS can be very large, prohibiting their modeling within the MDP; moreover, for many of these exogenous events there is no useful predictive, probabilistic model. Realtime predictions, however, possibly with a short lead-time, are often available. In this work we motivate a model which combines offline MDP infinite horizon planning with realtime adjustments given specific predictions of future exogenous events, and suggest a framework in which such predictions are captured and trigger real-time planning problems. We propose a number of variants of existing MDP solution algorithms, adapted to this context, and evaluate them empirically.


An Event-Based Framework for Process Inference

AAAI Conferences

We focus on a class of models used for representing the dynamics between a discrete set of probabilistic events in a continuous-time setting. The proposed framework offers tractable learning and inference procedures and provides compact state representations for processes which exhibit variable delays between events. The approach is applied to a heart sound labeling task that exhibits long-range dependencies on previous events, and in which explicit modeling of the rhythm timings is justifiable by cardiological principles.