 Markov Models


Learning Accuracy and Availability of Humans Who Help Mobile Robots

AAAI Conferences

When mobile robots perform tasks in environments with humans, it seems appropriate for the robots to rely on such humans for help instead of dedicated human oracles or supervisors. However, these humans are neither always available nor always accurate. In this work, we consider human help to a robot as concretely providing observations about the robot's state to reduce state uncertainty as it executes its policy autonomously. We model the probability of receiving an observation from a human in terms of their availability and accuracy by introducing Human Observation Providers POMDPs (HOP-POMDPs). We contribute an algorithm to learn human availability and accuracy online while the robot is executing its current task policy. We demonstrate that our algorithm is effective in approximating the true availability and accuracy of humans without depending on oracles to learn, thus increasing the tractability of deploying a robot that can occasionally ask for help.
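As a rough illustration of what estimating availability and accuracy online could look like, the sketch below keeps smoothed running counts for one human. It is not the algorithm from the paper: the `HumanEstimate` class, its fields, and the idea of scoring an answer against the robot's own later state estimate are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class HumanEstimate:
    """Running counts for one human (hypothetical structure, not from the paper)."""
    asks: int = 0      # times the robot asked this human for an observation
    answers: int = 0   # times the human responded at all
    correct: int = 0   # responses that agreed with the robot's later state estimate

    def record(self, answered: bool, matched: bool = False) -> None:
        """Log one request; `matched` is only meaningful when `answered` is True."""
        self.asks += 1
        if answered:
            self.answers += 1
            if matched:
                self.correct += 1

    def availability(self) -> float:
        # Laplace-smoothed frequency, so early estimates are never exactly 0 or 1.
        return (self.answers + 1) / (self.asks + 2)

    def accuracy(self) -> float:
        # Accuracy is conditioned on the human having answered.
        return (self.correct + 1) / (self.answers + 2)


est = HumanEstimate()
for answered, matched in [(True, True), (False, False), (True, True), (True, False)]:
    est.record(answered, matched)
print(est.availability(), est.accuracy())
```

The smoothing is a design choice for the sketch: with only a handful of requests, raw frequencies would swing between extremes.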


A POMDP-Based Optimal Control of P300-Based Brain-Computer Interfaces

AAAI Conferences

Most of the previous work on brain-computer interfaces (BCIs) exploiting the P300 in electroencephalography (EEG) has focused on low-level signal processing algorithms such as feature extraction and classification methods. Although significant improvements have been made in the past, the accuracy of detecting the P300 is limited by the inherently low signal-to-noise ratio in EEGs. In this paper, we present a systematic approach to optimize the interface using partially observable Markov decision processes (POMDPs). Through experiments involving human subjects, we show that the P300 speller system optimized using the POMDP achieves a significant performance improvement in terms of communication bandwidth in the interaction.
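A POMDP controller of this kind depends on a belief over the hidden state, here the symbol the user intends, that is updated after each noisy detection. The sketch below shows a generic discrete Bayes update under assumed numbers. The function names, the `hit_rate` and `false_alarm` values, and the flash model are hypothetical and not taken from the paper.

```python
def update_belief(belief, likelihoods):
    """Bayes update: posterior(s) is proportional to P(observation | s) * prior(s)."""
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


def flash_likelihoods(n_symbols, flashed, detected, hit_rate=0.7, false_alarm=0.3):
    """P(detector output | target = s) for every candidate symbol s.

    Assumed model: if the target is among the flashed symbols, a P300 is
    detected with probability `hit_rate`; otherwise with `false_alarm`.
    """
    likelihoods = []
    for s in range(n_symbols):
        p_detect = hit_rate if s in flashed else false_alarm
        likelihoods.append(p_detect if detected else 1.0 - p_detect)
    return likelihoods


belief = [0.25] * 4  # uniform prior over four symbols
belief = update_belief(belief, flash_likelihoods(4, {0, 1}, detected=True))
belief = update_belief(belief, flash_likelihoods(4, {0, 2}, detected=True))
print(belief)  # symbol 0 was in both flashed groups, so it gains the most mass
```

A policy would then use this belief to choose between flashing another group of symbols and committing to the most likely one.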


Learned Behaviors of Multiple Autonomous Agents in Smart Grid Markets

AAAI Conferences

One proposed approach to managing a large complex Smart Grid is through Broker Agents who buy electrical power from distributed producers, and also sell power to consumers, via a Tariff Market--a new market mechanism where Broker Agents publish concurrent bid and ask prices. A key challenge is the specification of the market strategy that the Broker Agents should use in order to earn profits while maintaining the market's balance of supply and demand. Interestingly, previous work has shown that a Broker Agent can learn its strategy, using Markov Decision Processes (MDPs) and Q-learning, and outperform other Broker Agents that use predetermined or randomized strategies. In this work, we investigate the more representative scenario in which multiple Broker Agents, instead of a single one, are independently learning their strategies. Using a simulation environment based on real data, we find that Broker Agents who employ periodic increases in exploration achieve higher rewards. We also find that varying levels of market dominance in customer allocation models result in remarkably distinct outcomes in market prices and aggregate Broker Agent rewards. The latter set of results can be explained by established economic principles regarding the emergence of monopolies in market-based competition, further validating our approach.
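The combination of Q-learning with periodic increases in exploration can be sketched as a tabular update plus an exploration schedule. This is a minimal illustration, not the Broker Agents' implementation. The schedule shape and every parameter value (`base`, `boost`, `period`, `boost_len`, `alpha`, `gamma`) are assumptions, and the state and action names are placeholders.

```python
import random
from collections import defaultdict


def epsilon_at(step, base=0.05, boost=0.5, period=100, boost_len=10):
    """Exploration rate that rises to `boost` for the first `boost_len` steps of each period."""
    return boost if step % period < boost_len else base


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy selection over a tabular Q function."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
actions = ["raise_price", "lower_price"]
rng = random.Random(0)
action = choose_action(q, "balanced", actions, epsilon_at(0), rng)
q_update(q, "balanced", "raise_price", 1.0, "oversupplied", actions)
```

Raising epsilon periodically lets an agent re-test actions whose value may have changed because the other learners have changed their own strategies.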


Linear Dynamic Programs for Resource Management

AAAI Conferences

Sustainable resource management in many domains presents large continuous stochastic optimization problems, which can often be modeled as Markov decision processes (MDPs). To solve such large MDPs, we identify and leverage linearity in state and action sets that is common in resource management. In particular, we introduce linear dynamic programs (LDPs) that generalize resource management problems and partially observable MDPs (POMDPs). We show that the LDP framework makes it possible to adapt point-based methods--the state of the art in solving POMDPs--to solving LDPs. The experimental results demonstrate the efficiency of this approach in managing the water level of a river reservoir. Finally, we discuss the relationship with dual dynamic programming, a method used to optimize hydroelectric systems.


Green Driver: AI in a Microcosm

AAAI Conferences

The Green Driver app is a dynamic routing application for GPS-enabled smartphones. Green Driver combines client GPS data with real-time traffic light information provided by cities to determine optimal routes in response to driver route requests. Routes are optimized with respect to travel time, with the intention of saving the driver both time and fuel, and rerouting can occur if warranted. During a routing session, client phones communicate with a centralized server that both collects GPS data and processes route requests. All relevant data are anonymized and saved to databases for analysis; statistics are calculated from the aggregate data and fed back to the routing engine to improve future routing. Analyses can also be performed to discern driver trends: where do drivers tend to go, how long do they stay, when and where does traffic congestion occur, and so on. The system uses a number of techniques from the field of artificial intelligence. We apply a variant of A* search for solving the stochastic shortest path problem in order to find optimal driving routes through a network of roads given light-status information. We also use dynamic programming and hidden Markov models to determine the progress of a driver through a network of roads from GPS data and light-status data. The Green Driver system is currently deployed for testing in Eugene, Oregon, and is scheduled for large-scale deployment in Portland, Oregon, in Spring 2011.


Artificial Intelligence for Artificial Artificial Intelligence

AAAI Conferences

Crowdsourcing platforms such as Amazon Mechanical Turk have become popular for a wide variety of human intelligence tasks; however, quality control continues to be a significant challenge. Recently, TurKontrol, a theoretical model based on POMDPs, was proposed to optimize iterative, crowd-sourced workflows. However, that work neither describes how to learn the model parameters nor shows its effectiveness in a real crowd-sourced setting. Learning is challenging due to the scale of the model and noisy data: there are hundreds of thousands of workers with high-variance abilities. This paper presents an end-to-end system that first learns TurKontrol's POMDP parameters from real Mechanical Turk data, and then applies the model to dynamically optimize live tasks. We validate the model and use it to control a successive-improvement process on Mechanical Turk. By modeling worker accuracy and voting patterns, our system produces significantly superior artifacts compared to those generated through nonadaptive workflows using the same amount of money.


Abductive Markov Logic for Plan Recognition

AAAI Conferences

Plan recognition is a form of abductive reasoning that involves inferring plans that best explain sets of observed actions. Most existing approaches to plan recognition and other abductive tasks employ either purely logical methods that do not handle uncertainty, or purely probabilistic methods that do not handle structured representations. To overcome these limitations, this paper introduces an approach to abductive reasoning using a first-order probabilistic logic, specifically Markov Logic Networks (MLNs). It introduces several novel techniques for making MLNs efficient and effective for abduction. Experiments on three plan recognition datasets show the benefit of our approach over existing methods.


Recognizing Plans with Loops Represented in a Lexicalized Grammar

AAAI Conferences

This paper extends existing plan recognition research to handle plans containing loops. We supply an encoding of plans with loops for recognition, based on techniques used to parse lexicalized grammars, and demonstrate its effectiveness empirically. To do this, the paper first shows how encoding plan libraries as context-free grammars permits the application of standard rewriting techniques to remove left recursion and ε-productions, thereby enabling polynomial time parsing. However, these techniques alone fail to provide efficient algorithms for plan recognition. We show how the loop-handling methods from formal grammars can be extended to the more general plan recognition problem and provide a method for encoding loops in an existing plan recognition system that scales linearly in the number of loop iterations.
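To make the grammar rewriting concrete, the sketch below applies the textbook transformation that removes direct left recursion, in a variant that does not introduce ε-productions. It is not the paper's encoding: the dictionary representation, the function name, and the primed helper nonterminal are assumptions for illustration, and indirect left recursion is not handled.

```python
def remove_direct_left_recursion(grammar):
    """Rewrite A -> A alpha | beta as A -> beta | beta A', A' -> alpha | alpha A'.

    `grammar` maps each nonterminal to a list of productions, each a tuple of
    symbols. Assumes no nonterminal is already named with a trailing quote.
    """
    result = {}
    for head, bodies in grammar.items():
        # Tails (alpha) of left-recursive productions; a bare A -> A is dropped.
        recursive = [b[1:] for b in bodies if b and b[0] == head and len(b) > 1]
        # Productions (beta) that do not start with the head itself.
        others = [b for b in bodies if not b or b[0] != head]
        if not recursive:
            result[head] = list(bodies)
            continue
        tail = head + "'"
        result[head] = others + [b + (tail,) for b in others]
        result[tail] = recursive + [alpha + (tail,) for alpha in recursive]
    return result


# A loop written left-recursively: one or more repetitions of "step".
plan_library = {"Loop": [("Loop", "step"), ("step",)]}
rewritten = remove_direct_left_recursion(plan_library)
print(rewritten)
```

The rewritten grammar generates the same strings, but the repetition is now right-recursive, which is the form top-down and incremental parsers can process without looping forever.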


A POMDP Model of Eye-Hand Coordination

AAAI Conferences

This paper presents a generative model of eye-hand coordination. We use numerical optimization to solve for the joint behavior of an eye and two hands, deriving a predicted motion pattern from first principles, without imposing heuristics. We model the planar scene as a POMDP with 17 continuous state dimensions. Belief-space optimization is facilitated by using a nominal-belief heuristic, whereby we assume (during planning) that the maximum likelihood observation is always obtained. Since a globally-optimal solution for such a high-dimensional domain is computationally intractable, we employ local optimization in the belief domain. By solving for a locally-optimal plan through belief space, we generate a motion pattern of mutual coordination between hands and eye: the eye's saccades disambiguate the scene in a task-relevant manner, and the hands' motions anticipate the eye's saccades. Finally, the model is validated through a behavioral experiment, in which human subjects perform the same eye-hand coordination task. We show how simulation is congruent with the experimental results.


Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs

AAAI Conferences

In many multi-agent applications such as distributed sensor nets, a network of agents acts collaboratively under uncertainty and local interactions. The Networked Distributed POMDP (ND-POMDP) provides a framework to model such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in practice. This paper presents a model-free, scalable learning approach that synthesizes multi-agent reinforcement learning (MARL) and distributed constraint optimization (DCOP). By exploiting structured interaction in ND-POMDPs, our approach distributes the learning of the joint policy and employs DCOP techniques to coordinate distributed learning to ensure the global learning performance. Our approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability. Experimental results show that, with communication during learning and execution, our approach significantly outperforms the nearly-optimal non-communication policies computed offline.