Undirected Networks
Covering Number as a Complexity Measure for POMDP Planning and Learning
Zhang, Zongzhang (University of Science and Technology of China) | Littman, Michael (Rutgers University) | Chen, Xiaoping (University of Science and Technology of China)
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just ``covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
Counting-MLNs: Learning Relational Structure for Decision Making
Nath, Aniruddh (University of Washington) | Richardson, Matthew (Microsoft Research)
Many first-order probabilistic models can be represented much more compactly using aggregation operations such as counting. While traditional statistical relational representations share factors across sets of interchangeable random variables, representations that explicitly model aggregations also exploit interchangeability of random variables within factors. This is especially useful in decision making settings, where an agent might need to reason about counts of the different types of objects it interacts with. Previous work on counting formulas in statistical relational representations has mostly focused on the problem of exact inference on an existing model. The problem of learning such models is largely unexplored. In this paper, we introduce Counting Markov Logic Networks (C-MLNs), an extension of Markov logic networks that can compactly represent complex counting formulas. We present a structure learning algorithm for C-MLNs; we apply this algorithm to the novel problem of generalizing natural language instructions, and to relational reinforcement learning in the Crossblock domain, in which standard MLN learning algorithms fail to find any useful structure. The C-MLN policies learned from natural language instructions are compact and intuitive, and, despite requiring no instructions on test games, win 20% more Crossblock games than a state-of-the-art algorithm for following natural language instructions.
Dynamically Switching between Synergistic Workflows for Crowdsourcing
Lin, Christopher H (University of Washington) | Mausam, . (University of Washington) | Weld, Daniel S (University of Washington)
To ensure quality results from unreliable crowdsourced workers, task designers often construct complex workflows and aggregate worker responses from redundant runs. Frequently, they create several alternative workflows to accomplish the task, and choose a single workflow to deploy (perhaps the one that achieves the best performance during early experiments). However, this seemingly natural design paradigm does not achieve the full potential of crowdsourcing. In particular, using a single workflow (even the best) to accomplish a task is suboptimal. We show that alternative workflows can compose synergistically to yield a much higher quality output. We formalize the insight with a novel probabilistic graphical model, design and implement AgentHunt, a POMDP-based controller that dynamically switches between these workflows to achieve higher returns on investment, and design offline and online methods for learning model parameters. Live experiments on Amazon Mechanical Turk demonstrate the superiority of AgentHunt for the practical task of generating NLP training data, yielding up to 50% error reduction and greater net utility compared to previous methods.
Teaching Localization in Probabilistic Robotics
Martin, Fred G. (University of Massachusetts Lowell) | Dalphond, James (University of Massachusetts Lowell) | Tuck, Nat (University of Massachusetts Lowell)
In the field of probabilistic robotics, a central problem is to determine a robot's state given knowledge of a time series of control commands and sensor readings. The effects of control commands and the behavior of sensor devices are both modeled probabilistically. A variety of methods are available for deriving the robot's belief state, which is a probabilistic representation of the robot's true state (which cannot be directly known). This paper presents a series of five assignments to teach this material at the advanced undergraduate/graduate level. The theoretical aspect of the work is reinforced by practical implementation exercises using ROS (Robot Operating System), and the Bilibot, an educational robot platform.
Combining Probabilistic Planning and Logic Programming on Mobile Robots
Zhang, Shiqi (Texas Tech University) | Bao, Forrest Sheng (Texas Tech University) | Sridharan, Mohan (Texas Tech University)
Key challenges to widespread deployment of mobile robots to interact with humans in real-world domains include the ability to: (a) robustly represent and revise domain knowledge; (b) autonomously adapt sensing and processing to the task at hand; and (c) learn from unreliable high-level human feedback. Partially observable Markov decision processes (POMDPs) have been used to plan sensing and navigation in different application domains. It is however a challenge to include common sense knowledge obtained from sensory or human inputs in POMDPs. In addition, information extracted from sensory and human inputs may have varying levels of relevance to current and future tasks. On the other hand, although a non-monotonic logic programming paradigm such as Answer Set Programming (ASP) is wellsuited for common sense reasoning, it is unable to model the uncertainty in real-world sensing and navigation (Gelfond 2008). This paper presents a hybrid framework that integrates ASP, hierarchical POMDPs (Zhang and Sridharan 2012) and psychophysics principles to address the challenges stated above. Experimental results in simulation and on mobile robots deployed in indoor domains show that the framework results in reliable and efficient operation.
A Testbed for Learning by Demonstration from Natural Language and RGB-Depth Video
Song, Young Chol (University of Rochester) | Kautz, Henry (University of Rochester)
We are developing a testbed for learning by demonstration combining spoken language and sensor data in a natural real-world environment. Microsoft Kinect RGB-Depth cameras allow us to infer high-level visual features, such as the relative position of objects in space, with greater precision and less training than required by traditional systems. Speech is recognized and parsed using a “deep” parsing system, so that language features are available at the word, syntactic, and semantic levels. We collected an initial data set of 10 episodes of 7 individuals demonstrating how to “make tea”, and created a “gold standard” hand annotation of the actions performed in each. Finally, we are constructing “baseline” HMM-based activity recognition models using the visual and language features, in order to be ready to evaluate the performance of our future work on deeper and more structured models.
Informed Initial Policies for Learning in Dec-POMDPs
Kraemer, Landon Jeffrey (The University of Southern Mississippi) | Banerjee, Bikramjit (The University of Southern Mississippi)
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, and local information. Prevalent Dec-POMDP solution techniques have mostly been centralized and have assumed knowledge of the model. In real world scenarios, however, solving centrally may not be an option and model parameters maybe unknown. To address this, we propose a distributed, model-free algorithm for learning Dec-POMDP policies, in which agents take turns learning, with each agent not currently learning following a static policy. For agents that have not yet learned a policy, this static policy must be initialized. We propose a principled method for learning such initial policies through interaction with the environment. We show that by using such informed initial policies, our alternate learning algorithm can find near-optimal policies for two benchmark problems.
Delivering the Smart Grid: Challenges for Autonomous Agents and Multi-Agent Systems Research
Rogers, Alex (University of Southampton) | Ramchurn, Sarvapali D. (University of Southampton) | Jennings, Nicholas R. (University of Southampton)
Restructuring electricity grids to meet the increased demand caused by the electrification of transport and heating, while making greater use of intermittent renewable energy sources, represents one of the greatest engineering challenges of our day. This modern electricity grid, in which both electricity and information flow in two directions between large numbers of widely distributed suppliers and generators — commonly termed the ‘smart grid’ — represents a radical reengineering of infrastructure which has changed little over the last hundred years. However, the autonomous behaviour expected of the smart grid, its distributed nature, and the existence of multiple stakeholders each with their own incentives and interests, challenges existing engineering approaches. In this challenge paper, we describe why we believe that artificial intelligence, and particularly, the fields of autonomous agents and multi-agent systems are essential for delivering the smart grid as it is envisioned. We present some recent work in this area and describe many of the challenges that still remain.
Goal Recognition with Markov Logic Networks for Player-Adaptive Games
Ha, Eun Y. (North Carolina State University) | Rowe, Jonathan P. (North Carolina State University) | Mott, Bradford W. (North Carolina State University) | Lester, James C. (North Carolina State University)
Goal recognition in digital games involves inferring players’ goals from observed sequences of low-level player actions. Goal recognition models support player-adaptive digital games, which dynamically augment game events in response to player choices for a range of applications, including entertainment, training, and education. However, digital games pose significant challenges for goal recognition, such as exploratory actions and ill-defined goals. This paper presents a goal recognition framework based on Markov logic networks (MLNs). The model’s parameters are directly learned from a corpus that was collected from player interactions with a non-linear educational game. An empirical evaluation demonstrates that the MLN goal recognition framework accurately predicts players’ goals in a game environment with exploratory actions and ill-defined goals.
Occupancy Grid Models for Robot Mapping in Changing Environments
Meyer-Delius, Daniel (KUKA Laboratories GmbH) | Beinhofer, Maximilian (University of Freiburg) | Burgard, Wolfram (University of Freiburg)
The majority of existing approaches to mobile robot mapping assumes that the world is static, which is generally not justified in real-world applications. However, in many navigation tasks including trajectory planning, surveillance, and coverage, accurate maps are essential for the effective behavior of the robot. In this paper we present a probabilistic grid-based approach for modeling changing environments. Our method represents both, the occupancy and its changes in the corresponding area where the dynamics are characterized by the state transition probabilities of a Hidden Markov Model. We apply an offline and an online technique to learn the parameters from observed data. The advantage of the online approach is that it can dynamically adapt the parameters and at the same time does not require storing the complete observation sequences. Experimental results obtained with data acquired by real robots demonstrate that our model is well-suited for representing changing environments. Further results show that our technique can be used to substantially improve the effectiveness of path planning procedures.