Learning Graphical Models
Point-Based Planning for Multi-Objective POMDPs
Roijers, Diederik Marijn (University of Amsterdam) | Whiteson, Shimon (University of Amsterdam) | Oliehoek, Frans A. (University of Liverpool)
Many sequential decision-making problems require an agent to reason about both multiple objectives and uncertainty regarding the environment's state. Such problems can be naturally modelled as multi-objective partially observable Markov decision processes (MOPOMDPs). We propose optimistic linear support with alpha reuse (OLSAR), which computes a bounded approximation of the optimal solution set for all possible weightings of the objectives. The main idea is to solve a series of scalarized single-objective POMDPs, each corresponding to a different weighting of the objectives. A key insight underlying OLSAR is that the policies and value functions produced when solving scalarized POMDPs in earlier iterations can be reused to more quickly solve scalarized POMDPs in later iterations. We show experimentally that OLSAR outperforms, both in terms of runtime and approximation quality, alternative methods and a variant of OLSAR that does not leverage reuse.
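As a rough illustration of the scalarization step described above, the following sketch picks the best policy for a given weighting of the objectives. It is not OLSAR itself: the policy names, their value vectors, and the helper functions `scalarize` and `best_policy` are hypothetical, and the sketch assumes a linear scalarization (a weighted sum of the objectives).

```python
from typing import Dict, Sequence, Tuple


def scalarize(value: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a multi-objective value vector (linear scalarization)."""
    return sum(w * v for w, v in zip(weights, value))


def best_policy(values: Dict[str, Sequence[float]],
                weights: Sequence[float]) -> Tuple[str, float]:
    """Return the policy with the highest scalarized value for these weights."""
    name = max(values, key=lambda n: scalarize(values[n], weights))
    return name, scalarize(values[name], weights)


# Hypothetical two-objective value vectors for three candidate policies.
policy_values = {
    "pi_a": (10.0, 0.0),
    "pi_b": (6.0, 6.0),
    "pi_c": (0.0, 10.0),
}

if __name__ == "__main__":
    # Different weightings of the objectives favour different policies.
    for w1 in (0.0, 0.5, 1.0):
        print((w1, 1.0 - w1), best_policy(policy_values, (w1, 1.0 - w1)))
```

In this toy setting each weighting selects a different policy, which is why a solution set covering all weightings is needed rather than a single policy.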
Factored Upper Bounds for Multiagent Planning Problems under Uncertainty with Non-Factored Value Functions
Oliehoek, Frans Adriaan (University of Amsterdam and University of Liverpool) | Spaan, Matthijs T. J. (Delft University of Technology) | Witwicki, Stefan John (Swiss Federal Institute of Technology (EPFL))
Nowadays, multiagent planning under uncertainty scales to tens or even hundreds of agents. However, current methods either are restricted to problems with factored value functions, or provide solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions, which would additionally allow for meaningful benchmarking of methods of the latter category. To mitigate this problem, this paper introduces a family of influence-optimistic upper bounds for factored Dec-POMDPs without factored value functions. We demonstrate how we can achieve firm quality guarantees for problems with hundreds of agents.
Action2Activity: Recognizing Complex Activities from Sensor Data
Liu, Ye (National University of Singapore) | Nie, Liqiang (National University of Singapore) | Han, Lei (Hong Kong Baptist University) | Zhang, Luming (National University of Singapore) | Rosenblum, David S. (National University of Singapore)
Compared with simple actions, activities are much more complex but semantically consistent with a human's real life. Techniques for action recognition from sensor-generated data are mature. However, there has been relatively little work on bridging the gap between actions and activities. To this end, this paper presents a novel approach for complex activity recognition comprising two components. The first component is temporal pattern mining, which provides a mid-level feature representation for activities, encodes temporal relatedness among actions, and captures the intrinsic properties of activities. The second component is adaptive Multi-Task Learning, which captures relatedness among activities and selects discriminant features. Extensive experiments on a real-world dataset demonstrate the effectiveness of our work.
Metareasoning for Planning Under Uncertainty
Lin, Christopher H. (University of Washington) | Kolobov, Andrey (Microsoft Research) | Kamar, Ece (Microsoft Research) | Horvitz, Eric (Microsoft Research)
The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning, a process that trades off the cost of planning against the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.
Probabilistic Knowledge-Based Programs
Lang, Jérôme (CNRS, Université Paris-Dauphine) | Zanuttini, Bruno (Université de Caen Basse-Normandie)
We introduce Probabilistic Knowledge-Based Programs (PKBPs), a new, compact representation of policies for factored partially observable Markov decision processes. PKBPs use branching conditions such as "if the probability of φ is larger than p", among many others. While similar in spirit to value-based policies, PKBPs leverage the factored representation for more compactness. They also cope with more general goals than standard state-based rewards, such as pure information-gathering goals. Compactness comes at the price of reactivity, since evaluating branching conditions on-line is not polynomial in general. In this sense, PKBPs are complementary to other representations. Our intended application is as a tool for experts to specify policies in a natural, compact language, then have them verified automatically. We study succinctness and the complexity of verification for PKBPs.
Optimal Policy Generation for Partially Satisfiable Co-Safe LTL Specifications
Lacerda, Bruno (University of Birmingham) | Parker, David (University of Birmingham) | Hawes, Nick (University of Birmingham)
We present a method to calculate cost-optimal policies for co-safe linear temporal logic task specifications over a Markov decision process model of a stochastic system. Our key contribution is to address scenarios in which the task may not be achievable with probability one. We formalise a task progression metric and, using multi-objective probabilistic model checking, generate policies that are formally guaranteed to, in decreasing order of priority: maximise the probability of finishing the task; maximise progress towards completion, if this is not possible; and minimise the expected time or cost required. We illustrate and evaluate our approach in a robot task planning scenario, where the task is to visit a set of rooms that may be inaccessible during execution.
Estimating the Probability of Meeting a Deadline in Hierarchical Plans
Cohen, Liat (Ben Gurion University of the Negev) | Shimony, Solomon Eyal (Ben Gurion University of the Negev) | Weiss, Gera (Ben Gurion University of the Negev)
Given a hierarchical plan (or schedule) with uncertain task times, we may need to determine the probability that a given plan will satisfy a given deadline. This problem is shown to be NP-hard for series-parallel hierarchies. We provide a polynomial-time approximation algorithm for it. Computing the expected makespan of a hierarchical plan is also shown to be NP-hard. We examine the approximation bounds empirically and demonstrate where our scheme is superior to sampling and to exact computation.
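To make the problem statement concrete, the sketch below computes the deadline probability exactly for a tiny series-parallel plan. It is a brute-force illustration, not the approximation algorithm of the paper. It assumes independent tasks with small discrete duration distributions; the helper names (`series`, `parallel`, `prob_on_time`) and the example tasks are hypothetical.

```python
from itertools import product
from typing import Dict

Dist = Dict[int, float]  # maps a duration to its probability


def series(a: Dist, b: Dist) -> Dist:
    """Tasks run one after another: durations add (a convolution)."""
    out: Dist = {}
    for (da, pa), (db, pb) in product(a.items(), b.items()):
        out[da + db] = out.get(da + db, 0.0) + pa * pb
    return out


def parallel(a: Dist, b: Dist) -> Dist:
    """Tasks run side by side: the slower one determines the duration."""
    out: Dist = {}
    for (da, pa), (db, pb) in product(a.items(), b.items()):
        out[max(da, db)] = out.get(max(da, db), 0.0) + pa * pb
    return out


def prob_on_time(makespan: Dist, deadline: int) -> float:
    """Probability that the plan finishes no later than the deadline."""
    return sum(p for t, p in makespan.items() if t <= deadline)


# Hypothetical plan: (t1 in parallel with t2), followed by t3.
t1 = {1: 0.5, 3: 0.5}
t2 = {2: 0.5, 4: 0.5}
t3 = {1: 0.5, 2: 0.5}
plan = series(parallel(t1, t2), t3)

if __name__ == "__main__":
    print(prob_on_time(plan, 4))
```

The support of the intermediate distributions can grow quickly as tasks are composed in series, which gives some intuition for why exact computation becomes expensive on larger plans.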
ASAP-UCT: Abstraction of State-Action Pairs in UCT
Anand, Ankit (Indian Institute of Technology, Delhi) | Grover, Aditya (Indian Institute of Technology, Delhi) | Mausam (Indian Institute of Technology, Delhi) | Singla, Parag (Indian Institute of Technology, Delhi)
Monte-Carlo Tree Search (MCTS) algorithms such as UCT are an attractive online framework for solving planning-under-uncertainty problems modeled as a Markov Decision Process. However, MCTS search trees are constructed in flat state and action spaces, which can lead to poor policies for large problems. In a separate research thread, domain abstraction techniques compute symmetries to reduce the original MDP. This can lead to significant savings in computation, but these techniques have been predominantly implemented for offline planning. This paper makes two contributions. First, we define the ASAP (Abstraction of State-Action Pairs) framework, which extends and unifies past work on domain abstractions by holistically aggregating both states and state-action pairs; ASAP uncovers a much larger number of symmetries in a given domain. Second, we propose ASAP-UCT, which implements ASAP-style abstractions within a UCT framework, combining strengths of online planning with domain abstractions. Experimental evaluation on several benchmark domains shows up to 26% improvement in the quality of policies obtained over existing algorithms.
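For readers unfamiliar with UCT, the sketch below shows the standard UCB1 action-selection rule that UCT applies at each tree node. It is plain UCT selection, not the ASAP abstraction itself; the function name `ucb1_select`, the exploration constant, and the example statistics are assumptions made for illustration.

```python
import math
from typing import Dict


def ucb1_select(q: Dict[str, float], n: Dict[str, int],
                c: float = math.sqrt(2)) -> str:
    """Pick the action maximizing Q(s, a) + c * sqrt(ln N(s) / N(s, a)).

    q holds the mean return estimate per action and n the visit count per
    action at the current node; untried actions are selected first.
    """
    for action, count in n.items():
        if count == 0:
            return action
    total = sum(n.values())  # N(s): visits to the node across all actions
    return max(
        q,
        key=lambda a: q[a] + c * math.sqrt(math.log(total) / n[a]),
    )


# Hypothetical node statistics: "right" looks better but is heavily explored.
q_values = {"left": 0.5, "right": 0.6}
visits = {"left": 1, "right": 100}

if __name__ == "__main__":
    print(ucb1_select(q_values, visits))         # exploration bonus dominates
    print(ucb1_select(q_values, visits, c=0.0))  # purely greedy choice
```

With the default constant the rarely visited action wins because of its exploration bonus, while setting `c=0.0` reduces the rule to a greedy choice on the mean estimates.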
An Ontology Matching Approach Based on Affinity-Preserving Random Walks
Xiang, Chuncheng (Peking University) | Chang, Baobao (Peking University) | Sui, Zhifang (Peking University)
Ontology matching is the process of finding semantic correspondences between entities from different ontologies. As an effective solution for linking heterogeneous ontologies, ontology matching has attracted considerable attention in recent years. In this paper, we propose a novel graph-based approach to the ontology matching problem. Unlike previous work, we formulate ontology matching as a random walk process on the association graph constructed from the to-be-matched ontologies. In particular, we propose two variants of the conventional random walk process: Affinity-Preserving Random Walk (APRW), which alleviates the adverse effect of false-mapping nodes in the association graph, and Mapping-Oriented Random Walk (MORW), which incorporates the 1-to-1 matching constraints presumed in ontology matching. Experiments on the Ontology Alignment Evaluation Initiative (OAEI) datasets show that our approach achieves competitive performance compared with state-of-the-art systems, even though it does not utilize any external resources.
On Conceptual Labeling of a Bag of Words
Sun, Xiangyan (Fudan University) | Xiao, Yanghua (Fudan University) | Wang, Haixun (Google Research) | Wang, Wei (Fudan University)
In natural language processing and information retrieval, the bag of words representation is used to implicitly represent the meaning of the text. Implicit semantics, however, are insufficient for supporting text-based or natural-language-based interfaces, which are adopted by an increasing number of applications. Indeed, in applications ranging from automatic ontology construction to question answering, explicit representation of semantics is starting to play a more prominent role. In this paper, we introduce the task of conceptual labeling (CL), which aims at generating a minimum set of conceptual labels that best summarize a bag of words. We draw the labels from a data-driven semantic network that contains millions of highly connected concepts. The semantic network provides meaning to the concepts, and in turn, it provides meaning to the bag of words through the conceptual labels we generate. To achieve our goal, we use an information-theoretic approach to trade off the semantic coverage of a bag of words against the minimality of the output labels. Specifically, we use Minimum Description Length (MDL) as the criterion for selecting the best concepts. Our extensive experimental results demonstrate the effectiveness of our approach in representing the explicit semantics of a bag of words.