Markov Models
Saul: Towards Declarative Learning Based Programming
Kordjamshidi, Parisa (University of Illinois at Urbana-Champaign) | Roth, Dan (University of Illinois at Urbana-Champaign) | Wu, Hao (University of Illinois at Urbana-Champaign)
We present Saul, a new probabilistic programming language designed to address some of the shortcomings of programming languages that aim at advancing and simplifying the development of AI systems. Such languages need to interact with messy, naturally occurring data, to allow a programmer to specify what needs to be done at an appropriate level of abstraction rather than at the data level, to be developed on a solid theory that supports moving to and reasoning at this level of abstraction and, finally, to support flexible integration of these learning and inference models within an application program. Saul is an object-functional programming language written in Scala that facilitates these by (1) allowing a programmer to learn, name and manipulate named abstractions over relational data; (2) supporting seamless incorporation of trainable (probabilistic or discriminative) components into the program, and (3) providing a level of inference over trainable models to support composition and make decisions that respect domain and application constraints. Saul is developed over a declaratively defined relational data model, can use piecewise learned factor graphs with declaratively specified learning and inference objectives, and it supports inference over probabilistic models augmented with declarative knowledge-based constraints.We describe the key constructs of Saul and exemplify its use in developing applications that require relational feature engineering and structured output prediction.
Multi-Objective POMDPs with Lexicographic Reward Preferences
Wray, Kyle Hollins (University of Massachusetts, Amherst) | Zilberstein, Shlomo (University of Massachusetts, Amherst)
We propose a model, Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows for slack--slightly less-than-optimal values--for higher-priority preferences to facilitate improvement in lower-priority value functions. Many real life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized, while maximizing time spent driving autonomously. We propose two solutions to LPOMDPs--Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), establishing convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by Open Street Map (OSM) within 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that their application enables us to quickly solve vastly larger LPOMDPs and other variations of POMDPs.
Planning for Stochastic Games with Co-Safe Objectives
Song, Lei (University of Technology Sydney) | Feng, Yuan (University of Technology Sydney) | Zhang, Lijun (Chinese Academy of Sciences)
We consider planning problems for stochastic games with objectives specified by a branching-time logic, called probabilistic computation tree logic (PCTL). This problem has been shown to be undecidable if strategies with perfect recall, i.e., history-dependent, are considered. In this paper, we show that, if restricted to co-safe properties, a subset of PCTL properties capable to specify a wide range of properties in practice including reachability ones, the problem turns to be decidable, even when the class of general strategies is considered. We also give an algorithm for solving robust stochastic planning, where a winning strategy is tolerant to some perturbations of probabilities in the model. Our result indicates that satisfiability of co-safe PCTL is decidable as well.
Point-Based Planning for Multi-Objective POMDPs
Roijers, Diederik Marijn (University of Amsterdam) | Whiteson, Shimon (University of Amsterdam) | Oliehoek, Frans A. (University of Liverpool)
Many sequential decision-making problems require an agent to reason about both multiple objectives and uncertainty regarding the environment's state. Such problems can be naturally modelled as multi-objective partially observable Markov decision processes (MOPOMDPs). We propose optimistic linear support with alpha reuse (OLSAR), which computes a bounded approximation of the optimal solution set for all possible weightings of the objectives. The main idea is to solve a series of scalarized single-objective POMDPs, each corresponding to a different weighting of the objectives. A key insight underlying OLSAR is that the policies and value functions produced when solving scalarized POMDPs in earlier iterations can be reused to more quickly solve scalarized POMDPs in later iterations. We show experimentally that OLSAR outperforms, both in terms of runtime and approximation quality, alternative methods and a variant of OLSAR that does not leverage reuse.
Factored Upper Bounds for Multiagent Planning Problems under Uncertainty with Non-Factored Value Functions
Oliehoek, Frans Adriaan (University of Amsterdam and University of Liverpool) | Spaan, Matthijs T. J. (Delft University of Technology) | Witwicki, Stefan John (Swiss Federal Institute of Technology (EPFL))
Nowadays, multiagent planning under uncertainty scales to tens or even hundreds of agents. However, current methods either are restricted to problems with factored value functions, or provide solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions, which would additionally allow for meaningful benchmarking of methods of the latter category. To mitigate this problem, this paper introduces a family of influence-optimistic upper bounds for factored Dec-POMDPs without factored value functions. We demonstrate how we can achieve firm quality guarantees for problems with hundreds of agents.
Metareasoning for Planning Under Uncertainty
Lin, Christopher H. (University of Washington) | Kolobov, Andrey (Microsoft Research) | Kamar, Ece (Microsoft Research) | Horvitz, Eric (Microsoft Research)
The conventional model for online planning under uncertainty assumes that an agent can stop and plan without incurring costs for the time spent planning. However, planning time is not free in most real-world settings. For example, an autonomous drone is subject to nature's forces, like gravity, even while it thinks, and must either pay a price for counteracting these forces to stay in place, or grapple with the state change caused by acquiescing to them. Policy optimization in these settings requires metareasoning---a process that trades off the cost of planning and the potential policy improvement that can be achieved. We formalize and analyze the metareasoning problem for Markov Decision Processes (MDPs). Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking. For reasons we discuss, optimal general metareasoning turns out to be impractical, motivating approximations. We present approximate metareasoning procedures which rely on special properties of the BRTDP planning algorithm and explore the effectiveness of our methods on a variety of problems.
Probabilistic Knowledge-Based Programs
Lang, Jérôme (CNRS, Université Paris-Dauphine) | Zanuttini, Bruno (Université de Caen Basse-Normandie)
We introduce Probabilistic Knowledge-Based Programs (PKBPs), a new, compact representation of policies for factored partially observable Markov decision processes. PKBPs use branching conditions such as if the probability of φ is larger than p, and many more. While similar in spirit to value-based policies, PKBPs leverage the factored representation for more compactness. They also cope with more general goals than standard state-based rewards, such as pure information-gathering goals. Compactness comes at the price of reactivity, since evaluating branching conditions on-line is not polynomial in general. In this sense, PKBPs are complementary to other representations. Our intended application is as a tool for experts to specify policies in a natural, compact language, then have them verified automatically. We study succinctness and the complexity of verification for PKBPs.
Optimal Policy Generation for Partially Satisfiable Co-Safe LTL Specifications
Lacerda, Bruno (University of Birmingham) | Parker, David (University of Birmingham) | Hawes, Nick (University of Birmingham)
We present a method to calculate cost-optimal policies for co-safe linear temporal logic task specifications over a Markov decision process model of a stochastic system. Our key contribution is to address scenarios in which the task may not be achievable with probability one. We formalise a task progression metric and, using multi-objective probabilistic model checking, generate policies that are formally guaranteed to, in decreasing order of priority: maximise the probability of finishing the task; maximise progress towards completion, if this is not possible; and minimise the expected time or cost required. We illustrate and evaluate our approach in a robot task planning scenario, where the task is to visit a set of rooms that may be inaccessible during execution.
ASAP-UCT: Abstraction of State-Action Pairs in UCT
Anand, Ankit (Indian Institute of Technology, Delhi) | Grover, Aditya (Indian Institute of Technology, Delhi) | ., Mausam (Indian Institute of Technology, Delhi) | Singla, Parag (Indian Institute of Technology, Delhi)
Monte-Carlo Tree Search (MCTS) algorithms such as UCT are an attractive online framework for solving planning under uncertainty problems modeled as a Markov Decision Process. However, MCTS search trees are constructed in flat state and action spaces, which can lead to poor policies for large problems. In a separate research thread, domain abstraction techniques compute symmetries to reduce the original MDP. This can lead to significant savings in computation, but these have been predominantly implemented for offline planning. This paper makes two contributions. First, we define the ASAP (Abstraction of State-Action Pairs) framework, which extends and unifies past work on domain abstractions by holistically aggregating both states and state-action pairs — ASAP uncovers a much larger number of symmetries in a given domain. Second, we propose ASAP-UCT, which implements ASAP-style abstractions within a UCT framework combining strengths of online planning with domain abstractions. Experimental evaluation on several benchmark domains shows up to 26% improvement in the quality of policies obtained over existing algorithms.
An Ontology Matching Approach Based on Affinity-Preserving Random Walks
Xiang, Chuncheng (Peking University) | Chang, Baobao (Peking University) | Sui, Zhifang (Peking University)
Ontology matching is the process of finding semantic correspondences between entities from different ontologies. As an effective solution to linking different heterogeneous ontologies, ontology matching has attracted considerable attentions in recent years. In this paper, we propose a novel graph-based approach to ontology matching problem. Different from previous work, we formulate ontology matching as a random walk process on the association graph constructed from the to-be-matched ontologies. In particular, two variants of the conventional random walk process, namely, Affinity-Preserving Random Walk (APRW) and Mapping-Oriented Random Walk (MORW), have been proposed to alleviate the adverse effect of the false-mapping nodes in the association graph and to incorporate the 1-to-1 matching constraints presumed in ontology matching, respectively. Experiments on the Ontology Alignment Evaluation Initiative (OAEI) datasets show that our approach achieves a competitive performance when compared with state-of-the-art systems, even though our approach does not utilize any external resources.