Planning & Scheduling
D3BA: A Tool for Optimizing Business Processes Using Non-Deterministic Planning
Chakraborti, Tathagata, Khazaeni, Yasaman
This paper builds upon recent work in the declarative design of dialogue agents and proposes an exciting new tool -- D3BA -- Declarative Design for Digital Business Automation, built to optimize business processes using the power of AI planning. The tool provides a powerful framework to build, optimize, and maintain complex business processes and optimize them by composing with services that automate one or more subtasks. We illustrate salient features of this composition technique, compare with other philosophies of composition, and highlight exciting opportunities for research in this emerging field of business process automation.
Artificial Intelligence for Social Good: A Survey
Shi, Zheyuan Ryan, Wang, Claire, Fang, Fei
Its impact is drastic and real: Youtube's AIdriven recommendation system would present sports videos for days if one happens to watch a live baseball game on the platform [1]; email writing becomes much faster with machine learning (ML) based auto-completion [2]; many businesses have adopted natural language processing based chatbots as part of their customer services [3]. AI has also greatly advanced human capabilities in complex decision-making processes ranging from determining how to allocate security resources to protect airports [4] to games such as poker [5] and Go [6]. All such tangible and stunning progress suggests that an "AI summer" is happening. As some put it, "AI is the new electricity" [7]. Meanwhile, in the past decade, an emerging theme in the AI research community is the so-called "AI for social good" (AI4SG): researchers aim at developing AI methods and tools to address problems at the societal level and improve the wellbeing of the society.
2019 in Review: 10 AI Papers That Made an Impact
The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of paper. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year's 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total. As part of our year-end series, Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019. Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero).
World Programs for Model-Based Learning and Planning in Compositional State and Action Spaces
Some of the most important tasks take place in environments which lack cheap and perfect simulators, thus hampering the application of model-free reinforcement learning (RL). While model-based RL aims to learn a dynamics model, in a more general case the learner does not know a priori what the action space is. Here we propose a formalism where the learner induces a world program by learning a dynamics model and the actions in graph-based compositional environments by observing state-state transition examples. Then, the learner can perform RL with the world program as the simulator for complex planning tasks. We highlight a recent application, and propose a challenge for the community to assess world program-based planning.
Monte-Carlo Tree Search for Policy Optimization
Ma, Xiaobai, Driggs-Campbell, Katherine, Zhang, Zongzhang, Kochenderfer, Mykel J.
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
Stochastic Fairness and Language-Theoretic Fairness in Planning on Nondeterministic Domains
Aminof, Benjamin, De Giacomo, Giuseppe, Rubin, Sasha
We address two central notions of fairness in the literature of planning on nondeterministic fully observable domains. The first, which we call stochastic fairness, is classical, and assumes an environment which operates probabilistically using possibly unknown probabilities. The second, which is language-theoretic, assumes that if an action is taken from a given state infinitely often then all its possible outcomes should appear infinitely often (we call this state-action fairness). While the two notions coincide for standard reachability goals, they diverge for temporally extended goals. This important difference has been overlooked in the planning literature, and we argue has led to confusion in a number of published algorithms which use reductions that were stated for state-action fairness, for which they are incorrect, while being correct for stochastic fairness. We remedy this and provide an optimal sound and complete algorithm for solving state-action fair planning for LTL/LTLf goals, as well as a correct proof of the lower bound of the goal-complexity (our proof is general enough that it provides new proofs also for the no-fairness and stochastic-fairness cases). Overall, we show that stochastic fairness is better behaved than state-action fairness.
Uncertainty-sensitive Learning and Planning with Ensembles
Miłoś, Piotr, Kuciński, Łukasz, Czechowski, Konrad, Kozakowski, Piotr, Klimek, Maciek
We propose a reinforcement learning framework for discrete environments in which an agent makes both strategic and tactical decisions. The former manifests itself through the use of value function, while the latter is powered by a tree search planner. These tools complement each other. The planning module performs a local \textit{what-if} analysis, which allows to avoid tactical pitfalls and boost backups of the value function. The value function, being global in nature, compensates for inherent locality of the planner. In order to further solidify this synergy, we introduce an exploration mechanism with two distinctive components: uncertainty modelling and risk measurement. To model the uncertainty we use value function ensembles, and to reflect risk we use propose several functionals that summarize the implied by the ensemble. We show that our method performs well on hard exploration environments: Deep-sea, toy Montezuma's Revenge, and Sokoban. In all the cases, we obtain speed-up in learning and boost in performance.
Design and Implementation of Linked Planning Domain Definition Language
Tatsubori, Michiaki, Munawar, Asim, Moriyama, Takao
Planning is a critical component of any artificial intelligence system that concerns the realization of strategies or action sequences typically for intelligent agents and autonomous robots. Given predefined parameterized actions, a planning service should accept a query with the goal and initial state to give a solution with a sequence of actions applied to environmental objects. This paper addresses the problem by providing a repository of actions generically applicable to various environmental objects based on Semantic Web technologies. Ontologies are used for asserting constraints in common sense as well as for resolving compatibilities between actions and states. Constraints are defined using Web standards such as SPARQL and SHACL to allow conditional predicates. We demonstrate the usefulness of the proposed planning domain description language with our robotics applications.
Neural-Symbolic Descriptive Action Model from Images: The Search for STRIPS
Not submitted to the 30th International Conference on Automated Planning and SchedulingNeural-Symbolic Descriptive Action Model from Images: The Search for STRIPS Masataro Asai MIT -IBM Watson AI Lab, Cambridge USA IBM Research Abstract Recent work on Neural-Symbolic systems that learn the discrete planning model from images has opened a promising direction for expanding the scope of Automated Planning and Scheduling to the raw, noisy data. However, previous work only partially addressed this problem, utilizing the black-box neural model as the successor generator. In this work, we propose Double-Stage Action Model Acquisition (DSAMA), a system that obtains a descriptive PDDL action model with explicit preconditions and effects over the propositional variables unsupervised-learned from images. DSAMA trains a set of Random Forest rule-based classifiers and compiles them into logical formulae in PDDL. While we obtained a competitively accurate PDDL model compared to a black-box model, we observed that the resulting PDDL is too large and complex for the state-of-the-art standard planners such as Fast Downward primarily due to the PDDL-SAS translator bottleneck. From this negative result, we show that this translator bottleneck cannot be addressed just by using a different, existing rule-based learning method, and we point to the potential future directions. 1 Introduction Recently, Latplan system (Asai and Fukunaga 2018) successfully connected a subsymbolic neural network (NN) system and a symbolic Classical Planning system to solve various visually presented puzzle domains. The system consists of four parts: 1) The State AutoEncoder (SAE) neural network learns a bidirectional mapping between images and propositional states with unsupervised training. The proposed framework opened a promising direction for applying a variety of symbolic methods to the real world -- For example, the search space generated by Latplan was shown to be compatible with a symbolic Goal Recognition system (Amado et al. 2018a; 2018b). Several variations replacing the state encoding modules have also been proposed: Causal InfoGAN (Kurutach et al. 2018) uses a GAN-based framework, First-Order SAE (Asai 2019) learns the First Order Logic symbols (instead of the propositional ones), and Zero-Suppressed SAE (Asai (:action a0:parameters ():precondition [D0]:effect (and (when [E00] (z0)) (when (not [E00]) (not (z0))) (when [E01] (z1)) (when (not [E01]) (not (z1))) ...)) Figure 1: An example DSAMA compilation result for the first action (i.e. Despite these efforts, Latplan is missing a critical feature of the traditional Classical Planning systems: The use of State-of-the-Art heuristic functions.
Qualitative Numeric Planning: Reductions and Complexity
Qualitative numerical planning is classical planning extended with nonnegative real variables that can be increased or decreased "qualitatively", i.e., by positive r andom amounts. While deterministic planning with numerical variables is undecidable in general, qualit ative numerical planning is decidable and provides a convenient abstract model for generaliz ed planning. Qualitative numerical planning, introduced by Srivastava, Zilberstein, Immerman, an d Geffner (2011), showed that solutions to qualitative numerical problems (QNPs) correspond to t he strong cyclic solutions of an associated fully observable non-deterministic (FOND) problem that terminate. The approach leads to a generate-and-test algorithm for solving QNPs where solutions to a FOND problem are generated one by one and tested for termination. The computational shortcomings of this approach, however, are that it is not simple to amend FOND planners to generat e all solutions, and that the number of solutions to check can be doubly exponential in the nu mber of variables. In this work we address these limitations, while providing additional insights o n QNPs. More precisely, we introduce two reductions, one from QNPs to FOND problems and the other from FOND problems to QNPs both of which do not involve termination tests. A result of th ese reductions is that QNPs are shown to have the same expressive power and the same complex ity as FOND problems.