Sohrabi, Shirin
ACPBench: Reasoning about Action, Change, and Planning
Kokel, Harsha, Katz, Michael, Srinivas, Kavitha, Sohrabi, Shirin
There is an increasing body of work using Large Language Models (LLMs) as agents for orchestrating workflows and making decisions in domains that require planning and multi-step reasoning. As a result, it is imperative to evaluate LLMs on the core skills required for planning. In this work, we present ACPBench, a benchmark for evaluating reasoning tasks in the field of planning. The benchmark consists of 7 reasoning tasks over 13 planning domains. The collection is constructed from planning domains described in a formal language. This allows us to synthesize problems with provably correct solutions across many tasks and domains. Further, it gives us scale without additional human effort: many additional problems can be created automatically. Our extensive evaluation of 22 LLMs and the OpenAI o1 reasoning models highlights a significant gap in the reasoning capabilities of these models. Our findings with OpenAI o1, a multi-turn reasoning model, reveal significant gains on multiple-choice questions yet, surprisingly, no notable progress on boolean questions. The ACPBench collection is available at https://ibm.github.io/ACPBench.
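As a rough illustration of how such problems can be synthesized (a minimal Python sketch, not the actual ACPBench generator; the Action class, state encoding, and question format below are illustrative assumptions), a boolean action-applicability question with a provably correct answer can be produced directly from a grounded STRIPS-style task:

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A grounded STRIPS-style action (illustrative, not ACPBench's internal format)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset = frozenset()
    del_effects: frozenset = frozenset()

def is_applicable(action: Action, state: frozenset) -> bool:
    """An action is applicable iff all of its preconditions hold in the state."""
    return action.preconditions <= state

def make_boolean_question(action: Action, state: frozenset) -> dict:
    """Synthesize a yes/no question whose gold answer is computed, not hand-annotated."""
    return {
        "question": f"In the current state, can the action '{action.name}' be applied?",
        "context": sorted(state),
        "answer": "yes" if is_applicable(action, state) else "no",
    }

# Toy Blocksworld-like state: block a is clear, on the table, and the hand is empty.
state = frozenset({"clear(a)", "ontable(a)", "handempty"})
pickup_a = Action("pick-up a",
                  preconditions=frozenset({"clear(a)", "ontable(a)", "handempty"}),
                  add_effects=frozenset({"holding(a)"}),
                  del_effects=frozenset({"clear(a)", "ontable(a)", "handempty"}))
print(make_boolean_question(pickup_a, state))  # answer: "yes"

Because the answer is computed from the formal model rather than annotated by hand, arbitrarily many such questions can be generated across tasks and domains.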
Thought of Search: Planning with Language Models Through The Lens of Efficiency
Katz, Michael, Kokel, Harsha, Srinivas, Kavitha, Sohrabi, Shirin
Among the most important properties of algorithms investigated in computer science are soundness, completeness, and complexity. These properties, however, are rarely analyzed for the vast collection of recently proposed methods for planning with large language models. In this work, we alleviate this gap. We analyze these properties of using LLMs for planning and highlight that recent trends abandon both soundness and completeness without gaining efficiency in return. We propose a significantly more efficient approach that, at the same time, maintains both soundness and completeness. We demonstrate this on four representative search problems, comparing against the LLM-based solutions from the literature that attempt to solve them. We show that by using LLMs to produce the code for the search components, we can solve the entire datasets with 100% accuracy using only a few calls to the LLM. We argue for responsible use of compute resources, urging the research community to investigate sound and complete LLM-based approaches that uphold efficiency.
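A rough sketch of the proposed pattern (assumed names, not the paper's code): the LLM is queried a handful of times for the code of the search components, typically a successor function and a goal test, and a standard sound-and-complete algorithm such as breadth-first search then does the actual solving.

from collections import deque

def bfs(initial_state, successors, is_goal):
    """Breadth-first search: sound and complete over finite state spaces."""
    frontier = deque([(initial_state, [])])
    visited = {initial_state}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, path + [action]))
    return None  # no solution exists

# The two components below stand in for code one would obtain from the LLM;
# here they are hand-written for a 3-litre/5-litre water-jug puzzle.
def successors(state):
    a, b = state  # litres currently in the 3-litre and 5-litre jugs
    moves = {
        "fill-3": (3, b), "fill-5": (a, 5), "empty-3": (0, b), "empty-5": (a, 0),
        "pour-3-into-5": (a - min(a, 5 - b), b + min(a, 5 - b)),
        "pour-5-into-3": (a + min(b, 3 - a), b - min(b, 3 - a)),
    }
    return [(name, s) for name, s in moves.items() if s != state]

def is_goal(state):
    return state[1] == 4  # measure exactly 4 litres in the 5-litre jug

print(bfs((0, 0), successors, is_goal))

Once the components are obtained, every instance of the dataset is solved by ordinary search, so the number of LLM calls is independent of the number of instances.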
Large Language Models as Planning Domain Generators
Oswald, James, Srinivas, Kavitha, Kokel, Harsha, Lee, Junkyu, Katz, Michael, Sohrabi, Shirin
Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate whether large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models, across 9 different planning domains and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.
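A minimal sketch of the plan-set comparison idea (the Jaccard score and helper names below are illustrative assumptions, not necessarily the paper's exact metric): solve each instance under both the reference domain and the LLM-generated domain, then compare the resulting sets of plans.

def plan_set_similarity(reference_plans: set, generated_plans: set) -> float:
    """Jaccard similarity between two plan sets (one possible scoring choice)."""
    if not reference_plans and not generated_plans:
        return 1.0
    return len(reference_plans & generated_plans) / len(reference_plans | generated_plans)

def evaluate_generated_domain(instances, plans_under_reference, plans_under_generated):
    """Average plan-set agreement over problem instances.

    The two dictionaries map an instance name to the set of plans (tuples of actions)
    a planner finds for that instance under the respective domain model."""
    scores = [plan_set_similarity(plans_under_reference[i], plans_under_generated[i])
              for i in instances]
    return sum(scores) / len(scores)

# Toy example with hand-written plan sets standing in for real planner output.
instances = ["p01", "p02"]
ref = {"p01": {("pick a", "stack a b")}, "p02": {("pick b",), ("pick c",)}}
gen = {"p01": {("pick a", "stack a b")}, "p02": {("pick b",)}}
print(evaluate_generated_domain(instances, ref, gen))  # 0.75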
Some Orders Are Important: Partially Preserving Orders in Top-Quality Planning
Katz, Michael, Lee, Junkyu, Kang, Jungkoo, Sohrabi, Shirin
The ability to generate multiple plans is central to using planning in real-life applications. Top-quality planners generate sets of top-cost plans, allowing flexibility in determining which plans are considered equivalent. In terms of the order between actions in a plan, the literature considers only two extremes: either all orders are important, making each plan unique, or all orders are unimportant, treating two plans that differ only in the order of their actions as equivalent. To allow flexibility in selecting important orders, we propose specifying a subset of actions whose mutual orders are important, interpolating between the top-quality and unordered top-quality planning problems. We explore ways of adapting partial-order-reduction search pruning techniques to this new computational problem and present an experimental evaluation demonstrating the benefits of exploiting such techniques in this setting.
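One way to make the interpolation concrete (a simplified sketch that captures only the equivalence relation and ignores whether a reordering is itself a valid plan): two plans are treated as equivalent when they contain the same multiset of actions and agree on the relative order of the designated important actions.

from collections import Counter

def canonical_form(plan, important_actions):
    """Canonical representative of a plan: keep the relative order of important
    actions, forget the order of everything else."""
    important_subsequence = tuple(a for a in plan if a in important_actions)
    action_multiset = frozenset(Counter(plan).items())
    return (important_subsequence, action_multiset)

def equivalent(plan1, plan2, important_actions):
    return canonical_form(plan1, important_actions) == canonical_form(plan2, important_actions)

important = {"load", "unload"}              # only orders involving these actions matter
p1 = ["load", "drive", "refuel", "unload"]
p2 = ["load", "refuel", "drive", "unload"]  # only unimportant actions swapped
p3 = ["unload", "drive", "refuel", "load"]  # important actions reordered
print(equivalent(p1, p2, important))        # True
print(equivalent(p1, p3, important))        # False

Taking the important set to be all actions recovers the top-quality view, while the empty set recovers the unordered top-quality view.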
Unifying and Certifying Top-Quality Planning
Katz, Michael, Lee, Junkyu, Sohrabi, Shirin
The growing utilization of planning tools in practical scenarios has sparked an interest in generating multiple high-quality plans. Consequently, a range of computational problems under the general umbrella of top-quality planning were introduced over a short time period, each with its own definition. In this work, we show that the existing definitions can be unified into one, based on a dominance relation. The different computational problems, therefore, simply correspond to different dominance relations. Given the unified definition, we can now certify the top-quality of the solutions, leveraging existing certification of unsolvability and optimality. We show that task transformations found in the existing literature can be employed for the efficient certification of various top-quality planning problems and propose a novel transformation to efficiently certify loopless top-quality planning.
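Stated compactly, the unified definition says that a set of plans is a solution iff every plan within the quality bound is dominated by some member of the set. A minimal sketch (assumed names; enumerating all plans within the bound is only for illustration, since certification is precisely about avoiding such enumeration):

def is_top_quality_solution(candidate_set, plans_within_bound, dominates):
    """Unified check: every plan within the quality bound must be dominated by
    some plan in the candidate set, for the chosen dominance relation."""
    return all(any(dominates(p, other) for p in candidate_set) for other in plans_within_bound)

# Different top-quality variants correspond to different dominance relations:
identity  = lambda p, q: p == q                  # plain top-quality planning
unordered = lambda p, q: sorted(p) == sorted(q)  # unordered top-quality planning

plans_within_bound = [("a", "b"), ("b", "a")]
print(is_top_quality_solution([("a", "b")], plans_within_bound, identity))   # False
print(is_top_quality_solution([("a", "b")], plans_within_bound, unordered))  # True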
Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators
Gehring, Clement, Asai, Masataro, Chitnis, Rohan, Silver, Tom, Kaelbling, Leslie Pack, Sohrabi, Shirin, Katz, Michael
Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires consolidating the discounted metric used in RL and the non-discounted metric used in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain.
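One standard way to realize a heuristic as a dense reward generator is potential-based reward shaping with the negated heuristic as the potential, which is known to preserve optimal policies; the sketch below is a generic illustration under assumed names, not the paper's Neural Logic Machine setup.

def shaped_reward(reward, heuristic, state, next_state, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s), with Phi = -h.
    Moving to a state with a smaller heuristic value earns a dense positive bonus."""
    phi_s, phi_next = -heuristic(state), -heuristic(next_state)
    return reward + gamma * phi_next - phi_s

# Toy example: h counts the blocks that are not yet in their goal position.
h  = lambda state: sum(1 for block, pos in state.items() if pos != "goal")
s  = {"a": "table", "b": "table"}        # h = 2
s2 = {"a": "goal",  "b": "table"}        # h = 1
print(shaped_reward(-1.0, h, s, s2))     # -1 + 0.99*(-1) + 2, approximately 0.01

The gamma appearing inside the shaping term is one place where the discounted RL objective and the non-discounted heuristic estimate must be reconciled.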
An AI Planning Solution to Scenario Generation for Enterprise Risk Management
Sohrabi, Shirin (IBM T.J. Watson Research Center) | Riabov, Anton V. (IBM T.J. Watson Research Center) | Katz, Michael (IBM T.J. Watson Research Center) | Udrea, Octavian (IBM T.J. Watson Research Center)
Scenario planning is a method commonly used by companies to develop their long-term plans. Scenario planning for risk management puts an added emphasis on identifying and managing emerging risks. While a variety of methods have been proposed for this purpose, we show that applying AI planning techniques to devise possible scenarios provides a unique advantage for scenario planning. Our system, the Scenario Planning Advisor (SPA), takes as input the relevant information from news and social media, representing key risk drivers, as well as the domain knowledge, and generates scenarios that explain the key risk drivers and describe the alternative futures. To this end, we provide a characterization of the problem, a knowledge engineering methodology, and a transformation to planning. Furthermore, we describe the computation of the scenarios, the lessons learned, and the feedback received from the pilot deployment of the SPA system at IBM.
Reports of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence
Anderson, Monica (University of Alabama) | Barták, Roman (Charles University) | Brownstein, John S. (Boston Children's Hospital, Harvard University) | Buckeridge, David L. (McGill University) | Eldardiry, Hoda (Palo Alto Research Center) | Geib, Christopher (Drexel University) | Gini, Maria (University of Minnesota) | Isaksen, Aaron (New York University) | Keren, Sarah (Technion University) | Laddaga, Robert (Vanderbilt University) | Lisy, Viliam (Czech Technical University) | Martin, Rodney (NASA Ames Research Center) | Martinez, David R. (MIT Lincoln Laboratory) | Michalowski, Martin (University of Ottawa) | Michael, Loizos (Open University of Cyprus) | Mirsky, Reuth (Ben-Gurion University) | Nguyen, Thanh (University of Michigan) | Paul, Michael J. (University of Colorado Boulder) | Pontelli, Enrico (New Mexico State University) | Sanner, Scott (University of Toronto) | Shaban-Nejad, Arash (University of Tennessee) | Sinha, Arunesh (University of Michigan) | Sohrabi, Shirin (IBM T. J. Watson Research Center) | Sricharan, Kumar (Palo Alto Research Center) | Srivastava, Biplav (IBM T. J. Watson Research Center) | Stefik, Mark (Palo Alto Research Center) | Streilein, William W. (MIT Lincoln Laboratory) | Sturtevant, Nathan (University of Denver) | Talamadupula, Kartik (IBM T. J. Watson Research Center) | Thielscher, Michael (University of New South Wales) | Togelius, Julian (New York University) | Tran, Son Cao (New Mexico State University) | Tran-Thanh, Long (University of Southampton) | Wagner, Neal (MIT Lincoln Laboratory) | Wallace, Byron C. (Northeastern University) | Wilk, Szymon (Poznan University of Technology) | Zhu, Jichen (Drexel University)
The AAAI-17 workshop program included 17 workshops covering a wide range of topics in AI. The workshops were held on Sunday and Monday, February 4-5, 2017, at the Hilton San Francisco Union Square in San Francisco, California, USA. This report contains summaries of 12 of the workshops and brief abstracts of the remaining 5.
State Projection via AI Planning
Sohrabi, Shirin (IBM T. J. Watson Research Center) | Riabov, Anton V. (IBM T. J. Watson Research Center) | Udrea, Octavian (IBM T. J. Watson Research Center)
Imagining the future helps us anticipate and prepare for what is coming, which is of great importance to many, if not all, human endeavors. In this paper, we develop the Planning Projector system prototype, which applies the plan-recognition-as-planning technique both to explain the observations derived from analyzing relevant news and social media and to project a range of possible future state trajectories for human review. Unlike the plan recognition problem, where a set of goals, and often a plan library, must be given as part of the input, the Planning Projector system takes as input the domain knowledge, a sequence of observations derived from the news, a time horizon, and the number of trajectories to produce. It then computes the set of trajectories by applying a planner capable of finding a set of high-quality plans to a transformed planning problem. The Planning Projector prototype integrates several components: (1) knowledge engineering: the process of encoding the domain knowledge from domain experts; (2) data transformation: the problem of analyzing and transforming the raw data into a sequence of observations; (3) trajectory computation: characterizing the future state projection problem and computing a set of trajectories; (4) user interface: clustering and visualizing the trajectories. We evaluate our approach qualitatively and conclude that the Planning Projector helps users understand future possibilities so that they can make more informed decisions.
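As a rough illustration of the underlying idea (a simplified sketch of the general plan-recognition-as-planning compilation with illustrative names, not the Planning Projector's actual encoding): each observation must be accounted for, either explained by an action in the trajectory or discarded at a penalty, so cheaper trajectories are those that explain more of the observed evidence.

def compile_observations(actions, observations, discard_cost=10):
    """Compile an observation sequence into a planning task: every observation must be
    'considered', either by an explain action (the observed action occurs in the plan)
    or by a discard action that pays a penalty. Ordering constraints are omitted here."""
    compiled_actions = dict(actions)  # name -> (preconditions, effects, cost)
    goal = set()
    for i, obs in enumerate(observations):
        considered = f"considered_{i}"
        goal.add(considered)
        pre, eff, cost = actions[obs]
        compiled_actions[f"explain_{i}_{obs}"] = (pre, eff | {considered}, cost)
        compiled_actions[f"discard_{i}"] = (set(), {considered}, discard_cost)
    return compiled_actions, goal

# Toy domain: two abstract risk-driver actions with unit cost.
actions = {"demand_drops": (set(), {"low_demand"}, 1),
           "prices_rise": ({"low_supply"}, {"high_prices"}, 1)}
compiled, goal = compile_observations(actions, ["demand_drops", "prices_rise"])
print(sorted(goal), sorted(compiled))

Running a planner that finds a set of high-quality plans on such a compiled task then yields alternative trajectories, each representing one way of explaining the observations and continuing into the future.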