Amiri, Saeid
DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning
Zhang, Xiaohan, Altaweel, Zainab, Hayamizu, Yohei, Ding, Yan, Amiri, Saeid, Yang, Hao, Kaminski, Andy, Esselink, Chad, Zhang, Shiqi
Prompting foundation models such as large language models (LLMs) and vision-language models (VLMs) requires extensive domain knowledge and manual efforts, resulting in the so-called "prompt engineering" problem. To improve the performance of foundation models, one can provide examples explicitly [1] or implicitly [2], or encourage intermediate reasoning steps [3, 4]. Despite all the efforts, their performance in long-horizon reasoning tasks is still limited. Classical planning methods, including those defined by Planning Domain Definition Language (PDDL), are strong in ensuring the soundness, completeness and efficiency in planning tasks [5]. However, those classical planners rely on predefined states and actions, and do not perform well in open-world scenarios. We aim to enjoy the openness of VLMs in scene understanding while retaining the strong long-horizon reasoning capabilities of classical planners. Our key idea is to extract domain knowledge from classical planners for prompting VLMs towards enabling classical planners that are visually grounded and responsive to open-world situations. Given the natural connection between planning symbols and human language, this paper investigates how pre-trained VLMs can assist the robot in realizing symbolic plans generated by classical planners, while avoiding the engineering efforts of checking the outcomes of each action.
Surrogate Assisted Monte Carlo Tree Search in Combinatorial Optimization
Amiri, Saeid, Zehtabi, Parisa, Dervovic, Danial, Cashmore, Michael
Industries frequently adjust their facilities network by opening new branches in promising areas and closing branches in areas where they expect low profits. In this paper, we examine a particular class of facility location problems. Our objective is to minimize the loss of sales resulting from the removal of several retail stores. However, estimating sales accurately is expensive and time-consuming. To overcome this challenge, we leverage Monte Carlo Tree Search (MCTS) assisted by a surrogate model that computes evaluations faster. Results suggest that MCTS supported by a fast surrogate function can generate solutions faster while maintaining a consistent solution compared to MCTS that does not benefit from the surrogate function.
Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds
Ding, Yan, Zhang, Xiaohan, Amiri, Saeid, Cao, Nieqing, Yang, Hao, Kaminski, Andy, Esselink, Chad, Zhang, Shiqi
Task planning systems have been developed to help robots use human knowledge (about actions) to complete long-horizon tasks. Most of them have been developed for "closed worlds" while assuming the robot is provided with complete world knowledge. However, the real world is generally open, and the robots frequently encounter unforeseen situations that can potentially break the planner's completeness. Could we leverage the recent advances on pre-trained Large Language Models (LLMs) to enable classical planning systems to deal with novel situations? This paper introduces a novel framework, called COWP, for open-world task planning and situation handling. COWP dynamically augments the robot's action knowledge, including the preconditions and effects of actions, with task-oriented commonsense knowledge. COWP embraces the openness from LLMs, and is grounded to specific domains via action knowledge. For systematic evaluations, we collected a dataset that includes 1,085 execution-time situations. Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works. Experimental results show that our approach outperforms competitive baselines from the literature in the success rate of service tasks. Additionally, we have demonstrated COWP using a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/
Grounding Classical Task Planners via Vision-Language Models
Zhang, Xiaohan, Ding, Yan, Amiri, Saeid, Yang, Hao, Kaminski, Andy, Esselink, Chad, Zhang, Shiqi
Classical planning systems have shown great advances in utilizing rule-based human knowledge to compute accurate plans for service robots, but they face challenges due to the strong assumptions of perfect perception and action executions. To tackle these challenges, one solution is to connect the symbolic states and actions generated by classical planners to the robot's sensory observations, thus closing the perception-action loop. This research proposes a visually-grounded planning framework, named TPVQA, which leverages Vision-Language Models (VLMs) to detect action failures and verify action affordances towards enabling successful plan execution. Results from quantitative experiments show that TPVQA surpasses competitive baselines from previous studies in task completion rate.
Tractable Approximate Gaussian Inference for Bayesian Neural Networks
Goulet, James-A., Nguyen, Luong Ha, Amiri, Saeid
In this paper, we propose an analytical method for performing tractable approximate Gaussian inference (TAGI) in Bayesian neural networks. The method enables the analytical Gaussian inference of the posterior mean vector and diagonal covariance matrix for weights and biases. The method proposed has a computational complexity of $\mathcal{O}(n)$ with respect to the number of parameters $n$, and the tests performed on regression and classification benchmarks confirm that, for a same network architecture, it matches the performance of existing methods relying on gradient backpropagation.
Robot Sequential Decision Making using LSTM-based Learning and Logical-probabilistic Reasoning
Amiri, Saeid, Shirazi, Mohammad Shokrolah, Zhang, Shiqi
Sequential decision-making (SDM) plays a key role in intelligent robotics, and can be realized in very different ways, such as supervised learning, automated reasoning, and probabilistic planning. The three families of methods follow different assumptions and have different (dis)advantages. In this work, we aim at a robot SDM framework that exploits the complementary features of learning, reasoning, and planning. We utilize long short-term memory (LSTM), for passive state estimation with streaming sensor data, and commonsense reasoning and probabilistic planning (CORPP) for active information collection and task accomplishment. In experiments, a mobile robot is tasked with estimating human intentions using their motion trajectories, declarative contextual knowledge, and human-robot interaction (dialog-based and motion-based). Results suggest that our framework performs better than its no-learning and no-reasoning versions in a real-world office environment.
Robot Behavioral Exploration and Multi-modal Perception using Dynamically Constructed Controllers
Amiri, Saeid (Cleveland State University) | Wei, Suhua (Cleveland State University) | Zhang, Shiqi (Cleveland State University) | Sinapov, Jivko (Tufts University) | Thomason, Jesse (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)
Intelligent robots frequently need to explore the objects in their working environments. Modern sensors have enabled robots to learn object properties via perception of multiple modalities. However, object exploration in the real world poses a challenging trade-off between information gains and exploration action costs. Mixed observability Markov decision process (MOMDP) is a framework for planning under uncertainty, while accounting for both fully and partially observable components of the state. Robot perception frequently has to face such mixed observability. This work enables a robot equipped with an arm to dynamically construct query-oriented MOMDPs for object exploration. The robot’s behavioral policy is learned from two datasets collected using real robots. Our approach enables a robot to explore object properties in a way that is significantly faster while improving accuracies in comparison to existing methods that rely on hand-coded exploration strategies.
A General Hybrid Clustering Technique
Amiri, Saeid, Clarke, Bertrand, Clarke, Jennifer, Koepke, Hoyt A.
Here, we propose a clustering technique for general clustering problems including those that have non-convex clusters. For a given desired number of clusters $K$, we use three stages to find a clustering. The first stage uses a hybrid clustering technique to produce a series of clusterings of various sizes (randomly selected). They key steps are to find a $K$-means clustering using $K_\ell$ clusters where $K_\ell \gg K$ and then joins these small clusters by using single linkage clustering. The second stage stabilizes the result of stage one by reclustering via the `membership matrix' under Hamming distance to generate a dendrogram. The third stage is to cut the dendrogram to get $K^*$ clusters where $K^* \geq K$ and then prune back to $K$ to give a final clustering. A variant on our technique also gives a reasonable estimate for $K_T$, the true number of clusters. We provide a series of arguments to justify the steps in the stages of our methods and we provide numerous examples involving real and simulated data to compare our technique with other related techniques.