Problem Solving
Guessing human intentions to avoid dangerous situations in caregiving robots
Zapata, Noé, Pérez, Gerardo, Bonilla, Lucas, Núñez, Pedro, Bachiller, Pilar, Bustos, Pablo
For robots to interact socially, they must interpret human intentions and anticipate their potential outcomes accurately. This is particularly important for social robots designed for human care, which may face potentially dangerous situations for people, such as unseen obstacles in their way, that should be avoided. This paper explores the Artificial Theory of Mind (ATM) approach to inferring and interpreting human intentions. We propose an algorithm that detects risky situations for humans, selecting a robot action that removes the danger in real time. We use the simulation-based approach to ATM and adopt the 'like-me' policy to assign intentions and actions to people. Using this strategy, the robot can detect and act with a high rate of success under time-constrained situations. The algorithm has been implemented as part of an existing robotics cognitive architecture and tested in simulation scenarios. Three experiments have been conducted to test the implementation's robustness, precision and real-time response, including a simulated scenario, a human-in-the-loop hybrid configuration and a real-world scenario.
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
Zhang, Yadong, Mao, Shaoguang, Wu, Wenshan, Xia, Yan, Ge, Tao, Lan, Man, Wei, Furu
This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ a uni-directional (left-to-right) reasoning strategy. This lack of bi-directional deliberation results in limited awareness of potential future outcomes and insufficient integration of historical context, leading to suboptimal decisions. BIDDER addresses this gap by incorporating principles of rational decision-making, specifically managing uncertainty and predicting expected utility. Our approach involves three key processes: (1) inferring hidden states from historical data to represent uncertain information in the decision-making process; (2) using these hidden states to predict potential future states and outcomes; and (3) integrating historical information (past contexts) and long-term outcomes (future contexts) to inform reasoning. By leveraging bi-directional reasoning, BIDDER ensures thorough exploration of both past and future contexts, leading to more informed and rational decisions. We tested BIDDER's effectiveness in two well-defined scenarios: Poker (Limit Texas Hold'em) and Negotiation. Our experiments demonstrate that BIDDER significantly improves the decision-making capabilities of LLMs and LLM agents.
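The idea of weighing actions by expected utility under uncertainty about hidden states can be illustrated with a minimal Python sketch. This is not the BIDDER implementation: the hidden states, the belief values, the payoff table, and the function names `expected_utility` and `choose_action` are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def expected_utility(belief: Dict[str, float], utility: Dict[str, float]) -> float:
    """Average the utility of one action over the belief about hidden states."""
    return sum(prob * utility[state] for state, prob in belief.items())


def choose_action(belief: Dict[str, float],
                  payoffs: Dict[str, Dict[str, float]]) -> str:
    """Pick the action whose expected utility under the belief is highest."""
    return max(payoffs, key=lambda action: expected_utility(belief, payoffs[action]))


# Hypothetical belief over an opponent's hidden hand strength,
# standing in for a hidden state inferred from the game history.
belief = {"weak": 0.5, "strong": 0.5}

# Hypothetical long-term payoffs for each action in each hidden state,
# standing in for predicted future outcomes.
payoffs = {
    "fold": {"weak": -1.0, "strong": -1.0},
    "call": {"weak": 2.0, "strong": -2.0},
    "raise": {"weak": 4.0, "strong": -5.0},
}

best = choose_action(belief, payoffs)
print(best)
```

In this toy table, calling has the highest expected utility, so the belief inferred from the past and the payoffs predicted for the future jointly determine the decision.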
Knowledge Management in the Companion Cognitive Architecture
Nakos, Constantine, Forbus, Kenneth D.
One of the fundamental aspects of cognitive architectures is their ability to encode and manipulate knowledge. Without a consistent, well-designed, and scalable knowledge management scheme, an architecture will be unable to move past toy problems and tackle the broader problems of cognition. In this paper, we document some of the challenges we have faced in developing the knowledge stack for the Companion cognitive architecture and discuss the tools, representations, and practices we have developed to overcome them. We also lay out a series of potential next steps that will allow Companion agents to play a greater role in managing their own knowledge. It is our hope that these observations will prove useful to other cognitive architecture developers facing similar challenges.
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Dalrymple, David "davidad", Skalse, Joar, Bengio, Yoshua, Russell, Stuart, Tegmark, Max, Seshia, Sanjit, Omohundro, Steve, Szegedy, Christian, Goldhaber, Ben, Ammann, Nora, Abate, Alessandro, Halpern, Joe, Barrett, Clark, Zhao, Ding, Zhi-Xuan, Tan, Wing, Jeannette, Tenenbaum, Joshua
We introduce and define a family of approaches to AI safety, collectively referred to as guaranteed safe (GS) AI. These approaches aim to provide high-assurance quantitative guarantees about the safety of an AI system's behaviour through the use of three core components -- a formal safety specification, a world model, and a verifier. We will argue that this strategy is both promising and underexplored, and contrast it with other ongoing efforts in AI safety. We will also outline several ongoing avenues of research within the broader GS research agenda, identify some of their core difficulties, and discuss approaches for overcoming these difficulties. Central examples of agendas which fall under the GS AI family include Szegedy (2020); Wing (2021); Seshia et al. (2022); Russell (2022); Tegmark & Omohundro (2023); 'davidad' Dalrymple (2024); Bengio (2024).
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this position paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical
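The interplay of a safety specification, a world model, and a verifier can be illustrated with a deliberately small Python sketch. It assumes a finite, deterministic world model given as a transition table, a specification given as a predicate over states, and a verifier that exhaustively explores reachable states. The name `verify`, the states, and the transition tables are hypothetical, and the quantitative guarantees the paper describes are not modelled here.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional

State = Hashable


def verify(initial: State,
           world_model: Dict[State, List[State]],
           is_safe: Callable[[State], bool]) -> Optional[State]:
    """Explore every state reachable from `initial`.

    Returns a reachable state that violates the specification,
    or None if all reachable states satisfy it.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not is_safe(state):
            return state  # counterexample found
        for nxt in world_model.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Hypothetical safety specification: position 3 is a hazard.
def is_safe(state: State) -> bool:
    return state != 3


# Hypothetical world models: positions reachable under two different policies.
constrained = {0: [1], 1: [2], 2: [1]}    # the policy turns back at position 2
unconstrained = {0: [1], 1: [2], 2: [3]}  # the policy walks into the hazard

print(verify(0, constrained, is_safe))    # no violation is reachable
print(verify(0, unconstrained, is_safe))  # the hazard state is reachable
```

The verifier's answer is only as trustworthy as the world model it searches, which is why the three components are treated together.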
BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space
Zhang, Yumeng, Gong, Shi, Xiong, Kaixin, Ye, Xiaoqing, Tan, Xiao, Wang, Fan, Huang, Jizhou, Wu, Hua, Wang, Haifeng
World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence diffusion model. The multi-modal tokenizer first encodes multi-modality information and the decoder is able to reconstruct the latent BEV tokens into LiDAR and image observations by ray-casting rendering in a self-supervised manner. Then the latent BEV sequence diffusion model predicts future scenarios given action tokens as conditions. Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showcasing its capability in generating future scenes and benefiting downstream tasks such as perception and motion prediction. Code will be available at https://github.com/zympsyche/BevWorld.
Graph Reasoning Networks
Zopf, Markus, Alesiani, Francesco
Graph neural networks (GNNs) are the predominant approach for graph-based machine learning. While neural networks have shown great performance at learning useful representations, they are often criticized for their limited high-level reasoning abilities. In this work, we present Graph Reasoning Networks (GRNs), a novel approach to combine the strengths of fixed and learned graph representations and a reasoning module based on a differentiable satisfiability solver. While results on real-world datasets show comparable performance to GNNs, experiments on synthetic datasets demonstrate the potential of the newly proposed method.
EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs
Jiayang, Cheng, Qiu, Lin, Chan, Chunkit, Liu, Xin, Song, Yangqiu, Zhang, Zheng
Narrative reasoning relies on the understanding of eventualities in story contexts, which requires a wealth of background world knowledge. To help machines leverage such knowledge, existing solutions can be categorized into two groups. Some focus on implicitly modeling eventuality knowledge by pretraining language models (LMs) with eventuality-aware objectives. However, this approach breaks down knowledge structures and lacks interpretability. Others explicitly collect world knowledge of eventualities into structured eventuality-centric knowledge graphs (KGs). However, existing research on leveraging these knowledge sources for free-texts is limited. In this work, we propose an initial comprehensive framework called EventGround, which aims to tackle the problem of grounding free-texts to eventuality-centric KGs for contextualized narrative reasoning. We identify two critical problems in this direction: the event representation and sparsity problems. We provide simple yet effective parsing and partial information extraction methods to tackle these problems. Experimental results demonstrate that our approach consistently outperforms baseline models when combined with graph neural network (GNN) or large language model (LLM) based graph reasoning models. Our framework, incorporating grounded knowledge, achieves state-of-the-art performance while providing interpretable evidence.
Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy
Wang, Chen, Ji, Kaiyi, Geng, Junyi, Ren, Zhongqiang, Fu, Taimeng, Yang, Fan, Guo, Yifan, He, Haonan, Chen, Xiangyu, Zhan, Zitong, Du, Qiwei, Su, Shaoshu, Li, Bowen, Qiu, Yuheng, Du, Yi, Li, Qihang, Yang, Yifan, Lin, Xiao, Zhao, Zhipeng
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
RAM: Towards an Ever-Improving Memory System by Learning from Communications
Li, Jiaqi, Wang, Xiaobo, Ding, Wentao, Wang, Zihao, Kang, Yipeng, Jia, Zixia, Zheng, Zilong
in the human direction until they were present." --Tomasello (2010)
Human learning, extended as a lifelong process, typically operates in a communicative and cooperative framework among people via different forms of interactions within the physical and social world, as evidenced by the above quotes of Tomasello (2010). From toddlers to academic graduates, the
More recently, retrieval-augmented generation (RAG; Lewis et al. (2020)) is proposed to enable accessing and precisely manipulating the memory with a disentangled knowledge storage system; refer to 4 for details. However, conventional RAG augments LLMs with a static and exterior knowledge to address knowledge-intensive tasks. Fundamentally, the main challenge of building CLAI agents lies in determining when and how to update dynamic and internal knowledge given communicative feedback.
Neural Probabilistic Logic Learning for Knowledge Graph Reasoning
Sun, Fengsong, Wang, Jinyu, Wei, Zhiqing, Zhang, Xianchao
Knowledge graph (KG) reasoning is a task that aims to predict unknown facts based on known factual samples. Reasoning methods can be divided into two categories: rule-based methods and KG-embedding based methods. The former possesses precise reasoning capabilities but finds it challenging to reason efficiently over large-scale knowledge graphs. While gaining the ability to reason over large-scale knowledge graphs, the latter sacrifices reasoning accuracy. This paper aims to design a reasoning framework called Neural Probabilistic Logic Learning (NPLL) that achieves accurate reasoning on knowledge graphs. Our approach introduces a scoring module that effectively enhances the expressive power of embedding networks, striking a balance between model simplicity and reasoning capabilities. We improve the interpretability of the model by incorporating a Markov Logic Network based on variational inference. We empirically evaluate our approach on several benchmark datasets, and the experimental results validate that our method substantially enhances the accuracy and quality of the reasoning results.
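The rule-based side of this contrast can be made concrete with a short Python sketch that applies one composition rule to a set of triples. It is not NPLL itself, which combines an embedding-based scoring module with a Markov Logic Network. The function name `apply_composition_rule`, the relation names, and the facts are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def apply_composition_rule(facts: Set[Triple],
                           first: str,
                           second: str,
                           conclusion: str) -> Set[Triple]:
    """Apply the rule first(x, y) AND second(y, z) => conclusion(x, z).

    Returns only the triples that are not already in `facts`.
    """
    inferred: Set[Triple] = set()
    for x, rel_a, y in facts:
        if rel_a != first:
            continue
        for y2, rel_b, z in facts:
            if rel_b == second and y2 == y:
                inferred.add((x, conclusion, z))
    return inferred - facts


# Hypothetical known facts.
facts = {
    ("alice", "born_in", "paris"),
    ("paris", "located_in", "france"),
    ("bob", "born_in", "rome"),
}

new_facts = apply_composition_rule(facts, "born_in", "located_in", "born_in_country")
print(new_facts)
```

The inference is exact, but the nested loop compares every pair of facts, which hints at why purely rule-based reasoning is hard to scale to large graphs.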