Problem Solving
Proceedings of the 2024 XCSP3 Competition
Audemard, Gilles, Lecoutre, Christophe, Lonca, Emmanuel
This short paper gives an overview of the XCSP3 solver implemented in Picat. Picat provides several constraint modules, and the Picat XCSP3 solver uses the sat module. The XCSP3 solver mainly consists of a parser implemented in Picat, which converts constraints from XCSP3 format to Picat. The solver demonstrates the strengths of Picat, a logic-based language, in parsing, modeling, and encoding constraints into SAT. The high performance of the solver in recent XCSP competitions demonstrates the viability of using a SAT solver to solve general constraint satisfaction and optimization problems.
Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model
Xiong, Siheng, Payani, Ali, Yang, Yuan, Fekri, Faramarz
Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing methods.
NeuroAI for AI Safety
Mineault, Patrick, Zanichelli, Niccolò, Peng, Joanne Zichen, Arkhipov, Anton, Bingham, Eli, Jara-Ettinger, Julian, Mackevicius, Emily, Marblestone, Adam, Mattar, Marcelo, Payne, Andrew, Sanborn, Sophia, Schroeder, Karen, Tavares, Zenna, Tolias, Andreas
As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.
Abductive Symbolic Solver on Abstraction and Reasoning Corpus
Lim, Mintaek, Lee, Seokki, Abitew, Liyew Woletemaryam, Kim, Sundong
This paper addresses the challenge of enhancing artificial intelligence reasoning capabilities, focusing on logicality within the Abstraction and Reasoning Corpus (ARC). Humans solve such visual reasoning tasks based on their observations and hypotheses, and they can explain their solutions with a proper reason. However, many previous approaches focused only on the grid transition and it is not enough for AI to provide reasonable and human-like solutions. By considering the human process of solving visual reasoning tasks, we have concluded that the thinking process is likely the abductive reasoning process. Thus, we propose a novel framework that symbolically represents the observed data into a knowledge graph and extracts core knowledge that can be used for solution generation. This information limits the solution search space and helps provide a reasonable mid-process. Our approach holds promise for improving AI performance on ARC tasks by effectively narrowing the solution space and providing logical solutions grounded in core knowledge extraction.
Online Knowledge Integration for 3D Semantic Mapping: A Survey
Igelbrink, Felix, Renz, Marian, Günther, Martin, Powell, Piper, Niecksch, Lennart, Lima, Oscar, Atzmueller, Martin, Hertzberg, Joachim
Semantic mapping is a key component of robots operating in and interacting with objects in structured environments. Traditionally, geometric and knowledge representations within a semantic map have only been loosely integrated. However, recent advances in deep learning now allow full integration of prior knowledge, represented as knowledge graphs or language concepts, into sensor data processing and semantic mapping pipelines. Semantic scene graphs and language models enable modern semantic mapping approaches to incorporate graph-based prior knowledge or to leverage the rich information in human language both during and after the mapping process. This has sparked substantial advances in semantic mapping, leading to previously impossible novel applications. This survey reviews these recent developments comprehensively, with a focus on online integration of knowledge into semantic mapping. We specifically focus on methods using semantic scene graphs for integrating symbolic prior knowledge and language models for respective capture of implicit common-sense knowledge and natural language concepts
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
Yu, Shuo, Cheng, Mingyue, Yang, Jiqian, Ouyang, Jie, Luo, Yucong, Lei, Chenyi, Liu, Qi, Chen, Enhong
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach for mitigating the hallucination of large language models (LLMs) through the integration of external knowledge. While numerous efforts, most studies focus on a single type of externeal knowledge source. However, in real-world applications, most situations involve diverse knowledge from various sources, yet this area has been less explored. The main dilemma is the lack of a suitable dataset containing multiple knowledge sources and pre-exploration of the associated issues. To address these challenges, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains. Based on this dataset, we further develop a plug-and-play RAG framework, PruningRAG, whose main characteristic is to employ multi-granularity pruning strategies for optimizing the integration of relevant information and minimizing misleading context. Building upon the standardized dataset and PruningRAG, we also report a series of experimental results, as well as insightful findings. Our dataset and code are publicly available\footnote{https://github.com/USTCAGI/PruningRAG}, with the aim of advancing future research in the RAG community.
Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey
Kuang, Jiayi, Xie, Jingyou, Luo, Haohao, Li, Ronghao, Xu, Zhe, Cheng, Xianfeng, Li, Yinghui, Lin, Xika, Shen, Ying
Visual Question Answering (VQA) is a challenge task that combines natural language processing and computer vision techniques and gradually becomes a benchmark test task in multimodal large language models (MLLMs). The goal of our survey is to provide an overview of the development of VQA and a detailed description of the latest models with high timeliness. This survey gives an up-to-date synthesis of natural language understanding of images and text, as well as the knowledge reasoning module based on image-question information on the core VQA tasks. In addition, we elaborate on recent advances in extracting and fusing modal information with vision-language pretraining models and multimodal large language models in VQA. We also exhaustively review the progress of knowledge reasoning in VQA by detailing the extraction of internal knowledge and the introduction of external knowledge. Finally, we present the datasets of VQA and different evaluation metrics and discuss possible directions for future work.
Storing overlapping associative memories on latent manifolds in low-rank spiking networks
Podlaski, William F., Machens, Christian K.
Associative memory architectures such as the Hopfield network have long been important conceptual and theoretical models for neuroscience and artificial intelligence. However, translating these abstract models into spiking neural networks has been surprisingly difficult. Indeed, much previous work has been restricted to storing a small number of primarily non-overlapping memories in large networks, thereby limiting their scalability. Here, we revisit the associative memory problem in light of recent advances in understanding spike-based computation. Using a recently-established geometric framework, we show that the spiking activity for a large class of all-inhibitory networks is situated on a low-dimensional, convex, and piecewise-linear manifold, with dynamics that move along the manifold. We then map the associative memory problem onto these dynamics, and demonstrate how the vertices of a hypercubic manifold can be used to store stable, overlapping activity patterns with a direct correspondence to the original Hopfield model. We propose several learning rules, and demonstrate a linear scaling of the storage capacity with the number of neurons, as well as robust pattern completion abilities. Overall, this work serves as a case study to demonstrate the effectiveness of using a geometrical perspective to design dynamics on neural manifolds, with implications for neuroscience and machine learning.
Object-centric proto-symbolic behavioural reasoning from pixels
van Bergen, Ruben, Hübotter, Justus, Lanillos, Pablo
Autonomous intelligent agents must bridge computational challenges at disparate levels of abstraction, from the low-level spaces of sensory input and motor commands to the high-level domain of abstract reasoning and planning. A key question in designing such agents is how best to instantiate the representational space that will interface between these two levels -- ideally without requiring supervision in the form of expensive data annotations. These objectives can be efficiently achieved by representing the world in terms of objects (grounded in perception and action). In this work, we present a novel, brain-inspired, deep-learning architecture that learns from pixels to interpret, control, and reason about its environment, using object-centric representations. We show the utility of our approach through tasks in synthetic environments that require a combination of (high-level) logical reasoning and (low-level) continuous control. Results show that the agent can learn emergent conditional behavioural reasoning, such as $(A \to B) \land (\neg A \to C)$, as well as logical composition $(A \to B) \land (A \to C) \vdash A \to (B \land C)$ and XOR operations, and successfully controls its environment to satisfy objectives deduced from these logical rules. The agent can adapt online to unexpected changes in its environment and is robust to mild violations of its world model, thanks to dynamic internal desired goal generation. While the present results are limited to synthetic settings (2D and 3D activated versions of dSprites), which fall short of real-world levels of complexity, the proposed architecture shows how to manipulate grounded object representations, as a key inductive bias for unsupervised learning, to enable behavioral reasoning.
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions (Full Thesis)
Pretrained large Language Models (LLMs) are able to answer questions that are unlikely to have been encountered during training. However a diversity of potential applications exist in the broad domain of reasoning systems and considerations such as latency, cost, available compute resource and internet connectivity are relevant in determining an appropriate approach. We consider the setting where some local compute capacity is available at inference time but internet connectivity is not. Similar to a general-purpose LLM, we assume that our much smaller Reasoning Models may be asked arbitrary questions from unknown distributions, so we focus on evaluation in an unseen setting. We train our models to answer diverse questions by instilling an ability to reason over a retrieved context. We acquire context from two knowledge sources; a Wikipedia corpus queried using a multi-hop dense retrieval system with novel extensions, and from rationales generated from a larger Language Model optimised to run in a lower resource environment. Our main contributions: We propose novel methods to show that our model is capable of answering contextualised questions without memorisation. We establish a comprehensive set of baseline results on unseen evaluation datasets. We show that the addition of novel retrieval-augmented training datasets (RATD) to the training regime of the Reasoning Model significantly improves results. We demonstrate further significant improvement through the application of methods for combining knowledge from two sources. The first method (RR) involves training a novel Rationale Ranking model to score both generated rationales and retrieved contexts with respect to relevance and truthfulness. We use the scores to derive combined contexts. We also show that utilising the RATD datasets enables our model to become proficient at utilising combined noisy contexts.