Problem Solving
Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings
Peng, Yifei, Jin, Yu, Luo, Zhexu, Ding, Yao-Xiang, Dai, Wang-Zhou, Ren, Zhong, Zhou, Kun
Despite the great success of neural visual generative models in recent years, integrating them with strong symbolic reasoning systems remains a challenging task. Among the core challenges are two levels of the symbol grounding problem. The first is symbol assignment, i.e., mapping the latent factors of neural visual generators to semantically meaningful symbolic factors from the reasoning systems by learning from limited labeled data. The second is rule learning, i.e., learning new rules that govern the generative process to enhance the symbolic reasoning systems. To address these two problems, we propose a neurosymbolic learning approach, Abductive visual Generation (AbdGen), which integrates logic programming systems with neural visual generative models based on the abductive learning framework. To achieve reliable and efficient symbol grounding, we introduce the quantized abduction method, which generates abduction proposals by nearest-neighbor lookup within semantic codebooks. To achieve precise rule learning, we propose the contrastive meta-abduction method, which simultaneously eliminates wrong rules using positive cases and avoids less informative rules using negative cases. Experimental results show that, compared to the baseline approaches, AbdGen requires significantly less labeled data for symbol assignment. Furthermore, AbdGen can effectively learn underlying logical generative rules from data, which is beyond the capability of existing approaches.
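The nearest-neighbor lookup behind quantized abduction can be illustrated with a minimal sketch. It assumes a hypothetical codebook of symbol embeddings (one vector per symbol) and ranks candidate symbols for a latent vector by squared Euclidean distance. The names `ranked_proposals` and `propose_symbols` are illustrative only, and the sketch omits the logical-consistency check that AbdGen performs against the reasoning system.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ranked_proposals(
    z: Sequence[float],
    codebook: Sequence[Sequence[float]],
    symbols: Sequence[str],
    k: int = 2,
) -> List[str]:
    """Return the k symbols whose codebook entries are closest to latent z.

    Hypothetical helper: codebook row i is assumed to embed symbols[i].
    """
    scored = sorted(
        ((squared_distance(z, code), sym) for code, sym in zip(codebook, symbols)),
        key=lambda pair: pair[0],
    )
    return [sym for _, sym in scored[:k]]


def propose_symbols(latents, codebook, symbols) -> List[str]:
    """Assign each latent vector the symbol of its single nearest code."""
    return [ranked_proposals(z, codebook, symbols, k=1)[0] for z in latents]


codebook = [[0.0, 0.0], [1.0, 1.0]]
symbols = ["zero", "one"]
print(propose_symbols([[0.1, -0.2], [0.9, 1.2]], codebook, symbols))  # ['zero', 'one']
```

Returning a ranked list rather than only the top symbol gives an abduction step several nearby candidates to test against the logic program when the nearest assignment turns out to be inconsistent.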
Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning
Bhattacharjya, Debarun, Lee, Junkyu, Agravante, Don Joven, Ganesan, Balaji, Marinescu, Radu
Foundation models (FMs) such as large language models have revolutionized the field of AI by showing remarkable performance in various tasks. However, they exhibit numerous limitations that prevent their broader adoption in many real-world systems, which often require a higher bar for trustworthiness and usability. Since FMs are trained using loss functions aimed at reconstructing the training corpus in a self-supervised manner, there is no guarantee that the model's output aligns with users' preferences for the specific task at hand. In this survey paper, we propose a conceptual framework that encapsulates different modes by which agents could interact with FMs and guide them suitably for a set of tasks, particularly through knowledge augmentation and reasoning. Our framework elucidates agent role categories such as updating the underlying FM, assisting with prompting the FM, and evaluating the FM output. We also categorize several state-of-the-art approaches into agent interaction protocols, highlighting the nature and extent of involvement of the various agent roles. The proposed framework provides guidance for future directions to further realize the power of FMs in practical AI systems.
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
Jacovi, Alon, Bitton, Yonatan, Bohnet, Bernd, Herzig, Jonathan, Honovich, Or, Tseng, Michael, Collins, Michael, Aharoni, Roee, Geva, Mor
Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is the prominent approach for complex reasoning tasks, where more accurate reasoning chains typically improve downstream task performance. Recent literature discusses automatic methods to verify reasoning steps to evaluate and improve their correctness. However, no fine-grained step-level datasets are available to enable thorough evaluation of such verification methods, hindering progress in this direction. We introduce Reveal: Reasoning Verification Evaluation, a new dataset to benchmark automatic verifiers of complex Chain-of-Thought reasoning in open-domain question answering settings. Reveal includes comprehensive labels for the relevance, attribution to evidence passages, and logical correctness of each reasoning step in a language model's answer, across a wide variety of datasets and state-of-the-art language models.
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
Jiao, Fangkai, Qin, Chengwei, Liu, Zhengyuan, Chen, Nancy F., Joty, Shafiq
Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation. However, recent studies have raised concerns regarding hallucinations and flaws in their reasoning processes. Substantial efforts are being made to improve the reliability and faithfulness of the generated rationales. Some approaches model reasoning as planning, while others focus on annotating for process supervision. Nevertheless, the planning-based search process often results in high latency due to the frequent assessment of intermediate reasoning states and the extensive exploration space. Additionally, supervising the reasoning process with human annotation is costly and challenging to scale for LLM training. To address these issues, we propose a framework that learns planning-based reasoning through direct preference optimization (DPO) on collected trajectories, which are ranked according to synthesized process rewards. Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework, showing that our 7B model can surpass strong counterparts such as GPT-3.5-Turbo.
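The training signal described above pairs trajectories by their synthesized process rewards and optimizes a direct preference objective. A minimal sketch follows, assuming per-trajectory rewards and summed log-probabilities are already available; the reward synthesis itself is not shown, and the function names are hypothetical. The loss is the standard DPO objective applied to pairs ordered by reward.

```python
import math
from typing import List, Tuple


def preference_pairs(scored: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Build (chosen, rejected) pairs from trajectories ranked by reward.

    `scored` holds (trajectory_id, reward) tuples; the reward stands in for
    a synthesized process reward and is assumed to be given.
    """
    ranked = sorted(scored, key=lambda t: t[1], reverse=True)
    pairs = []
    for i, (win, r_win) in enumerate(ranked):
        for lose, r_lose in ranked[i + 1:]:
            if r_win > r_lose:  # skip ties: they carry no preference signal
                pairs.append((win, lose))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss -log(sigmoid(margin)) for a single preference pair."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow in exp
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


print(preference_pairs([("a", 0.2), ("b", 0.9), ("c", 0.5)]))
# [('b', 'c'), ('b', 'a'), ('c', 'a')]
```

When the policy matches the reference model the margin is zero and the loss is log 2; raising the policy's log-probability of the chosen trajectory relative to the rejected one lowers the loss.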
Distilling Mathematical Reasoning Capabilities into Small Language Models
Zhu, Xunyu, Li, Jian, Liu, Yong, Ma, Can, Wang, Weiping
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process into equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
Information-Theoretic Thresholds for Planted Dense Cycles
Mao, Cheng, Wein, Alexander S., Zhang, Shenduo
The Watts-Strogatz small-world model has been an influential random graph model since its proposal in 1998, owing to the ubiquity of the small-world phenomenon in complex networks [WS98, Wat04]. In this model, there are n vertices with latent positions on a circle, and vertices are more likely to be connected to their k nearest geometric neighbors than to more distant vertices. In other words, a denser cycle of length n and width k is "planted" in a sparser ambient random graph on n vertices. Informally, the small-world model can also be viewed as an interpolation between a random geometric graph [Pen03], where edges exist only between vertices with nearby locations on a circle, and an Erdős-Rényi graph [ER59], where edges are random and independent. As a consequence, a small-world network tends to have a high clustering coefficient due to its geometry while preserving the short distances between vertices typical of a random graph. While there is an extensive literature on small-world networks and geometric graphs, the associated statistical problems, such as detecting and recovering the latent geometry from the observed random graph, have gained attention only recently.
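A generative sketch of the planted dense cycle model described above may help fix ideas. It assumes hypothetical parameters p (edge probability between near vertices) and q (ambient edge probability, q < p) and takes width k to mean the k // 2 nearest positions on each side of a vertex; the paper's exact parameterization may differ.

```python
import random
from typing import List, Set, Tuple


def circular_distance(a: int, b: int, n: int) -> int:
    """Distance between positions a and b on a cycle of length n."""
    d = abs(a - b) % n
    return min(d, n - d)


def planted_dense_cycle(n: int, k: int, p: float, q: float, seed: int = 0):
    """Sample a graph with a planted dense cycle of length n and width k.

    Assumption: vertices whose latent positions lie within circular distance
    k // 2 are "near" (k nearest neighbors for even k) and are joined with
    probability p; all other pairs are joined with probability q.
    """
    rng = random.Random(seed)
    positions: List[int] = list(range(n))
    rng.shuffle(positions)  # hidden latent positions on the circle
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        for j in range(i + 1, n):
            near = circular_distance(positions[i], positions[j], n) <= k // 2
            if rng.random() < (p if near else q):
                edges.add((i, j))
    return positions, edges


positions, edges = planted_dense_cycle(n=20, k=4, p=1.0, q=0.0, seed=1)
print(len(edges))  # 40: a pure ring lattice, every vertex has degree 4
```

With p = 1 and q = 0 the sampler returns a ring lattice in which every vertex has exactly k neighbors; for q < p < 1 the detection and recovery questions ask when the planted cycle, or the hidden `positions`, can be inferred from `edges` alone.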
Predicting the Future with Simple World Models
Saanum, Tankred, Dayan, Peter, Schulz, Eric
World models can represent potentially high-dimensional pixel observations in compact latent spaces, making it tractable to model the dynamics of the environment. However, the latent dynamics inferred by these models may still be highly complex. Abstracting the dynamics of the environment with simple models can have several benefits. If the latent dynamics are simple, the model may generalize better to novel transitions and discover useful latent representations of environment states. We propose a regularization scheme that simplifies the world model's latent dynamics. Our model, the Parsimonious Latent Space Model (PLSM), minimizes the mutual information between latent states and the dynamics that arise between them. This makes the dynamics softly state-invariant and the effects of the agent's actions more predictable. We combine the PLSM with three different model classes used for i) future latent state prediction, ii) video prediction, and iii) planning. We find that our regularization improves accuracy, generalization, and performance in downstream tasks.
PF-GNN: Differentiable particle filtering based approximation of universal graph representations
Dupty, Mohammed Haroon, Dong, Yanfei, Lee, Wee Sun
Message passing Graph Neural Networks (GNNs) are known to be limited in expressive power by the 1-WL color-refinement test for graph isomorphism. Other, more expressive models are either computationally expensive or need preprocessing to extract structural features from the graph. In this work, we propose to make GNNs universal by guiding the learning process with exact isomorphism solver techniques that operate on the paradigm of Individualization and Refinement (IR), a method to artificially introduce asymmetry and further refine the coloring when 1-WL stops. Isomorphism solvers generate a search tree of colorings whose leaves uniquely identify the graph. However, the tree grows exponentially large and needs hand-crafted pruning techniques that are undesirable from a learning perspective. We take a probabilistic view and approximate the search tree of colorings (i.e., embeddings) by sampling multiple paths from the root to the leaves of the search tree. To learn more discriminative representations, we guide the sampling process with particle filter updates, a principled approach for sequential state estimation. Our algorithm is end-to-end differentiable, can be applied with any GNN as a backbone, and learns richer graph representations with only a linear increase in runtime. Experimental evaluation shows that our approach consistently outperforms leading GNN models on both synthetic benchmarks for isomorphism detection and real-world datasets.
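PF-GNN's differentiable particle filtering over individualization paths is beyond a short example, but the underlying reweight-and-resample update can be sketched in plain Python. In this hypothetical sketch a particle stands for one sampled root-to-leaf path and `log_likelihood` is an assumed scoring function. It uses ordinary (non-differentiable) systematic resampling, so it is not the paper's end-to-end differentiable variant.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def normalize_log_weights(log_w: Sequence[float]) -> List[float]:
    """Convert log-weights to normalized weights, subtracting the max for stability."""
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def systematic_resample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    u0 = rng.random() / n
    idx, cumulative, chosen = 0, weights[0], []
    for i in range(n):
        u = u0 + i / n
        # advance to the first particle whose cumulative weight exceeds u
        while u >= cumulative and idx < n - 1:
            idx += 1
            cumulative += weights[idx]
        chosen.append(idx)
    return chosen


def pf_step(
    particles: List[str],
    log_w: List[float],
    log_likelihood: Callable[[str], float],
    rng: random.Random,
) -> Tuple[List[str], List[float]]:
    """One particle-filter update: reweight by likelihood, then resample."""
    log_w = [lw + log_likelihood(p) for p, lw in zip(particles, log_w)]
    weights = normalize_log_weights(log_w)
    idx = systematic_resample(weights, rng)
    new_particles = [particles[i] for i in idx]
    # after resampling, every surviving particle carries equal weight
    return new_particles, [0.0] * len(new_particles)
```

Resampling concentrates the fixed particle budget on high-scoring paths, which is how a filter can explore an exponentially large search tree at a cost linear in the number of particles.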
Does mapping elites illuminate search spaces? A large-scale user study of MAP-Elites applied to human-AI collaborative design
Walton, Sean P., Evans, Ben J., Rahat, Alma A. M., Stovold, James, Vincalek, Jakub
Two studies of a human-AI collaborative design tool were carried out to understand the influence that design recommendations have on the design process. The tool investigated is based on an evolutionary algorithm that attempts to design a virtual car that travels as far as possible in a fixed time. Participants were able to design their own cars, make recommendations to the algorithm, and view sets of recommendations from the algorithm. The algorithm-recommended sets consisted of previously tested designs; some sets were picked at random and others were picked using MAP-Elites. In the first study, 808 design sessions were recorded as part of a science outreach program, each with analytical data on how the participant used the tool. To provide context for this quantitative data, a smaller double-blind lab study was also carried out with 12 participants. In the lab study, the same quantitative data as in the large-scale study was collected alongside responses to interview questions. Although there is some evidence that MAP-Elites provides higher-quality individual recommendations, neither study provides convincing evidence that these recommendations have a more positive influence on the design process than a random selection of designs. In fact, providing a combination of MAP-Elites and randomly selected recommendations appears to benefit the process. Furthermore, simply viewing recommendations from MAP-Elites had a positive influence on engagement in the design task and on the quality of the final design produced. Our findings are significant both for researchers designing new mixed-initiative tools and for those who wish to evaluate existing tools. Most significantly, we found that the metrics researchers currently use to evaluate the success of human-AI collaborative algorithms do not measure the full influence these algorithms have on the design process.
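The core of MAP-Elites is an archive that keeps the best solution found in each cell of a discretized behavior space, which is what makes its recommendations diverse rather than merely high-scoring. A minimal sketch follows; the two behavior descriptors (for example, a car's normalized wheel size and body mass), the grid bounds, and the helper names are hypothetical and not taken from the study's tool.

```python
from typing import Dict, Sequence, Tuple

Archive = Dict[Tuple[int, ...], Tuple[str, float]]


def cell_of(
    descriptor: Sequence[float],
    bins: int,
    low: Sequence[float],
    high: Sequence[float],
) -> Tuple[int, ...]:
    """Map a behavior descriptor to its grid cell, clamping to the grid bounds."""
    cell = []
    for d, lo, hi in zip(descriptor, low, high):
        frac = (d - lo) / (hi - lo)
        cell.append(min(bins - 1, max(0, int(frac * bins))))
    return tuple(cell)


def try_insert(
    archive: Archive,
    solution: str,
    fitness: float,
    descriptor: Sequence[float],
    bins: int = 5,
    low: Sequence[float] = (0.0, 0.0),
    high: Sequence[float] = (1.0, 1.0),
) -> bool:
    """Keep `solution` if its cell is empty or it beats the cell's current elite."""
    cell = cell_of(descriptor, bins, low, high)
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent[1]:
        archive[cell] = (solution, fitness)
        return True
    return False


archive: Archive = {}
try_insert(archive, "car1", 10.0, (0.1, 0.1))  # new cell (0, 0): stored
try_insert(archive, "car2", 5.0, (0.12, 0.15))  # same cell, lower fitness: rejected
try_insert(archive, "car3", 12.0, (0.15, 0.05))  # same cell, higher fitness: replaces car1
try_insert(archive, "car4", 1.0, (0.9, 0.9))  # new cell (4, 4): stored despite low fitness
```

Recommendations can then be drawn from `archive.values()`, one elite per cell, whereas the random condition in the study simply picked from previously tested designs.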
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Hua, Wenyue, Guo, Jiang, Dong, Mingwen, Zhu, Henghui, Ng, Patrick, Wang, Zhiguo
Current approaches to knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we investigate the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark, ReCoE (Reasoning-based Counterfactual Editing dataset), which covers six common reasoning schemes found in the real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We found that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis of the chain-of-thought generation of edited models further uncovers key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.