Logic & Formal Reasoning
Asynchronous Agents with Perfect Recall: Model Reductions, Knowledge-Based Construction, and Model Checking for Coalitional Strategies
Gurov, Dilian, Jamroga, Filip, Jamroga, Wojciech, Kamiński, Mateusz, Kurpiewski, Damian, Penczek, Wojciech, Sidoruk, Teofil
Model checking of strategic abilities for agents with memory is a notoriously hard problem, and very few attempts have been made to tackle it. In this paper, we present two important steps towards this goal. First, we take the partial-order reduction scheme that was recently proved to preserve individual and coalitional abilities of memoryless agents, and show that it also works for agents with memory. Secondly, we take the Knowledge-Based Subset Construction, that was recently studied for synchronous concurrent games, and adapt it to preserve abilities of memoryful agents in asynchronous MAS. On the way, we also propose a new execution semantics for strategies in asynchronous MAS, that combines elements of Concurrent Game Structures and Interleaved Interpreted Systems in a natural and intuitive way.
A Compositional Atlas for Algebraic Circuits
Wang, Benjie, Mauá, Denis Deratani, Broeck, Guy Van den, Choi, YooJung
Circuits based on sum-product structure have become a ubiquitous representation to compactly encode knowledge, from Boolean functions to probability distributions. By imposing constraints on the structure of such circuits, certain inference queries become tractable, such as model counting and most probable configuration. Recent works have explored analyzing probabilistic and causal inference queries as compositions of basic operators to derive tractability conditions. In this paper, we take an algebraic perspective for compositional inference, and show that a large class of queries - including marginal MAP, probabilistic answer set programming inference, and causal backdoor adjustment - correspond to a combination of basic operators over semirings: aggregation, product, and elementwise mapping. Using this framework, we uncover simple and general sufficient conditions for tractable composition of these operators, in terms of circuit properties (e.g., marginal determinism, compatibility) and conditions on the elementwise mappings. Applying our analysis, we derive novel tractability conditions for many such compositional queries. Our results unify tractability conditions for existing problems on circuits, while providing a blueprint for analysing novel compositional inference queries.
AI Planning: A Primer and Survey (Preliminary Report)
Chen, Dillon Z., Verma, Pulkit, Srivastava, Siddharth, Katz, Michael, Thiébaux, Sylvie
Automated decision-making is a fundamental topic that spans multiple sub-disciplines in AI: reinforcement learning (RL), AI planning (AP), foundation models, and operations research, among others. Despite recent efforts to ``bridge the gaps'' between these communities, there remain many insights that have not yet transcended the boundaries. Our goal in this paper is to provide a brief and non-exhaustive primer on ideas well-known in AP, but less so in other sub-disciplines. We do so by introducing the classical AP problem and representation, and extensions that handle uncertainty and time through the Markov Decision Process formalism. Next, we survey state-of-the-art techniques and ideas for solving AP problems, focusing on their ability to exploit problem structure. Lastly, we cover subfields within AP for learning structure from unstructured inputs and learning to generalise to unseen scenarios and situations.
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency
Li, Zenan, Wu, Yifan, Li, Zhaoyu, Wei, Xinming, Zhang, Xian, Yang, Fan, Ma, Xiaoxing
Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Elaborately, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to 0.22-1.35x
Grounded Language Design for Lightweight Diagramming for Formal Methods
Prasad, Siddhartha, Greenman, Ben, Nelson, Tim, Krishnamurthi, Shriram
Model finding, as embodied by SAT solvers and similar tools, is used widely, both in embedding settings and as a tool in its own right. For instance, tools like Alloy target SAT to enable users to incrementally define, explore, verify, and diagnose sophisticated specifications for a large number of complex systems. These tools critically include a visualizer that lets users graphically explore these generated models. As we show, however, default visualizers, which know nothing about the domain, are unhelpful and even actively violate presentational and cognitive principles. At the other extreme, full-blown visualizations require significant effort as well as knowledge a specifier might not possess; they can also exhibit bad failure modes (including silent failure). Instead, we need a language to capture essential domain information for lightweight diagramming. We ground our language design in both the cognitive science literature on diagrams and on a large number of example custom visualizations. This identifies the key elements of lightweight diagrams. We distill these into a small set of orthogonal primitives. We extend an Alloy-like tool to support these primitives. We evaluate the effectiveness of the produced diagrams, finding them good for reasoning. We then compare this against many other drawing languages and tools to show that this work defines a new niche that is lightweight, effective, and driven by sound principles.
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
Sharan, S. P., Choi, Minkyu, Shah, Sahil, Goel, Harsh, Omama, Mohammad, Chinchali, Sandeep
Recent advancements in text-to-video models such as Sora, Gen-3, MovieGen, and CogVideoX are pushing the boundaries of synthetic video generation, with adoption seen in fields like robotics, autonomous driving, and entertainment. As these models become prevalent, various metrics and benchmarks have emerged to evaluate the quality of the generated videos. However, these metrics emphasize visual quality and smoothness, neglecting temporal fidelity and text-to-video alignment, which are crucial for safety-critical applications. To address this gap, we introduce NeuS-V, a novel synthetic video evaluation metric that rigorously assesses text-to-video alignment using neuro-symbolic formal verification techniques. Our approach first converts the prompt into a formally defined Temporal Logic (TL) specification and translates the generated video into an automaton representation. Then, it evaluates the text-to-video alignment by formally checking the video automaton against the TL specification. Furthermore, we present a dataset of temporally extended prompts to evaluate state-of-the-art video generation models against our benchmark. We find that NeuS-V demonstrates a higher correlation by over 5x with human evaluations when compared to existing metrics. Our evaluation further reveals that current video generation models perform poorly on these temporally complex prompts, highlighting the need for future work in improving text-to-video generation capabilities.
Graph Learning for Planning: The Story Thus Far and Open Challenges
Chen, Dillon Z., Hao, Mingyu, Thiébaux, Sylvie, Trevizan, Felipe
Graph learning is naturally well suited for use in planning due to its ability to exploit relational structures exhibited in planning domains and to take as input planning instances with arbitrary number of objects. In this paper, we study the usage of graph learning for planning thus far by studying the theoretical and empirical effects on learning and planning performance of (1) graph representations of planning tasks, (2) graph learning architectures, and (3) optimisation formulations for learning. Our studies accumulate in the GOOSE framework which learns domain knowledge from small planning tasks in order to scale up to much larger planning tasks. In this paper, we also highlight and propose the 5 open challenges in the general Learning for Planning field that we believe need to be addressed for advancing the state-of-the-art.
HT-HEDL: High-Throughput Hypothesis Evaluation in Description Logic
We present High-Throughput Hypothesis Evaluation in Description Logic (HT-HEDL). HT-HEDL is a high-performance hypothesis evaluation engine that accelerates hypothesis evaluation computations for inductive logic programming (ILP) learners using description logic (DL) for their knowledge representation; in particular, HT-HEDL targets accelerating computations for the $\mathcal{ALCQI}^{\mathcal{(D)}}$ DL language. HT-HEDL aggregates the computing power of multi-core CPUs with multi-GPUs to improve hypothesis computations at two levels: 1) the evaluation of a single hypothesis and 2) the evaluation of multiple hypotheses (i.e., batch of hypotheses). In the first level, HT-HEDL uses a single GPU or a vectorized multi-threaded CPU to evaluate a single hypothesis. In vectorized multi-threaded CPU evaluation, classical (scalar) CPU multi-threading is combined with CPU's extended vector instructions set to extract more CPU-based performance. The experimental results revealed that HT-HEDL increased performance using CPU-based evaluation (on a single hypothesis): from 20.4 folds using classical multi-threading to $\sim85$ folds using vectorized multi-threading. In the GPU-based evaluation, HT-HEDL achieved speedups of up to $\sim38$ folds for single hypothesis evaluation using a single GPU. To accelerate the evaluation of multiple hypotheses, HT-HEDL combines, in parallel, GPUs with multi-core CPUs to increase evaluation throughput (number of evaluated hypotheses per second). The experimental results revealed that HT-HEDL increased evaluation throughput by up to 29.3 folds using two GPUs and up to $\sim44$ folds using two GPUs combined with a CPU's vectorized multi-threaded evaluation.
SPILDL: A Scalable and Parallel Inductive Learner in Description Logic
We present SPILDL, a Scalable and Parallel Inductive Learner in Description Logic (DL). SPILDL is based on the DL-Learner (the state of the art in DL-based ILP learning). As a DL-based ILP learner, SPILDL targets the $\mathcal{ALCQI}^{\mathcal{(D)}}$ DL language, and can learn DL hypotheses expressed as disjunctions of conjunctions (using the $\sqcup$ operator). Moreover, SPILDL's hypothesis language also incorporates the use of string concrete roles (also known as string data properties in the Web Ontology Language, OWL); As a result, this incorporation of powerful DL constructs, enables SPILDL to learn powerful DL-based hypotheses for describing many real-world complex concepts. SPILDL employs a hybrid parallel approach which combines both shared-memory and distributed-memory approaches, to accelerates ILP learning (for both hypothesis search and evaluation). According to experimental results, SPILDL's parallel search improved performance by up to $\sim$27.3 folds (best case). For hypothesis evaluation, SPILDL improved evaluation performance through HT-HEDL (our multi-core CPU + multi-GPU hypothesis evaluation engine), by up to 38 folds (best case). By combining both parallel search and evaluation, SPILDL improved performance by up to $\sim$560 folds (best case). In terms of worst case scenario, SPILDL's parallel search doesn't provide consistent speedups on all datasets, and is highly dependent on the search space nature of the ILP dataset. For some datasets, increasing the number of parallel search threads result in reduced performance, similar or worse than baseline. Some ILP datasets benefit from parallel search, while others don't (or the performance gains are negligible). In terms of parallel evaluation, on small datasets, parallel evaluation provide similar or worse performance than baseline.
Finding Structure in Language Models
When we speak, write or listen, we continuously make predictions based on our knowledge of a language's grammar. Remarkably, children acquire this grammatical knowledge within just a few years, enabling them to understand and generalise to novel constructions that have never been uttered before. Language models are powerful tools that create representations of language by incrementally predicting the next word in a sentence, and they have had a tremendous societal impact in recent years. The central research question of this thesis is whether these models possess a deep understanding of grammatical structure similar to that of humans. This question lies at the intersection of natural language processing, linguistics, and interpretability. To address it, we will develop novel interpretability techniques that enhance our understanding of the complex nature of large-scale language models. We approach our research question from three directions. First, we explore the presence of abstract linguistic information through structural priming, a key paradigm in psycholinguistics for uncovering grammatical structure in human language processing. Next, we examine various linguistic phenomena, such as adjective order and negative polarity items, and connect a model's comprehension of these phenomena to the data distribution on which it was trained. Finally, we introduce a controlled testbed for studying hierarchical structure in language models using various synthetic languages of increasing complexity and examine the role of feature interactions in modelling this structure. Our findings offer a detailed account of the grammatical knowledge embedded in language model representations and provide several directions for investigating fundamental linguistic questions using computational methods.