rule application
Think Visually, Reason Textually: Vision-Language Synergy in ARC
Zhang, Beichen, Zang, Yuhang, Dong, Xiaoyi, Cao, Yuhang, Duan, Haodong, Lin, Dahua, Wang, Jiaqi
reasoning from minimal examples remains a core unsolved problem for frontier foundation models such as GPT-5 and Grok 4. These models still fail to infer structured transformation rules from a handful of examples, which is a key hallmark of human intelligence. The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) provides a rigorous testbed for this capability, demanding conceptual rule induction and transfer to novel tasks. Most existing methods treat ARC-AGI as a purely textual reasoning task, overlooking the fact that humans rely heavily on visual abstraction when solving such puzzles. However, our pilot experiments reveal a paradox: naively rendering ARC-AGI grids as images degrades performance due to imprecise rule execution. This leads to our central hypothesis that vision and language possess complementary strengths across distinct reasoning stages: vision supports global pattern abstraction and verification, whereas language specializes in symbolic rule formulation and precise execution. Building on this insight, we introduce two synergistic strategies: (1) Vision-Language Synergy Reasoning (VLSR), which decomposes ARC-AGI into modality-aligned subtasks; and (2) Modality-Switch Self-Correction (MSSC), which leverages vision to verify text-based reasoning for intrinsic error correction. Extensive experiments demonstrate that our approach yields up to a 4.33% improvement over text-only baselines across diverse flagship models and multiple ARC-AGI tasks. Our findings suggest that unifying visual abstraction with linguistic reasoning is a crucial step toward achieving generalizable, human-like intelligence in future foundation models. Source code is released at https://github.com/
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
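The text-then-vision loop described in the ARC-AGI abstract above can be sketched minimally: a candidate rule is executed symbolically in text space, and the result is rendered so a vision model could verify the global pattern. This is an illustrative sketch only; the rule, grid, and ASCII rendering (standing in for an actual image) are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of Modality-Switch Self-Correction (MSSC):
# a text-derived rule is executed symbolically, then the output grid
# is rendered so a vision model could check the global pattern.

def apply_rule(grid, rule):
    """Execute a symbolic rule (here just a Python function) on a grid."""
    return rule(grid)

def render(grid, palette=".#XO+*"):
    """Render an integer grid as ASCII art, a stand-in for the image
    that would actually be shown to the vision model."""
    return "\n".join("".join(palette[c] for c in row) for row in grid)

# Toy ARC-style rule: mirror the grid horizontally.
mirror = lambda grid: [row[::-1] for row in grid]

inp = [[1, 0, 0],
       [1, 1, 0]]
out = apply_rule(inp, mirror)
print(render(out))
# On a visual mismatch, the textual rule would be revised and re-executed.
```

The split mirrors the paper's hypothesis: precise execution stays in the symbolic/text channel, while verification uses the rendered, global view.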
Explainable Rule Application via Structured Prompting: A Neural-Symbolic Approach
Sadowski, Albert, Chudziak, Jarosław A.
Large Language Models (LLMs) excel in complex reasoning tasks but struggle with consistent rule application, exception handling, and explainability, particularly in domains like legal analysis that require both natural language understanding and precise logical inference. This paper introduces a structured prompting framework that decomposes reasoning into three verifiable steps: entity identification, property extraction, and symbolic rule application. By integrating neural and symbolic approaches, our method leverages LLMs' interpretive flexibility while ensuring logical consistency through formal verification. The framework externalizes task definitions, enabling domain experts to refine logical structures without altering the architecture. Evaluated on the LegalBench hearsay determination task, our approach significantly outperformed baselines, with OpenAI o-family models showing substantial improvements: o1 achieved an F1 score of 0.929 and o3-mini reached 0.867 using structured decomposition with complementary predicates, compared to their few-shot baselines of 0.714 and 0.74, respectively. This hybrid neural-symbolic system offers a promising pathway for transparent and consistent rule-based reasoning, suggesting potential for explainable AI applications in structured legal reasoning tasks.
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.69)
- Information Technology > Artificial Intelligence > Robots (0.68)
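The three-step decomposition in the structured-prompting abstract above can be sketched as follows: steps 1 and 2 (entity identification, property extraction) would be LLM calls in the paper's framework, while step 3 is a symbolic rule over the extracted predicates. The predicate names below are illustrative, not the paper's actual schema.

```python
# Step 3 of the decomposition: a symbolic rule applied to properties
# that the neural extraction steps would produce. Predicate names are
# hypothetical stand-ins for the paper's task definition.

def hearsay(props):
    # Hearsay: an out-of-court statement offered to prove
    # the truth of the matter asserted.
    return props["out_of_court_statement"] and props["offered_for_truth"]

# Stand-in for what an LLM extraction step might return for one scenario.
extracted = {"out_of_court_statement": True, "offered_for_truth": False}
print(hearsay(extracted))  # False: not hearsay
```

Because the final step is ordinary Boolean logic, its verdict can be formally checked and explained, which is the explainability benefit the paper targets.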
Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
Zhang, Chen, Lin, Jiuheng, Liu, Xiao, Zhang, Zekai, Feng, Yansong
While large language models (LLMs) have shown promise in translating extremely low-resource languages using resources like dictionaries, the effectiveness of grammar books remains debated. This paper investigates the role of grammar books in translating extremely low-resource languages by decomposing it into two key steps: grammar rule retrieval and application. To facilitate the study, we introduce ZhuangRules, a modularized dataset of grammar rules and their corresponding test sentences. Our analysis reveals that rule retrieval constitutes a primary bottleneck in grammar-based translation. Moreover, although LLMs can apply simple rules for translation when explicitly provided, they encounter difficulties in handling more complex rules. To address these challenges, we propose representing grammar rules as code functions, considering their similarities in structure and the benefit of code in facilitating LLM reasoning. Our experiments show that using code rules significantly boosts both rule retrieval and application, ultimately resulting in a 13.1% BLEU improvement in translation.
- North America > United States > Florida > Miami-Dade County > Miami (0.05)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (11 more...)
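The "grammar rules as code functions" idea from the ZhuangRules abstract above can be illustrated with a toy rule: a function that rewrites a token sequence to enforce a word-order constraint. The specific rule (adjective follows the noun it modifies) and its interface are assumptions for illustration, not taken from the dataset.

```python
# Hypothetical example of a grammar rule represented as a code function:
# the rule rewrites a token sequence so that each adjective follows
# the noun it modifies (ADJ NOUN -> NOUN ADJ).

def rule_adj_after_noun(tokens, pos):
    """Apply the word-order rule to parallel lists of tokens and POS tags."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and pos[i] == "ADJ" and pos[i + 1] == "NOUN":
            out += [tokens[i + 1], tokens[i]]  # swap the pair
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(rule_adj_after_noun(["red", "flower"], ["ADJ", "NOUN"]))
# ['flower', 'red']
```

Encoding the rule as executable code makes its structure explicit, which is the property the paper credits for better retrieval and application by LLMs.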
Evidence of conceptual mastery in the application of rules by Large Language Models
Nunes, José Luiz, Almeida, Guilherme FCF, Flanagan, Brian
In this paper we leverage psychological methods to investigate LLMs' conceptual mastery in applying rules. We introduce a novel procedure to match the diversity of thought generated by LLMs to that observed in a human sample. We then conducted two experiments comparing rule-based decision-making in humans and LLMs. Study 1 found that all investigated LLMs replicated human patterns regardless of whether they are prompted with scenarios created before or after their training cut-off. Moreover, we found unanticipated differences between the two sets of scenarios among humans. Surprisingly, even these differences were replicated in LLM responses. Study 2 turned to a contextual feature of human rule application: under forced time delay, human samples rely more heavily on a rule's text than on other considerations such as a rule's purpose. Our results revealed that some models (Gemini Pro and Claude 3) responded in a human-like manner to a prompt describing either forced delay or time pressure, while others (GPT-4o and Llama 3.2 90b) did not. We argue that the evidence gathered suggests that LLMs have mastery over the concept of rule, with implications for both legal decision making and philosophical inquiry.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Iowa (0.04)
- (6 more...)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
VEL: A Formally Verified Reasoner for OWL2 EL Profile
Ileri, Atalay Mert, Rangarajan, Nalen, Cannell, Jack, McGinty, Hande
Over the past two decades, the Web Ontology Language (OWL) has been instrumental in advancing the development of ontologies and knowledge graphs, providing a structured framework that enhances the semantic integration of data. However, the reliability of deductive reasoning within these systems remains challenging, as evidenced by inconsistencies among popular reasoners in recent competitions. This evidence underscores the limitations of current testing-based methodologies, particularly in high-stakes domains such as healthcare. To mitigate these issues, in this paper, we have developed VEL, a formally verified EL++ reasoner equipped with machine-checkable correctness proofs that ensure the validity of outputs across all possible inputs. This formalization, based on the algorithm of Baader et al., has been transformed into executable OCaml code using the Coq proof assistant's extraction capabilities. Our formalization revealed several errors in the original completeness proofs, which led to changes to the algorithm to ensure its completeness. Our work demonstrates the necessity of mechanization of reasoning algorithms to ensure their correctness at theoretical and implementation levels.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- North America > United States > Kansas (0.04)
- (2 more...)
Symbolic Working Memory Enhances Language Models for Complex Rule Application
Wang, Siyuan, Wei, Zhongyu, Choi, Yejin, Ren, Xiang
Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning involving a series of rule application steps, especially when rules are presented non-sequentially. Our preliminary analysis shows that while LLMs excel in single-step rule application, their performance drops significantly in multi-step scenarios due to the challenge of rule grounding, which requires anchoring the applicable rule and supporting facts at each step amidst multiple input rules, facts, and inferred facts. To address this, we propose augmenting LLMs with external working memory and introduce a neurosymbolic framework for rule application. The memory stores facts and rules in both natural language and symbolic forms, enabling precise tracking. Utilizing this memory, our framework iteratively performs symbolic rule grounding and LLM-based rule implementation. The former matches predicates and variables of symbolic rules and facts to ground applicable rules at each step. Experiments indicate our framework's effectiveness in rule application and its robustness across various steps and settings. Code and data are available at https://github.com/SiyuanWangw/RuleApplication.
- North America > United States > California (0.14)
- Europe > France (0.04)
- Workflow (0.87)
- Research Report (0.64)
- Law (0.93)
- Health & Medicine (0.87)
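The symbolic working memory described in the abstract above can be sketched as a set of predicate tuples plus a grounding step that matches rule premises against stored facts. In the paper the implementation step is LLM-based; here it is direct substitution, purely for illustration, and the grandparent rule is an assumed example.

```python
# Sketch of a symbolic working memory for multi-step rule application:
# facts are predicate tuples; grounding finds facts that jointly match
# a rule's premises, and the rule's conclusion is added back to memory.

facts = {("parent", "ann", "bob"), ("parent", "bob", "cid")}

def ground_and_apply(facts):
    """One grounding/application pass for the illustrative rule
    parent(X, Y) & parent(Y, Z) -> grandparent(X, Z)."""
    new = set()
    for (p1, x, y) in facts:
        for (p2, y2, z) in facts:
            if p1 == p2 == "parent" and y == y2:
                new.add(("grandparent", x, z))
    return facts | new

print(sorted(ground_and_apply(facts)))
```

Keeping facts in this symbolic form is what makes each grounding step checkable, which is the precise-tracking property the paper attributes to the memory.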
Why Not? Explaining Missing Entailments with Evee (Technical Report)
Alrabbaa, Christian, Borgwardt, Stefan, Friese, Tom, Koopmann, Patrick, Kotlov, Mikhail
We present a Protégé plugin for explaining missing entailments from OWL ontologies. The importance of explaining description logic reasoning to end-users has long been understood, and has been studied in many forms over the past decades. Indeed, explainability is one of the main advantages of logic-based knowledge representations over sub-symbolic methods. The first approaches to explain why a consequence follows from a Description Logic (DL) ontology were based on step-by-step proofs [8, 18], but soon research focused on justifications [7, 11, 20] that are easier to compute, but still very useful for pointing out the axioms responsible for an entailment. Consequently, the ontology editor Protégé supports black-box methods for computing justifications for arbitrary OWL DL ontologies [12]. More recently, a series of papers investigated different methods of computing good proofs for entailments in DLs ranging from EL to ALCOI [13, 1, 2, 3], and the Protégé plug-ins proof-explanation [13] and Evee [4], as well as the web-based application Evonne [19], were developed to make these algorithms available to ontology engineers. While reasoning can sometimes reveal unexpected entailments that need explaining, very often the problem is not what is entailed, but what is not entailed. In order to explain such missing entailments, and offer suggestions on how to repair them, both counterexamples and abduction have been suggested in the literature.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Pushing the Boundaries of Tractable Multiperspective Reasoning: A Deduction Calculus for Standpoint EL+
Álvarez, Lucía Gómez, Rudolph, Sebastian, Strass, Hannes
Standpoint EL is a multi-modal extension of the popular description logic EL that allows for the integrated representation of domain knowledge relative to diverse standpoints or perspectives. Advantageously, its satisfiability problem has recently been shown to be in PTime, making it a promising framework for large-scale knowledge integration. In this paper, we show that we can further push the expressivity of this formalism, arriving at an extended logic, called Standpoint EL+, which allows for axiom negation, role chain axioms, self-loops, and other features, while maintaining tractability. This is achieved by designing a satisfiability-checking deduction calculus, which at the same time addresses the need for practical algorithms. We demonstrate the feasibility of our calculus by presenting a prototypical Datalog implementation of its deduction rules.
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Serbia > Central Serbia > Belgrade (0.04)
- (2 more...)
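The Datalog-style deduction calculus mentioned in the Standpoint EL+ abstract above works by saturating a set of axioms under inference rules. A minimal sketch, using only one rule (transitivity of concept subsumption) rather than the paper's full calculus:

```python
# Illustrative forward-chaining saturation in the spirit of a Datalog
# deduction calculus for EL-family logics: derive A <= C from told
# axioms via transitivity. Real calculi have many more rules.

def saturate(axioms):
    """axioms: set of (A, B) pairs meaning A is subsumed by B;
    returns the closure under transitivity."""
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(derived):
            for (b2, c) in list(derived):
                if b == b2 and (a, c) not in derived:
                    derived.add((a, c))
                    changed = True
    return derived

told = {("Cat", "Mammal"), ("Mammal", "Animal")}
print(("Cat", "Animal") in saturate(told))  # True
```

Because each rule only adds facts and the fact space is finite, saturation terminates in polynomial time, which is the tractability property such calculi are designed to preserve.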
Genetic Algorithm for Program Synthesis
A deductive program synthesis tool takes a specification as input and derives a program that satisfies the specification. The drawback of this approach is that search spaces for such correct programs tend to be enormous, making it difficult to derive correct programs within a realistic timeout. To speed up such program derivation, we improve the search strategy of a deductive program synthesis tool, SuSLik, using evolutionary computation. Our cross-validation shows that the improvement brought by evolutionary computation generalises to unforeseen problems.
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (7 more...)
- Instructional Material > Course Syllabus & Notes (0.49)
- Research Report (0.40)
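The evolutionary improvement of a synthesis search strategy, as in the SuSLik abstract above, can be sketched with a minimal genetic algorithm that evolves a weight vector ordering candidate derivations. The fitness function below is a toy stand-in for "problems solved within the timeout"; nothing here is taken from the actual tool.

```python
# Minimal GA sketch: evolve weights that would order candidate
# derivations in a synthesis search. The fitness function is a toy
# surrogate, not the tool's real objective.
import random

random.seed(0)

def fitness(w):
    # Toy objective: prefer weights near a hidden optimum.
    target = [0.3, 0.7]
    return -sum((a - b) ** 2 for a, b in zip(w, target))

def evolve(pop_size=20, gens=30):
    pop = [[random.random(), random.random()] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the top half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)    # crossover: average parents
            child = [(x + y) / 2 + random.gauss(0, 0.05)  # small mutation
                     for x, y in zip(a, b)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best)
```

Since the best individual is always carried into the next generation, the evolved strategy can only improve on the initial random one, mirroring how the tool's tuned search strategy is cross-validated against unseen problems.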