logic form


GeoSketch: A Neural-Symbolic Approach to Geometric Multimodal Reasoning with Auxiliary Line Construction and Affine Transformation

Weng, Shichao, Wang, Zhiqiang, Zhou, Yuhua, Lu, Rui, Liu, Ting, Teng, Zhiyang, Liu, Xiaozhang, Liu, Hanmeng

arXiv.org Artificial Intelligence

Geometric Problem Solving (GPS) poses a unique challenge for Multimodal Large Language Models (MLLMs), requiring not only the joint interpretation of text and diagrams but also iterative visuospatial reasoning. Existing approaches process diagrams as static images and therefore lack the capacity for dynamic manipulation, a core aspect of human geometric reasoning involving auxiliary line construction and affine transformations. GeoSketch addresses this with three components: (1) a Perception module that abstracts diagrams into structured logic forms; (2) a Symbolic Reasoning module that applies geometric theorems to decide the next deductive step; and (3) a Sketch Action module that executes operations such as drawing auxiliary lines or applying transformations, updating the diagram in a closed loop. To train this agent, we develop a two-stage pipeline: supervised fine-tuning on 2,000 symbolically curated trajectories, followed by reinforcement learning with dense symbolic rewards to enhance robustness and strategic exploration. To evaluate this paradigm, we introduce the GeoSketch Benchmark, a high-quality set of 390 geometry problems requiring auxiliary construction or affine transformations. Experiments against strong MLLM baselines demonstrate that GeoSketch significantly improves stepwise reasoning accuracy and problem-solving success over static-perception methods.
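The perceive-reason-act loop the abstract describes can be sketched in a few lines. This is only an illustration of the control flow; the class and function names (`Diagram`, `perceive`, `reason`, `act`) and the toy rule inside `reason` are invented here, not the paper's actual API.

```python
# Minimal sketch of a closed perception/reasoning/action loop over a
# geometric diagram; all names and rules here are illustrative.
from dataclasses import dataclass, field

@dataclass
class Diagram:
    # Structured "logic form" of the figure: symbolic facts known so far.
    facts: set = field(default_factory=set)

def perceive(diagram):
    # Perception module: abstract the figure into symbolic facts.
    return diagram.facts

def reason(facts, goal):
    # Symbolic reasoning: decide the next step. Toy rule: if the goal is
    # not yet derivable, request an auxiliary construction.
    if goal in facts:
        return ("done", None)
    return ("draw_auxiliary", "midline")

def act(diagram, action):
    # Sketch action: mutate the diagram, closing the loop.
    kind, arg = action
    if kind == "draw_auxiliary":
        diagram.facts.add(f"has_{arg}")
    return diagram

def solve(diagram, goal, max_steps=5):
    for _ in range(max_steps):
        facts = perceive(diagram)
        action = reason(facts, goal)
        if action[0] == "done":
            return True
        diagram = act(diagram, action)
    return False
```

The point of the loop structure is that each sketch action changes what the perception module sees on the next step, which is exactly what static-image pipelines cannot do.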


Code-Style In-Context Learning for Knowledge-Based Question Answering

Nie, Zhijie, Zhang, Richong, Wang, Zhongyuan, Liu, Xudong

arXiv.org Artificial Intelligence

Current methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, leading to many limitations in practical applications. Recently, the emergence of In-Context Learning (ICL) capabilities in Large Language Models (LLMs) provides a simple and training-free semantic parsing paradigm for KBQA: given a small number of questions and their labeled logical forms as demo examples, LLMs can understand the task intent and generate the logic form for a new question. However, current powerful LLMs have little exposure to logic forms during pre-training, resulting in a high format error rate. To solve this problem, we propose a code-style in-context learning method for KBQA, which converts the generation of unfamiliar logic forms into the more familiar process of code generation for LLMs. Experimental results on three mainstream datasets show that our method dramatically mitigates the formatting-error problem in generating logic forms while achieving a new SOTA on WebQSP, GrailQA, and GraphQ under the few-shot setting. The code and supplementary files are released at https://github.com/
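The core idea, recasting logic-form generation as code generation, can be illustrated with a toy prompt builder. The helper names in the demo code (`find_entities`, `relate`, `count`) are invented for illustration and are not the paper's actual function library.

```python
# Sketch of code-style in-context learning: demo examples frame each
# KBQA query as ordinary function calls rather than an S-expression
# logic form, so the LLM completes the final stub in a familiar syntax.
# The helper names below are illustrative only.

def make_demo(question, code):
    return f"# Question: {question}\n{code}\n"

demos = [
    make_demo(
        "How many rivers flow through Egypt?",
        'rivers = relate(find_entities("Egypt"), "flows_through")\n'
        "answer = count(rivers)",
    ),
]

def build_prompt(new_question):
    # The LLM is asked to continue in the same code style as the demos.
    return "".join(demos) + f"# Question: {new_question}\n"

prompt = build_prompt("How many novels did Jane Austen write?")
```

Because the model has seen vast amounts of Python-like code during pre-training, completions in this format are far less likely to be syntactically malformed than raw logic forms.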


CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks

Nguyen, Hoang H., Liu, Ye, Zhang, Chenwei, Zhang, Tao, Yu, Philip S.

arXiv.org Artificial Intelligence

While Chain-of-Thought prompting is popular in reasoning tasks, its application to Large Language Models (LLMs) in Natural Language Understanding (NLU) is under-explored. Motivated by the multi-step reasoning of LLMs, we propose a Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach that breaks down NLU tasks into multiple reasoning steps where LLMs can learn to acquire and leverage essential concepts to solve tasks at different granularities. Moreover, we propose leveraging semantics-based Abstract Meaning Representation (AMR) structured knowledge as an intermediate step to capture the nuances and diverse structures of utterances, and to understand connections between their varying levels of granularity. Our proposed approach is shown to be effective in helping LLMs adapt to multi-grained NLU tasks under both zero-shot and few-shot multi-domain settings.


LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control

Zhao, Yilun, Qi, Zhenting, Nan, Linyong, Flores, Lorenzo Jaime Yu, Radev, Dragomir

arXiv.org Artificial Intelligence

Logical Table-to-Text (LT2T) generation is tasked with generating logically faithful sentences from tables. There currently exist two challenges in the field: 1) Faithfulness: how to generate sentences that are factually correct given the table content; 2) Diversity: how to generate multiple sentences that offer different perspectives on the table. This work proposes LoFT, which utilizes logic forms as fact verifiers and content planners to control LT2T generation. Experimental results on the LogicNLG dataset demonstrate that LoFT is the first model to address the unfaithfulness and lack-of-diversity issues simultaneously. Our code is publicly available at https://github.com/Yale-LILY/LoFT.
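The "logic form as fact verifier" idea can be shown with a toy example: pair a candidate sentence's claim with an executable logic form and keep the sentence only if the form evaluates to true on the table. The table, the `eq_count` form, and the claims below are all invented for illustration.

```python
# Toy fact verification in the spirit of LoFT: an executable logic form
# checks a generated sentence's claim against the source table.

table = [
    {"player": "A", "goals": 12},
    {"player": "B", "goals": 7},
    {"player": "C", "goals": 12},
]

def eq_count(rows, column, value, expected):
    # Logic form: count(filter(rows, column == value)) == expected
    return sum(1 for r in rows if r[column] == value) == expected

# Claim: "Two players scored 12 goals."
faithful = eq_count(table, "goals", 12, 2)
# Claim: "Three players scored 12 goals."
unfaithful = eq_count(table, "goals", 12, 3)
```

Using logic forms as content planners works the same way in reverse: sampling different forms (counts, comparisons, superlatives) yields sentences that describe the table from different perspectives.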


Complex Knowledge Base Question Answering: A Survey

Lan, Yunshi, He, Gaole, Jiang, Jinhao, Jiang, Jing, Zhao, Wayne Xin, Wen, Ji-Rong

arXiv.org Artificial Intelligence

Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years researchers have proposed a large number of novel methods that look into the challenges of answering complex questions. In this survey, we review recent advances in KBQA with a focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin by introducing the complex KBQA task and relevant background. Then, we describe benchmark datasets for the complex KBQA task and introduce the construction process of these datasets. Next, we present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. Specifically, we illustrate their procedures with flow designs and discuss their major differences and similarities. After that, we summarize the challenges that these two categories of methods encounter when answering complex questions, and explicate the advanced solutions and techniques used in existing work. Finally, we conclude and discuss several promising directions related to complex KBQA for future research.


Two minutes NLP -- Quick Intro to Knowledge Base Question Answering

#artificialintelligence

Knowledge base question answering (KBQA) aims to answer a natural language question over a knowledge base (KB) as its knowledge source. A knowledge base (KB) is a structured database that contains a collection of facts in the form (subject, relation, object), where each fact can have properties attached called qualifiers. For example, the sentence "Barack Obama got married to Michelle Obama on 3 October 1992 at Trinity United Church" can be represented by the tuple (Barack Obama, Spouse, Michelle Obama), with the qualifiers start time = 3 October 1992 and place of marriage = Trinity United Church. Popular knowledge bases are DBpedia and WikiData. Early works on KBQA focused on simple question answering, where there's only a single fact involved.
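The fact-plus-qualifiers structure described above maps directly onto a small data structure. The `Fact` class and its field names are illustrative, not taken from any particular KB's schema:

```python
# A KB fact as a (subject, relation, object) triple with optional
# qualifiers, using the marriage example from the text.
from dataclasses import dataclass, field

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    qualifiers: dict = field(default_factory=dict)

marriage = Fact(
    subject="Barack Obama",
    relation="Spouse",
    obj="Michelle Obama",
    qualifiers={
        "start time": "3 October 1992",
        "place of marriage": "Trinity United Church",
    },
)
```

Simple KBQA amounts to retrieving a single such fact; complex KBQA chains several of them and may reason over the qualifiers as well.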


A Hybrid Semantic Parsing Approach for Tabular Data Analysis

Gao, Yan, Lou, Jian-Guang, Zhang, Dongmei

arXiv.org Artificial Intelligence

This paper presents a novel approach to translating natural language questions to SQL queries for given tables, which meets three requirements of a real-world data analysis application: cross-domain operation, multilingualism, and quick-start deployment. Our proposed approach consists of: (1) a novel data abstraction step before the parser to make parsing table-agnostic; (2) a set of semantic rules for parsing abstracted data-analysis questions into intermediate logic forms as tree derivations, reducing the search space; (3) a neural-based model as a local scoring function on a span-based semantic parser for structured optimization and efficient inference. Experiments show that our approach outperforms state-of-the-art algorithms on WikiSQL, a large open benchmark dataset. We also achieve promising results on a small dataset of more complex queries in both English and Chinese, which demonstrates our language-expansion and quick-start ability.
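The data abstraction step, replacing table-specific column mentions in the question with generic placeholders so the parser itself stays table-agnostic, can be sketched minimally. The tokenization and placeholder scheme below are simplifications invented for illustration, not the paper's actual procedure:

```python
# Toy data abstraction: swap column-name mentions for generic
# placeholders, so downstream parsing rules never see table-specific
# vocabulary. Real systems also handle cell values and fuzzy matches.

def abstract_question(question, columns):
    tokens = question.split()
    mapping = {}
    for i, tok in enumerate(tokens):
        if tok.lower() in columns:
            placeholder = f"col{len(mapping)}"
            mapping[placeholder] = tok.lower()
            tokens[i] = placeholder
    return " ".join(tokens), mapping

q, m = abstract_question(
    "What is the average salary by department",
    {"salary", "department"},
)
# q == "What is the average col0 by col1"
```

After parsing the abstracted question into a logic form, the mapping is applied in reverse to produce the final table-specific SQL.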


An ASP Methodology for Understanding Narratives about Stereotypical Activities

Inclezan, Daniela, Zhang, Qinglin, Balduccini, Marcello, Israney, Ankush

arXiv.org Artificial Intelligence

We describe an application of Answer Set Programming to the understanding of narratives about stereotypical activities, demonstrated via question answering. Substantial work in this direction was done by Erik Mueller, who modeled stereotypical activities as scripts. His systems were able to understand a good number of narratives, but could not process texts describing exceptional scenarios. We propose addressing this problem by using a theory of intentions developed by Blount, Gelfond, and Balduccini. We present a methodology in which we substitute scripts by activities (i.e., hierarchical plans associated with goals) and employ the concept of an intentional agent to reason about both normal and exceptional scenarios. We exemplify the application of this methodology by answering questions about a number of restaurant stories. This paper is under consideration for acceptance in TPLP.


Machine Learning with World Knowledge: The Position and Survey

Song, Yangqiu, Roth, Dan

arXiv.org Machine Learning

Machine learning has become pervasive in multiple domains, impacting a wide variety of applications, such as knowledge discovery and data mining, natural language processing, information retrieval, computer vision, social and health informatics, ubiquitous computing, etc. Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. In particular, labeling large amounts of data for each domain-specific problem can be very time-consuming and costly. It has become a key obstacle in making learning protocols realistic in applications. In this paper, we discuss how to use existing general-purpose world knowledge to enhance machine learning processes, by enriching the features or reducing the labeling work. We start from a comparison of world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision. Finally, we discuss future directions for this research topic.


Mapping Syntactic to Semantic Generalizations of Linguistic Parse Trees

Galitsky, Boris (University of Girona) | de la Rosa, Josep Lluis (University of Girona) | Dobrocsi, Gabor (University of Miskolc)

AAAI Conferences

We define sentence generalization and generalization diagrams as a special case of least general generalization (LGG) applied to linguistic parse trees. A similarity measure between linguistic parse trees is developed as an LGG operation on the lists of sub-trees of these trees. The diagrams introduced are representations of the mapping between the syntactic and semantic generalization levels. Generalization diagrams are intended as a framework to compute semantic similarity between texts relying on linguistic parse tree data. Such a structured approach significantly improves text-relevance assessment in horizontal domains, where ontologies are not available.
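A crude approximation of the sub-tree generalization idea can be written in a few lines. Here "sub-trees" are flattened to contiguous runs of POS-tagged tokens and the LGG is approximated by set intersection; the real operation aligns tree nodes and generalizes labels, so this is an illustrative simplification only.

```python
# Rough sketch of LGG-style similarity: two parse "trees" (flattened to
# POS-tagged chunk lists) are generalized by intersecting their sets of
# contiguous sub-sequences, a crude stand-in for common sub-trees.

def subtrees(chunks):
    out = set()
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks) + 1):
            out.add(tuple(chunks[i:j]))
    return out

def generalize(tree_a, tree_b):
    # Least general generalization approximated as common sub-trees.
    return subtrees(tree_a) & subtrees(tree_b)

a = [("NN", "camera"), ("VBZ", "has"), ("JJ", "digital"), ("NN", "zoom")]
b = [("NN", "phone"), ("VBZ", "has"), ("JJ", "digital"), ("NN", "zoom")]
common = generalize(a, b)
```

The shared fragment "has digital zoom" survives the generalization while the differing head nouns do not, which is the intuition behind using common sub-trees as a similarity signal.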