barbara
Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
Ren, Lin, Xiao, Guohui, Qi, Guilin, Geng, Yishuai, Xue, Haohan
Answer Set Programming (ASP) is a powerful paradigm for non-monotonic reasoning. Recently, large language models (LLMs) have demonstrated promising capabilities in logical reasoning. Despite this potential, current evaluations of LLM capabilities in ASP are often limited. Existing works typically employ overly simplified ASP programs and do not support negation, disjunction, or multiple answer sets. Furthermore, there is a lack of benchmarks that introduce tasks specifically designed for ASP solving. To bridge this gap, we introduce ASPBench, a comprehensive ASP benchmark that includes three ASP-specific tasks: ASP entailment, answer set verification, and answer set computation. Our extensive evaluations on ASPBench reveal that while 14 state-of-the-art LLMs, including \emph{deepseek-r1}, \emph{o4-mini}, and \emph{gemini-2.5-flash-thinking}, perform relatively well on the first two simpler tasks, they struggle with answer set computation, which is the core of ASP solving. These findings offer insights into the current limitations of LLMs in ASP solving and highlight the need for new approaches that integrate symbolic reasoning capabilities more effectively. The code and dataset are available at https://github.com/HomuraT/ASPBench.
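To make the "answer set verification" task concrete, the following is a minimal sketch of the standard Gelfond-Lifschitz check for a ground normal program with default negation. It is not taken from ASPBench: the `Rule` type, the helper names, and the two-rule example program are assumptions chosen for illustration, and disjunction is not handled.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Set


class Rule(NamedTuple):
    """Ground rule `head :- pos, not neg.` (hypothetical representation)."""
    head: Optional[str]   # None encodes an integrity constraint
    pos: FrozenSet[str]   # atoms that must be true
    neg: FrozenSet[str]   # atoms under default negation


def rule(head: Optional[str], pos: Iterable[str] = (), neg: Iterable[str] = ()) -> Rule:
    return Rule(head, frozenset(pos), frozenset(neg))


def reduct(program: List[Rule], candidate: Set[str]) -> List[Rule]:
    """Drop rules whose negative body is contradicted by the candidate,
    then delete the remaining negative literals."""
    return [Rule(r.head, r.pos, frozenset())
            for r in program if not (r.neg & candidate)]


def least_model(positive_rules: List[Rule]) -> Optional[Set[str]]:
    """Least model of a negation-free program by fixpoint iteration.
    Returns None if an integrity constraint is violated."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for r in positive_rules:
            if r.head is not None and r.pos <= model and r.head not in model:
                model.add(r.head)
                changed = True
    for r in positive_rules:
        if r.head is None and r.pos <= model:
            return None
    return model


def is_answer_set(program: List[Rule], candidate: Iterable[str]) -> bool:
    """A candidate is an answer set iff it equals the least model of its reduct."""
    cand = set(candidate)
    return least_model(reduct(program, cand)) == cand


# Illustrative program:  p :- not q.   q :- not p.
PROGRAM = [rule("p", neg=["q"]), rule("q", neg=["p"])]
```

Under these assumptions, `is_answer_set(PROGRAM, {"p"})` and `is_answer_set(PROGRAM, {"q"})` both hold, while the empty set and `{"p", "q"}` are rejected. This shows in miniature why a benchmark needs to allow for multiple answer sets.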
OneEval: Benchmarking LLM Knowledge-intensive Reasoning over Diverse Knowledge Bases
Chen, Yongrui, Liu, Zhiqiang, Yu, Jing, Ren, Lin, Hu, Nan, Dai, Xinbang, Liu, Jiajun, Kang, Jiazhen, Zhang, Shenyu, Wang, Xinda, Ding, Keyan, Shen, Pengfei, Zhu, Haolei, Deng, Hongjie, Wang, Yisong, Wu, Tongtong, Bi, Sheng, Zhang, Wen, Wu, Tianxing, Ji, Qiu, Wang, Haofen, Chen, Wenliang, Chen, Huajun, Qi, Guilin
Large Language Models (LLMs) have demonstrated substantial progress on reasoning tasks involving unstructured text, yet their capabilities significantly deteriorate when reasoning requires integrating structured external knowledge such as knowledge graphs, code snippets, or formal logic. This limitation is partly due to the absence of benchmarks capable of systematically evaluating LLM performance across diverse structured knowledge modalities. To address this gap, we introduce \textbf{\textsc{OneEval}}, a comprehensive benchmark explicitly designed to assess the knowledge-intensive reasoning capabilities of LLMs across four structured knowledge modalities (unstructured text, knowledge graphs, code, and formal logic) and five critical domains (general knowledge, government, science, law, and programming). \textsc{OneEval} comprises 4,019 carefully curated instances and includes a challenging subset, \textsc{OneEval}\textsubscript{Hard}, consisting of 1,285 particularly difficult cases. Through extensive evaluation of 18 state-of-the-art open-source and proprietary LLMs, we establish three core findings: a) \emph{persistent limitations in structured reasoning}, with even the strongest model achieving only 32.2\% accuracy on \textsc{OneEval}\textsubscript{Hard}; b) \emph{performance consistently declines as the structural complexity of the knowledge base increases}, with accuracy dropping sharply from 53\% (textual reasoning) to 25\% (formal logic); and c) \emph{diminishing returns from extended reasoning chains}, highlighting the critical need for models to adapt reasoning depth appropriately to task complexity. We release the \textsc{OneEval} datasets, evaluation scripts, and baseline results publicly, accompanied by a leaderboard to facilitate ongoing advancements in structured knowledge reasoning.
Magic Markup: Maintaining Document-External Markup with an LLM
Misback, Edward, Tatlock, Zachary, Tanimoto, Steven L.
Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has an execution semantics, this means it is an opt-in feature that is hard to support properly. Today, language models offer a new method: metadata can be bound to entities in changing text using a model's human-like understanding of semantics, with no requirements on the document structure. This method expands the applications of document annotation, a fundamental operation in program writing, debugging, maintenance, and presentation. We contribute a system that employs an intelligent agent to re-tag modified programs, enabling rich annotations to automatically follow code as it evolves. We also contribute a formal problem definition, an empirical synthetic benchmark suite, and our benchmark generator. Our system achieves an accuracy of 90% on our benchmarks and can replace a document's tags in parallel at a rate of 5 seconds per tag. While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications.
Epistemic Syllogistic: First Steps
Although modal logic is regarded as a relatively young field, its origins can be traced back to Aristotle, who explored syllogistic reasoning patterns that incorporated modalities. However, in contrast to his highly successful assertoric syllogistic, Aristotle's examination of modal syllogisms is often viewed as error-prone and controversial, and has thus received less attention from logicians. In the literature, a large body of research on Aristotle's modal syllogistic centers on the possibility of a coherent interpretation of his proposed modal systems grounded in his philosophy of necessity and contingency (see, e.g., [11, 5, 12]). We adopt a more liberal view of Aristotle's modal syllogistic, considering it a source of inspiration for formalizing natural reasoning patterns involving modalities, rather than scrutinizing the coherence of the original systems. Our approach is encouraged by the fruitful research program of natural logic, which explores "light" logic systems that admit intuitive reasoning patterns in natural languages while balancing expressivity and computational complexity [1, 8]. In particular, various extensions of the assertoric syllogistic have been proposed and studied [8]. In this paper, we propose a systematic study of epistemic syllogistic to initiate our technical investigations of (extensions of) modal syllogistic. The choice of the epistemic modality is intentional, given its ubiquitous use in natural languages. Consider the following syllogism: "All C are B; some C is known to be A; therefore, some B is known to be A." Taking the intuitive de re reading, the second premise and the conclusion above can be formalized as $\exists x(Cx \land KAx)$ and $\exists x(Bx \land KAx)$ respectively in first-order modal logic (FOML).
Neuro-Symbolic AI for Compliance Checking of Electrical Control Panels
Barbara, Vito, Guarascio, Massimo, Leone, Nicola, Manco, Giuseppe, Quarta, Alessandro, Ricca, Francesco, Ritacco, Ettore
Artificial Intelligence plays a major role in supporting and improving smart manufacturing and Industry 4.0 by enabling the automation of different types of tasks that are manually performed by domain experts. In particular, assessing the compliance of a product with its corresponding schematic is a time-consuming and error-prone process. In this paper, we address this problem in a specific industrial scenario: we define a Neuro-Symbolic approach for automating the compliance verification of electrical control panels. Our approach is based on the combination of Deep Learning techniques with Answer Set Programming (ASP), and allows for identifying possible anomalies and errors in the final product even when a very limited amount of training data is available. The experiments conducted on a real test case provided by an Italian company operating in electrical control panel production demonstrate the effectiveness of the proposed approach.
Barbara de Souza on LinkedIn: #ai #chatgpt #therightchoice #polkproperties
We are often blind to the truth due to our preconceived notions and beliefs. "We only see what we want to see; we only hear what we want to hear. Our belief system is just like a mirror that only shows us what we believe." This can be seen in a variety of scenarios in everyday life. For example, if a person holds a negative view on a particular topic, they are likely to pay attention only to information that confirms that view while discounting any evidence that may contradict it.
Exploring the Landscape of Relational Syllogistic Logics
Kruckman, Alex, Moss, Lawrence S.
This paper explores relational syllogistic logics, a family of logical systems related to reasoning about relations in extensions of the classical syllogistic. These are all decidable logical systems. We prove completeness theorems and complexity results for a natural subfamily of relational syllogistic logics, parametrized by constructors for terms and for sentences.
Machine Learning with R – Barbara Fusinska
Barbara started by introducing machine learning (ML), gave a brief overview of R, and then discussed three examples: classifying handwritten digits, estimating values in a socio-economic dataset, and clustering crimes in Chicago. ML is statistics on steroids. ML uses data to find a pattern and then uses that pattern (the model) to predict results from similar data. Barbara used the example of classifying film genres as either action or romance based on the number of kicks and kisses. Barbara described supervised and unsupervised learning. Unsupervised learning is the "wild, wild west": we can't train the model, and it is much more difficult to understand how effective these models are. Back to supervised learning: it's important to choose good predicting factors. In the movie example, perhaps the title, actors, or script would have been better predictors than the number of kicks and kisses. Then you must choose the algorithm, tune it, and finally make it useful and visible and get it into production. It's a hard job, especially when data scientists and software developers seem to be different tribes.
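The kicks-and-kisses example can be sketched in a few lines. The talk used R and the summary does not say which algorithm was applied, so the k-nearest-neighbours rule, the choice of Python, and every number in the training data below are assumptions made purely for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical labelled films: (kicks, kisses) -> genre. Values are invented.
TRAINING = [
    ((3, 104), "romance"),
    ((2, 100), "romance"),
    ((1, 81), "romance"),
    ((101, 10), "action"),
    ((99, 5), "action"),
    ((98, 2), "action"),
]


def classify(kicks: int, kisses: int, k: int = 3) -> str:
    """Label a film by majority vote among its k nearest labelled neighbours
    (Euclidean distance in the kicks/kisses plane)."""
    nearest = sorted(TRAINING, key=lambda item: dist(item[0], (kicks, kisses)))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With this made-up data, a film with 18 kicks and 90 kisses lands among the romance examples, and one with 90 kicks and 8 kisses among the action examples. Swapping in better predictors, as the talk suggests, would mean changing the feature tuple rather than the voting rule.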
Conference Features
The 1991 National Conference on Artificial Intelligence (AAAI-91) saw some changes in the presentation format of previous years. This section outlines what these changes were and generally summarizes some of the invited talks and panels. This year's conference featured a new format designed to encourage greater interaction among attendees with similar interests. The conference was organized around specialized forums, each emphasizing a different set of coordinated topics. Each forum featured a schedule of presentations, meet-the-author sessions, and panels united by a set of related research issues.