 tweety


Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)

arXiv.org Artificial Intelligence

Answer Set Programming (ASP) is a powerful paradigm for non-monotonic reasoning. Recently, large language models (LLMs) have demonstrated promising capabilities in logical reasoning. Despite this potential, current evaluations of LLM capabilities in ASP are often limited. Existing works typically employ overly simplified ASP programs and do not support negation, disjunction, or multiple answer sets. Furthermore, there is a lack of benchmarks that introduce tasks specifically designed for ASP solving. To bridge this gap, we introduce ASPBench, a comprehensive ASP benchmark that includes three ASP-specific tasks: ASP entailment, answer set verification, and answer set computation. Our extensive evaluations on ASPBench reveal that while 14 state-of-the-art LLMs, including deepseek-r1, o4-mini, and gemini-2.5-flash-thinking, perform relatively well on the first two simpler tasks, they struggle with answer set computation, which is the core of ASP solving. These findings offer insights into the current limitations of LLMs in ASP solving and highlight the need for new approaches that integrate symbolic reasoning capabilities more effectively. The code and dataset are available at https://github.com/HomuraT/ASPBench.
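The three tasks named in this abstract can be illustrated with a minimal sketch. It is not taken from ASPBench; it assumes ground normal programs only (no disjunction, no variables), uses the standard Gelfond-Lifschitz reduct for verification, and enumerates candidates by brute force for computation. The helper names (`rule`, `is_answer_set`, `answer_sets`, `cautiously_entails`) are hypothetical and chosen for illustration.

```python
from itertools import combinations

# A ground normal rule is (head, positive_body, negative_body).
# Hypothetical representation for illustration; not the ASPBench format.
def rule(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

def reduct(program, candidate):
    """Gelfond-Lifschitz reduct: drop rules blocked by the candidate,
    then strip the negative literals from the rules that remain."""
    return [(h, pos) for h, pos, neg in program if not (neg & candidate)]

def least_model(positive_program):
    """Least model of a negation-free program by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_program:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_answer_set(program, candidate):
    """Answer set verification: the candidate must equal the least
    model of the reduct taken with respect to the candidate."""
    candidate = set(candidate)
    return least_model(reduct(program, candidate)) == candidate

def answer_sets(program):
    """Answer set computation by brute force over all atom subsets.
    Exponential in the number of atoms; for tiny examples only."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in pos | neg})
    return [set(c)
            for k in range(len(atoms) + 1)
            for c in combinations(atoms, k)
            if is_answer_set(program, c)]

def cautiously_entails(program, atom):
    """Cautious entailment: the atom holds in every answer set."""
    return all(atom in s for s in answer_sets(program))

# p :- not q.   q :- not p.   r :- p.   r :- q.
program = [
    rule("p", neg=["q"]),
    rule("q", neg=["p"]),
    rule("r", pos=["p"]),
    rule("r", pos=["q"]),
]

print(is_answer_set(program, {"p", "r"}))  # verification
print(answer_sets(program))                # computation
print(cautiously_entails(program, "r"))    # entailment
```

The example program has two answer sets, so it exercises the "multiple answer sets" case the abstract mentions. Here entailment is read as cautious (skeptical) entailment, which is an assumption; a benchmark may define the task differently.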


Thimm

AAAI Conferences

This paper presents Tweety, an open source project for scientific experimentation on logical aspects of artificial intelligence and particularly knowledge representation. Tweety provides a general framework for implementing and testing knowledge representation formalisms in a way that is familiar to researchers used to logical formalizations. This framework is widely applicable and can be used to implement a variety of knowledge representation formalisms, from classical logics, through logic programming and computational models for argumentation, to probabilistic modeling approaches. Tweety already contains over 15 different knowledge representation formalisms and allows easy computation of examples, comparison of algorithms and approaches, and benchmark tests. This paper gives an overview of the technical architecture of Tweety and a description of its different libraries. We also provide two case studies that show how Tweety can be used for empirical evaluation of different problems in artificial intelligence.


Signature Entrenchment and Conceptual Changes in Automated Theory Repair

arXiv.org Artificial Intelligence

Human beliefs change, but so do the concepts that underpin them. The recent Abduction, Belief Revision and Conceptual Change (ABC) repair system combines several methods from automated theory repair to expand, contract, or reform logical structures representing conceptual knowledge in artificial agents. In this paper we focus on conceptual change: repair not only of the membership of logical concepts, such as what animals can fly, but also concepts themselves, such that birds may be divided into flightless and flying birds, by changing the signature of the logical theory used to represent them. We offer a method for automatically evaluating entrenchment in the signature of a Datalog theory, in order to constrain automated theory repair to succinct and intuitive outcomes. Formally, signature entrenchment measures the inferential contributions of every logical language element used to express conceptual knowledge, i.e., predicates and the arguments, ranking possible repairs to retain valuable logical concepts and reject redundant or implausible alternatives. This quantitative measurement of signature entrenchment offers a guide to the plausibility of conceptual changes, which we aim to contrast with human judgements of concept entrenchment in future work.


Approximating Defeasible Logics to Improve Scalability

arXiv.org Artificial Intelligence

Defeasible rules are used in providing computable representations of legal documents and, more recently, have been suggested as a basis for explainable AI. Such applications draw attention to the scalability of implementations. The defeasible logic $DL(\partial_{||})$ was introduced as a more scalable alternative to $DL(\partial)$, which is better known. In this paper we consider the use of (implementations of) $DL(\partial_{||})$ as a computational aid to computing conclusions in $DL(\partial)$ and other defeasible logics, rather than as an alternative to $DL(\partial)$. We identify conditions under which $DL(\partial_{||})$ can be substituted for $DL(\partial)$ with no change to the conclusions drawn, and conditions under which $DL(\partial_{||})$ can be used to draw some valid conclusions, leaving the remainder to be drawn by $DL(\partial)$.


Defeasible Reasoning via Datalog$^\neg$

arXiv.org Artificial Intelligence

Hardware architectures can range from the use of GPUs and other hardware accelerators, through multi-core multi-threaded architectures, to shared-nothing cloud computing. Causes for failure to exploit these architectures include lack of expertise in the architectural features, lack of manpower more generally, and difficulty in updating legacy systems. Such problems can be ameliorated by mapping a logic to logic programming as an intermediate language. This is a common strategy in the implementation of defeasible logics. The first implementation of a defeasible logic, d-Prolog, was written as a Prolog meta-interpreter (Covington et al. 1997). Courteous Logic Programs (Grosof 1997) and its successors LPDA (Wan et al. 2009), Rulelog (Grosof and Kifer 2013), and Flora2 (Kifer et al. 2018) are implemented in XSB (Swift and Warren 2012).


Extending Automated Deduction for Commonsense Reasoning

arXiv.org Artificial Intelligence

Commonsense reasoning has long been considered one of the holy grails of artificial intelligence. Most of the recent progress in the field has been achieved by novel machine learning algorithms for natural language processing. However, without incorporating logical reasoning, these algorithms remain arguably shallow. With some notable exceptions, developers of practical automated logic-based reasoners have mostly avoided focusing on the problem. The paper argues that the methods and algorithms used by existing automated reasoners for classical first-order logic can be extended towards commonsense reasoning. Instead of devising new specialized logics, we propose a framework of extensions to the mainstream resolution-based search methods to make these capable of performing search tasks for practical commonsense reasoning with reasonable efficiency. The proposed extensions mostly rely on operating on ordinary proof trees and are devised to handle commonsense knowledge bases containing inconsistencies, default rules, taxonomies, topics, relevance, confidence and similarity measures. We claim that machine learning is best suited for the construction of commonsense knowledge bases, while the extended logic-based methods would be well suited for actually answering queries from these knowledge bases.


Non-monotonic Reasoning in Deductive Argumentation

arXiv.org Artificial Intelligence

Argumentation is a non-monotonic process. This reflects the fact that argumentation involves uncertain information, and so new information can cause a change in the conclusions drawn. However, the base logic does not need to be non-monotonic. Indeed, most proposals for structured argumentation use a monotonic base logic (e.g. some form of modus ponens with a rule-based language, or classical logic). Nonetheless, there are issues in capturing defeasible reasoning in argumentation, including the choice of base logic and the modelling of defeasible knowledge. And there are insights and tools to be harnessed for research in non-monotonic logics. We consider some of these issues in this paper.


Stream Reasoning on Expressive Logics

arXiv.org Artificial Intelligence

Data streams occur widely in real-world applications. Research on streaming data mainly focuses on data management, query evaluation, and optimization, whereas work on reasoning procedures for streaming knowledge bases at both the assertional and terminological levels is very limited. Typically, reasoning services on large knowledge bases are very expensive and need to be applied continuously as the data is received as a stream. Hence, new techniques for optimizing this continuous process are needed to develop efficient reasoners on streaming data. In this paper, we survey the related research on reasoning on expressive logics that can be applied to this setting, and point to further research directions in this area.


Preorder-Based Triangle: A Modified Version of Bilattice-Based Triangle for Belief Revision in Nonmonotonic Reasoning

arXiv.org Artificial Intelligence

The bilattice-based triangle provides an elegant algebraic structure for reasoning with vague and uncertain information. However, the truth and knowledge orderings of intervals in the bilattice-based triangle cannot handle repetitive belief revisions, which are an essential characteristic of nonmonotonic reasoning. Moreover, the ordering induced over the intervals by the bilattice-based triangle is sometimes not intuitive. In this work, we construct an alternative algebraic structure, namely the preorder-based triangle, and formulate proper logical connectives for it. We also demonstrate that the preorder-based triangle is a better alternative to the bilattice-based triangle for reasoning in application areas that involve nonmonotonic fuzzy reasoning with uncertain information.


How to Make AI Forget

#artificialintelligence

We all know what it's like to forget something. Even people capable of extraordinary memory feats – say, memorising the order of a deck of cards in less than 20 seconds – will still forget where they left their keys. People, it seems, are never in complete control of their memories. Forgetting is a tricky business, both for humans and for artificial intelligence (AI), and researchers are exploring the idea of robot memory in many different ways. This raises not only technical issues, but concerns related to privacy, law and ethics.