Goto

Collaborating Authors

 Uceda-Sosa, Rosario


Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are often asked to explain their outputs to enhance accuracy and transparency. However, evidence suggests that these explanations can misrepresent the models' true reasoning processes. One effective way to identify inaccuracies or omissions in these explanations is through consistency checking, which typically involves asking follow-up questions. This paper introduces, cross-examiner, a new method for generating follow-up questions based on a model's explanation of an initial question. Our method combines symbolic information extraction with language model-driven question generation, resulting in better follow-up questions than those produced by LLMs alone. Additionally, this approach is more flexible than other methods and can generate a wider variety of follow-up questions.


Few-shot Policy (de)composition in Conversational Question Answering

arXiv.org Artificial Intelligence

The task of policy compliance detection (PCD) is to determine if a scenario is in compliance with respect to a set of written policies. In a conversational setting, the results of PCD can indicate if clarifying questions must be asked to determine compliance status. Existing approaches usually claim to have reasoning capabilities that are latent or require a large amount of annotated data. In this work, we propose logical decomposition for policy compliance (LDPC): a neuro-symbolic framework to detect policy compliance using large language models (LLMs) in a few-shot setting. By selecting only a few exemplars alongside recently developed prompting techniques, we demonstrate that our approach soundly reasons about policy compliance conversations by extracting sub-questions to be answered, assigning truth values from contextual information, and explicitly producing a set of logic statements from the given policies. The formulation of explicit logic graphs can in turn help answer PCDrelated questions with increased transparency and explainability. We apply this approach to the popular PCD and conversational machine reading benchmark, ShARC, and show competitive performance with no task-specific finetuning. We also leverage the inherently interpretable architecture of LDPC to understand where errors occur, revealing ambiguities in the ShARC dataset and highlighting the challenges involved with reasoning for conversational question answering.


Reasoning about concepts with LLMs: Inconsistencies abound

arXiv.org Artificial Intelligence

The ability to summarize and organize knowledge into abstract concepts is key to learning and reasoning. Many industrial applications rely on the consistent and systematic use of concepts, especially when dealing with decision-critical knowledge. However, we demonstrate that, when methodically questioned, large language models (LLMs) often display and demonstrate significant inconsistencies in their knowledge. Computationally, the basic aspects of the conceptualization of a given domain can be represented as Is-A hierarchies in a knowledge graph (KG) or ontology, together with a few properties or axioms that enable straightforward reasoning. We show that even simple ontologies can be used to reveal conceptual inconsistencies across several LLMs. We also propose strategies that domain experts can use to evaluate and improve the coverage of key domain concepts in LLMs of various sizes. In particular, we have been able to significantly enhance the performance of LLMs of various sizes with openly available weights using simple knowledge-graph (KG) based prompting strategies.


Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning

arXiv.org Artificial Intelligence

Text-based reinforcement learning agents have predominantly been neural network-based models with embeddings-based representation, learning uninterpretable policies that often do not generalize well to unseen games. On the other hand, neuro-symbolic methods, specifically those that leverage an intermediate formal representation, are gaining significant attention in language understanding tasks. This is because of their advantages ranging from inherent interpretability, the lesser requirement of training data, and being generalizable in scenarios with unseen data. Therefore, in this paper, we propose a modular, NEuro-Symbolic Textual Agent (NESTA) that combines a generic semantic parser with a rule induction system to learn abstract interpretable rules as policies. Our experiments on established text-based game benchmarks show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.


A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases

arXiv.org Artificial Intelligence

Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a benchmark dataset for temporal reasoning, TempQA-WD, to encourage research in extending the present approaches to target a more challenging set of complex reasoning tasks. Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate sparql queries to facilitate the evaluation of semantic parsing based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata. The TempQA-WD dataset is available at https://github.com/IBM/tempqa-wd.


SYGMA: System for Generalizable Modular Question Answering OverKnowledge Bases

arXiv.org Artificial Intelligence

Knowledge Base Question Answering (KBQA) tasks that in-volve complex reasoning are emerging as an important re-search direction. However, most KBQA systems struggle withgeneralizability, particularly on two dimensions: (a) acrossmultiple reasoning types where both datasets and systems haveprimarily focused on multi-hop reasoning, and (b) across mul-tiple knowledge bases, where KBQA approaches are specif-ically tuned to a single knowledge base. In this paper, wepresent SYGMA, a modular approach facilitating general-izability across multiple knowledge bases and multiple rea-soning types. Specifically, SYGMA contains three high levelmodules: 1) KB-agnostic question understanding module thatis common across KBs 2) Rules to support additional reason-ing types and 3) KB-specific question mapping and answeringmodule to address the KB-specific aspects of the answer ex-traction. We demonstrate effectiveness of our system by evalu-ating on datasets belonging to two distinct knowledge bases,DBpedia and Wikidata. In addition, to demonstrate extensi-bility to additional reasoning types we evaluate on multi-hopreasoning datasets and a new Temporal KBQA benchmarkdataset on Wikidata, namedTempQA-WD1, introduced in thispaper. We show that our generalizable approach has bettercompetetive performance on multiple datasets on DBpediaand Wikidata that requires both multi-hop and temporal rea-soning


Reports of the AAAI 2014 Conference Workshops

AI Magazine

The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2012, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included fifteen workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap between Perception, Action and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities -- Beyond Open Data to Models, Standards and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and The World Wide Web and Public Health Intelligence. This article presents short summaries of those events.


Reports of the AAAI 2014 Conference Workshops

AI Magazine

The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2012, at the Québec City Convention Centre in Québec, Canada. Canada. The AAAI-14 workshop program included fifteen workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap between Perception, Action and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities — Beyond Open Data to Models, Standards and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and The World Wide Web and Public Health Intelligence. This article presents short summaries of those events.


An Ontology for Ecological Urbanism. SUM+Ecology

AAAI Conferences

As the complexity and abundance of city data increases, reusable semantic models that can integrate heterogeneous data sources in a lightweight manner enable a holistic view of the city data, which is key to Urban Ecology. Our multi-disciplinary team has built an ontology for Urban Ecology that not only captures a field-validated urban model and certification process, but also explores the reuse of semantic models and their interaction with domain experts.