Goto

Collaborating Authors

 Licato, John


Giving AI Personalities Leads to More Human-Like Reasoning

arXiv.org Artificial Intelligence

In computational cognitive modeling, capturing the full spectrum of human judgment and decision-making processes, beyond just optimal behaviors, is a significant challenge. This study explores whether Large Language Models (LLMs) can emulate the breadth of human reasoning by predicting both intuitive, fast System 1 and deliberate, slow System 2 processes. We investigate the potential of AI to mimic diverse reasoning behaviors across a human population, addressing what we call the "full reasoning spectrum problem". We designed reasoning tasks using a novel generalization of the Natural Language Inference (NLI) format to evaluate LLMs' ability to replicate human reasoning. The questions were crafted to elicit both System 1 and System 2 responses. Human responses were collected through crowd-sourcing and the entire distribution was modeled, rather than just the majority of the answers. We used personality-based prompting inspired by the Big Five personality model to elicit AI responses reflecting specific personality traits, capturing the diversity of human reasoning, and exploring how personality traits influence LLM outputs. Combined with genetic algorithms to optimize the weighting of these prompts, this method was tested alongside traditional machine learning models. The results show that LLMs can mimic human response distributions, with open-source models like Llama and Mistral outperforming proprietary GPT models. Personality-based prompting, especially when optimized with genetic algorithms, significantly enhanced LLMs' ability to predict human response distributions, suggesting that capturing suboptimal, naturalistic reasoning may require modeling techniques incorporating diverse reasoning styles and psychological profiles. The study concludes that personality-based prompting combined with genetic algorithms is promising for enhancing AI's 'human-ness' in reasoning.


No Strong Feelings One Way or Another: Re-operationalizing Neutrality in Natural Language Inference

arXiv.org Artificial Intelligence

Natural Language Inference (NLI) has been a cornerstone task in evaluating language models' inferential reasoning capabilities. However, the standard three-way classification scheme used in NLI has well-known shortcomings in evaluating models' ability to capture the nuances of natural human reasoning. In this paper, we argue that the operationalization of the neutral label in current NLI datasets has low validity, is interpreted inconsistently, and that at least one important sense of neutrality is often ignored. We uncover the detrimental impact of these shortcomings, which in some cases leads to annotation datasets that actually decrease performance on downstream tasks. We compare approaches of handling annotator disagreement and identify flaws in a recent NLI dataset that designs an annotator study based on a problematic operationalization. Our findings highlight the need for a more refined evaluation framework for NLI, and we hope to spark further discussion and action in the NLP community.


Resolving Open-textured Rules with Templated Interpretive Arguments

arXiv.org Artificial Intelligence

Open-textured terms in written rules are typically settled through interpretive argumentation. Ongoing work has attempted to catalogue the schemes used in such interpretive argumentation. But how can the use of these schemes affect the way in which people actually use and reason over the proper interpretations of open-textured terms? Using the interpretive argument-eliciting game Aporia as our framework, we carried out an empirical study to answer this question. Differing from previous work, we did not allow participants to argue for interpretations arbitrarily, but to only use arguments that fit with a given set of interpretive argument templates. Finally, we analyze the results captured by this new dataset, specifically focusing on practical implications for the development of interpretation-capable artificial reasoners.


How Should AI Interpret Rules? A Defense of Minimally Defeasible Interpretive Argumentation

arXiv.org Artificial Intelligence

Can artificially intelligent systems follow rules? The answer might seem an obvious `yes', in the sense that all (current) AI strictly acts in accordance with programming code constructed from highly formalized and well-defined rulesets. But here I refer to the kinds of rules expressed in human language that are the basis of laws, regulations, codes of conduct, ethical guidelines, and so on. The ability to follow such rules, and to reason about them, is not nearly as clear-cut as it seems on first analysis. Real-world rules are unavoidably rife with open-textured terms, which imbue rules with a possibly infinite set of possible interpretations. Narrowing down this set requires a complex reasoning process that is not yet within the scope of contemporary AI. This poses a serious problem for autonomous AI: If one cannot reason about open-textured terms, then one cannot reason about (or in accordance with) real-world rules. And if one cannot reason about real-world rules, then one cannot: follow human laws, comply with regulations, act in accordance with written agreements, or even obey mission-specific commands that are anything more than trivial. But before tackling these problems, we must first answer a more fundamental question: Given an open-textured rule, what is its correct interpretation? Or more precisely: How should our artificially intelligent systems determine which interpretation to consider correct? In this essay, I defend the following answer: Rule-following AI should act in accordance with the interpretation best supported by minimally defeasible interpretive arguments (MDIA).


Improving Paraphrase Detection with the Adversarial Paraphrasing Task

arXiv.org Artificial Intelligence

If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we teach them instead to identify paraphrases in a way that draws on the inferential properties of the sentences, and is not over-reliant on lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question, and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases. These sentence pairs can then be used both to test paraphrase identification models (which get barely random accuracy) and then improve their performance. To accelerate dataset generation, we explore automation of APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.


Probing the Natural Language Inference Task with Automated Reasoning Tools

AAAI Conferences

The Natural Language Inference (NLI) task is an important task in modern NLP, as it asks a broad question to which many other tasks may be reducible: Given a pair of sentences, does the first entail the second? Although the state-of-the-art on current benchmark datasets for NLI are deep learning-based, it is worthwhile to use other techniques to examine the logical structure of the NLI task. We do so by testing how well a logically-controlled natural language (Attempto Controlled English) can be used to parse NLI sentences, and how well automated theorem provers can reason over the resulting formulae. To improve performance, we develop a set of syntactic and semantic transformation rules. We report their performance, and discuss implications for NLI and logic-based NLP.


Towards Concise, Machine-Discovered Proofs of Gรถdel's Two Incompleteness Theorems

AAAI Conferences

There is an increasing interest in applying recent advances in AI to automated reasoning, as it may provide useful heuristics in reasoning over formalisms in first-order, second-order, or even meta-logics. To facilitate this research, we present MATR, a new framework for automated theorem proving explicitly designed to easily adapt to unusual logics or integrate new reasoning processes. MATR is formalism-agnostic, highly modular, and programmer-friendly. We explain the high-level design of MATR as well as some details of its implementation. To demonstrate MATR's utility, we then describe a formalized metalogic suitable for proofs of Gรถdel's Incompleteness Theorems, and report on our progress using our metalogic in MATR to semi-autonomously generate proofs of both the First and Second Incompleteness Theorems.


Towards Concise, Machine-discovered Proofs of G\"odel's Two Incompleteness Theorems

arXiv.org Artificial Intelligence

There is an increasing interest in applying recent advances in AI to automated reasoning, as it may provide useful heuristics in reasoning over formalisms in first-order, second-order, or even meta-logics. To facilitate this research, we present MATR, a new framework for automated theorem proving explicitly designed to easily adapt to unusual logics or integrate new reasoning processes. MATR is formalism-agnostic, highly modular, and programmer-friendly. We explain the high-level design of MATR as well as some details of its implementation. To demonstrate MATR's utility, we then describe a formalized metalogic suitable for proofs of G\"odel's Incompleteness Theorems, and report on our progress using our metalogic in MATR to semi-autonomously generate proofs of both the First and Second Incompleteness Theorems.


Probing the Natural Language Inference Task with Automated Reasoning Tools

arXiv.org Artificial Intelligence

The Natural Language Inference (NLI) task is an important task in modern NLP, as it asks a broad question to which many other tasks may be reducible: Given a pair of sentences, does the first entail the second? Although the state-of-the-art on current benchmark datasets for NLI are deep learning-based, it is worthwhile to use other techniques to examine the logical structure of the NLI task. We do so by testing how well a machine-oriented controlled natural language (Attempto Controlled English) can be used to parse NLI sentences, and how well automated theorem provers can reason over the resulting formulae. To improve performance, we develop a set of syntactic and semantic transformation rules. We report their performance, and discuss implications for NLI and logic-based NLP.


Developing a Dataset for Personal Attacks and Other Indicators of Biases

AAAI Conferences

Online argumentation, particularly on popular public discussion boards and social media, is rich with fallacy-and bias-prone arguments. An artificially intelligent tool capable of identifying potential biases in online argumentation might be able to address this growing problem, but what would it take to develop such a tool? In this paper, we attempt to answer this question by carefully defining both argumentative biases and fallacies, and laying out some guidelines for automated bias detection. After laying out a roadmap and identifying current bottlenecks, we take some initial steps towards relieving these limitations through the creation of a dataset of personal and ad hominem attacks in comments. Our progress in this direction is summarized.