Dickens, Luke
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models
Diallo, Aissatou, Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob
In this paper, we introduce Rule-Guided Feedback (RGF), a framework designed to enhance Large Language Model (LLM) performance through structured rule adherence and strategic information seeking. RGF implements a teacher-student paradigm where rule-following is enforced through established guidelines. Our framework employs a Teacher model that rigorously evaluates each student output against task-specific rules, providing constructive guidance rather than direct answers when detecting deviations. This iterative feedback loop serves two crucial purposes: maintaining solutions within defined constraints and encouraging proactive information seeking to resolve uncertainties. We evaluate RGF on diverse tasks including Checkmate-in-One puzzles, Sonnet Writing, Penguins-In-a-Table classification, GSM8k, and StrategyQA. Our findings suggest that structured feedback mechanisms can significantly enhance LLMs' performance across various domains.
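As an illustration, the iterative feedback loop described in the abstract might be sketched as follows. This is a toy stand-in, not the paper's implementation: `student` and `teacher` here are plain functions rather than LLM calls, and the rule and task are invented for the example.

```python
# Toy sketch of a Rule-Guided Feedback (RGF) loop. The teacher checks
# each student answer against task-specific rules and returns guidance
# (never the answer itself); the loop repeats until the rules pass.

def student(task, feedback=None):
    # Toy student: corrects its format once told which rule it violated.
    return "42" if feedback else "forty-two"

def teacher(answer, rules):
    # Constructive guidance for each violated rule, not a direct answer.
    return [hint for check, hint in rules if not check(answer)]

def rgf(task, rules, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        answer = student(task, feedback)
        feedback = teacher(answer, rules)
        if not feedback:  # all rules satisfied
            return answer
    return answer

rules = [(str.isdigit, "Answer must be given as digits.")]
print(rgf("What is 6 * 7?", rules))  # -> 42
```

The key design point the sketch captures is that the teacher's output is feedback about rule violations, so the student must repair its own answer rather than copy a correction.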
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation
Diallo, Aissatou, Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob
An interesting class of commonsense reasoning problems arises when people are faced with natural disasters. To investigate this topic, we present RESPONSE, a human-curated dataset containing 1789 annotated instances featuring 6037 sets of questions designed to assess LLMs' commonsense reasoning in disaster situations across different time frames. The dataset includes problem descriptions, missing resources, time-sensitive solutions, and their justifications, with a subset validated by environmental engineers. Through both automatic metrics and human evaluation, we compare LLM-generated recommendations against human responses. Our findings show that even state-of-the-art models like GPT-4 achieve only 37% human-evaluated correctness for immediate response actions, highlighting significant room for improvement in LLMs' ability for commonsense reasoning in crises.
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates
Diallo, Aissatou, Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob
While Large Language Models (LLMs) demonstrate impressive reasoning capabilities, understanding and validating their knowledge utilization remains challenging. Chain-of-thought (CoT) prompting partially addresses this by revealing intermediate reasoning steps, but the knowledge flow and application remain implicit. We introduce IAO (Input-Action-Output) prompting, a structured template-based method that explicitly models how LLMs access and apply their knowledge during complex reasoning tasks. IAO decomposes problems into sequential steps, each clearly identifying the input knowledge being used, the action being performed, and the resulting output. This structured decomposition enables us to trace knowledge flow, verify factual consistency, and identify potential knowledge gaps or misapplications. Through experiments across diverse reasoning tasks, we demonstrate that IAO not only improves zero-shot performance but also provides transparency in how LLMs leverage their stored knowledge. Human evaluation confirms that this structured approach enhances our ability to verify knowledge utilization and detect potential hallucinations or reasoning errors. Our findings provide insights into both knowledge representation within LLMs and methods for more reliable knowledge application.
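The Input-Action-Output decomposition described above lends itself to a simple structured trace. The sketch below renders such a trace; the field names follow the paper's Input/Action/Output terminology, but the formatting and the worked arithmetic example are invented for illustration.

```python
# Sketch of an IAO-style step trace: each step names the input
# knowledge used, the action performed, and the resulting output,
# making the knowledge flow explicit and checkable step by step.

def format_iao(steps):
    lines = []
    for i, (inp, action, out) in enumerate(steps, 1):
        lines.append(f"Step {i}:")
        lines.append(f"  Input:  {inp}")
        lines.append(f"  Action: {action}")
        lines.append(f"  Output: {out}")
    return "\n".join(lines)

steps = [
    ("Roger has 5 balls; he buys 2 cans of 3 balls", "multiply 2 * 3", "6 new balls"),
    ("5 existing balls, 6 new balls", "add 5 + 6", "11 balls"),
]
print(format_iao(steps))
```

Because each step's output becomes a later step's input, a reader (or a checker) can verify factual consistency locally, step by step, rather than auditing one opaque chain of thought.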
Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies
Baugh, Kexin Gu, Dickens, Luke, Russo, Alessandra
Although deep reinforcement learning has been shown to be effective, the model's black-box nature presents barriers to direct policy interpretation. To address this problem, we propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning. The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training. At the same time, its architecture is designed so that trained models can be directly translated into interpretable policies expressed as standard (bivalent or probabilistic) logic programs. Moreover, additional layers can be included to extract abstract features from complex observations, acting as a form of predicate invention. The logic representations are highly interpretable, and we show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model, facilitating manual intervention and adaptation of learned policies. We evaluate our approach on a range of tasks requiring learning deterministic or stochastic behaviours from various forms of observations. Our empirical results show that our neural DNF-MT model performs at the level of competing black-box methods whilst providing interpretable policies.
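To make the "interpretable policies as logic programs" claim concrete, here is a toy illustration of what an extracted bivalent DNF policy looks like and how it can be executed and hand-edited. The rules and predicates are invented for the example; the actual neural DNF-MT architecture and extraction procedure are described in the paper.

```python
# Toy execution of a bivalent DNF policy written as rules of the form
#   action :- lit_1, not lit_2, ...
# Each rule is a conjunction of (predicate, required_truth) pairs; the
# policy fires the first rule whose conjuncts all hold in the observation.

def dnf_policy(obs, rules):
    for action, conjuncts in rules:
        if all(obs.get(lit, False) == want for lit, want in conjuncts):
            return action
    return "noop"

# e.g.  move_left :- wall_right, not goal_ahead.   (invented rules)
rules = [
    ("move_left", [("wall_right", True), ("goal_ahead", False)]),
    ("move_forward", [("goal_ahead", True)]),
]
print(dnf_policy({"wall_right": True, "goal_ahead": False}, rules))  # -> move_left
```

Editing such a policy amounts to adding, removing, or altering rules in the list, which is the kind of manual intervention the abstract describes feeding back into the neural model.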
Measuring Error Alignment for Decision-Making Systems
Xu, Binxia, Bikakis, Antonis, Onah, Daniel, Vlachidis, Andreas, Dickens, Luke
Given that AI systems are set to play a pivotal role in future decision-making processes, their trustworthiness and reliability are of critical concern. Due to their scale and complexity, modern AI systems resist direct interpretation, and alternative ways are needed to establish trust in those systems and to determine how well they align with human values. We argue that good measures of the information-processing similarities between AI systems and humans may be able to achieve these same ends. While representational alignment (RA) approaches measure similarity between the internal states of two systems, the associated data can be expensive and difficult to collect for human systems. In contrast, behavioural alignment (BA) comparisons are cheaper and easier, but questions remain as to their sensitivity and reliability. We propose two new behavioural alignment metrics: misclassification agreement, which measures the similarity between the errors of two systems on the same instances, and class-level error similarity, which measures the similarity between the error distributions of two systems. We show that our metrics correlate well with RA metrics and provide complementary information to another BA metric within a range of domains, setting the scene for a new approach to value alignment.
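One plausible reading of the two metrics can be sketched in a few lines. These are assumptions for illustration, not the paper's exact definitions: misclassification agreement is taken here as the overlap between the sets of instances each system gets wrong, and class-level error similarity as the overlap of their per-class error distributions.

```python
# Illustrative sketches of the two behavioural alignment metrics.
from collections import Counter

def misclassification_agreement(y_true, a_pred, b_pred):
    """Jaccard overlap between the instance sets each system misclassifies."""
    err_a = {i for i, (t, p) in enumerate(zip(y_true, a_pred)) if p != t}
    err_b = {i for i, (t, p) in enumerate(zip(y_true, b_pred)) if p != t}
    union = err_a | err_b
    return len(err_a & err_b) / len(union) if union else 1.0

def class_level_error_similarity(y_true, a_pred, b_pred):
    """Histogram intersection of the two systems' per-class error distributions."""
    err_a = Counter(t for t, p in zip(y_true, a_pred) if p != t)
    err_b = Counter(t for t, p in zip(y_true, b_pred) if p != t)
    total_a, total_b = sum(err_a.values()), sum(err_b.values())
    classes = set(err_a) | set(err_b)
    return sum(min(err_a[c] / total_a if total_a else 0.0,
                   err_b[c] / total_b if total_b else 0.0) for c in classes)

y_true = ["cat", "dog", "cat", "bird", "dog"]
a_pred = ["cat", "cat", "dog", "bird", "dog"]  # errs on instances 1, 2
b_pred = ["cat", "cat", "cat", "dog", "dog"]   # errs on instances 1, 3
print(misclassification_agreement(y_true, a_pred, b_pred))  # 1 shared / 3 total
```

Note that neither metric needs access to internal states, only predictions, which is what makes BA comparisons cheap relative to RA.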
Unsupervised Learning of Graph from Recipes
Diallo, Aissatou, Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob
Cooking recipes are one of the most readily available kinds of procedural text. They consist of natural language instructions that can be challenging to interpret. In this paper, we propose a model to identify relevant information from recipes and generate a graph to represent the sequence of actions in the recipe. In contrast with other approaches, ours is unsupervised: we iteratively learn the graph structure and the parameters of a GNN encoding the texts (text-to-graph) one sequence at a time, providing supervision by decoding the graph back into text (graph-to-text) and comparing the generated text to the input. We evaluate the approach by comparing the identified entities with annotated datasets, comparing the difference between the input and output texts, and comparing our generated graphs with those generated by state-of-the-art methods.
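The self-supervision signal described in the abstract, encode text to a graph, decode the graph back to text, and compare against the input, can be illustrated with rule-based stand-ins. The real system learns a GNN encoder and a neural decoder; everything below (the encoder, decoder, and similarity score) is a toy assumption.

```python
# Toy text-to-graph / graph-to-text reconstruction cycle: the
# reconstruction score plays the role of the training signal.
import difflib

def text_to_graph(step):
    # Stand-in encoder: one (action, object) edge per instruction.
    action, _, obj = step.partition(" ")
    return [(action, obj)]

def graph_to_text(graph):
    # Stand-in decoder: regenerate the instruction from the edges.
    return " ".join(f"{a} {o}" for a, o in graph)

def reconstruction_score(step):
    decoded = graph_to_text(text_to_graph(step))
    return difflib.SequenceMatcher(None, step, decoded).ratio()

print(reconstruction_score("chop the onions"))  # -> 1.0 (perfect round trip)
```

In the learned setting, a low reconstruction score indicates the graph dropped information from the instruction, and the gradient of that loss updates both the graph structure and the encoder parameters.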
PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes
Diallo, Aissatou, Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob
Decoding the core of procedural texts, exemplified by cooking recipes, is crucial for intelligent reasoning and instruction automation. Procedural texts can be comprehensively defined as a sequential chain of steps to accomplish a task employing resources. From a cooking perspective, these instructions can be interpreted as a series of modifications to a food preparation, which initially comprises a set of ingredients. These changes involve transformations of comestible resources. For a model to effectively reason about cooking recipes, it must accurately discern and understand the inputs and outputs of intermediate steps within the recipe. Aiming to address this, we present a new corpus of cooking recipes enriched with descriptions of intermediate steps of the recipes that explicate the input and output for each step.
A Graphical Formalism for Commonsense Reasoning with Recipes
Bikakis, Antonis, Diallo, Aissatou, Dickens, Luke, Hunter, Anthony, Miller, Rob
To address this shortcoming, we propose a high-level representation of recipes as labelled bipartite graphs where the first subset of nodes denotes the comestibles involved in the recipe (ingredients, intermediate food items, final products, i.e. dishes, and by-products) and the second subset of nodes denotes actions on those comestibles. The edges reflect the (possibly partial) sequence of steps taken in the recipe going from the ingredients to final products.
Repurposing of Resources: from Everyday Problem Solving through to Crisis Management
Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob
The human ability to repurpose objects and processes is universal, but it is not a well-understood aspect of human intelligence. Repurposing arises in everyday situations such as finding substitutes for missing ingredients when cooking, or for unavailable tools when doing DIY. It also arises in critical, unprecedented situations needing crisis management. After natural disasters and during wartime, people must repurpose the materials and processes available to make shelter, distribute food, etc. Repurposing is equally important in professional life (e.g. clinicians often repurpose medicines off-license) and in addressing societal challenges (e.g. finding new roles for waste products). Despite the importance of repurposing, the topic has received little academic attention. By considering examples from a variety of domains such as everyday activities, drug repurposing and natural disasters, we identify some principal characteristics of the process and describe some technical challenges that would be involved in modelling and simulating it. We consider cases of both substitution, i.e. finding an alternative for a missing resource, and exploitation, i.e. identifying a new role for an existing resource. We argue that these ideas could be developed into a general formal theory of repurposing, and that this could then lead to the development of AI methods based on commonsense reasoning, argumentation, ontological reasoning, and various machine learning methods, to develop tools to support repurposing in practice.
On the Transferability of VAE Embeddings using Relational Knowledge with Semi-Supervision
Strömfelt, Harald, Dickens, Luke, Garcez, Artur d'Avila, Russo, Alessandra
When dealing with complex data, the effectiveness of a classifier/predictor is limited by its ability to extract useful information. As such, representations that clearly expose the semantics of the data should then be most amenable to downstream learning [1, 2]. This is often referred to as a challenge of acquiring a disentangled representation over the factors of the data [3]. A popular recent trend that has had significant success in this regard uses semi-supervised Variational AutoEncoders (VAE) [4, 5, 6, 7, 8, 9]. Whilst fully unsupervised VAE methods have been shown to require strong inductive bias [10], semi-supervised methods achieve disentanglement by training additional auxiliary tasks that are defined on the factors, alongside the standard VAE objective (see Appendix Eqn. 3).