IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates

Diallo, Aissatou, Bikakis, Antonis, Dickens, Luke, Hunter, Anthony, Miller, Rob

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) demonstrate impressive reasoning capabilities, understanding and validating their knowledge utilization remains challenging. Chain-of-thought (CoT) prompting partially addresses this by revealing intermediate reasoning steps, but the knowledge flow and application remain implicit. We introduce IAO (Input-Action-Output) prompting, a structured template-based method that explicitly models how LLMs access and apply their knowledge during complex reasoning tasks. IAO decomposes problems into sequential steps, each clearly identifying the input knowledge being used, the action being performed, and the resulting output. This structured decomposition enables us to trace knowledge flow, verify factual consistency, and identify potential knowledge gaps or misapplications. Through experiments across diverse reasoning tasks, we demonstrate that IAO not only improves zero-shot performance but also provides transparency in how LLMs leverage their stored knowledge. Human evaluation confirms that this structured approach enhances our ability to verify knowledge utilization and detect potential hallucinations or reasoning errors. Our findings provide insights into both knowledge representation within LLMs and methods for more reliable knowledge application.
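The abstract describes decomposing each reasoning step into an explicit Input, Action, and Output. A minimal sketch of how such a template might be rendered as a prompt is below; the field wording and the `build_iao_prompt` helper are illustrative assumptions, not the authors' exact template.

```python
# Hypothetical sketch of an IAO (Input-Action-Output) prompt template.
# The field names follow the abstract; the exact wording is illustrative.

IAO_STEP = (
    "Step {n}:\n"
    "  Input: {input_knowledge}\n"
    "  Action: {action}\n"
    "  Output: {output}\n"
)

def build_iao_prompt(question: str, num_steps: int = 3) -> str:
    """Ask the model to reason in explicit Input-Action-Output steps."""
    header = (
        f"Question: {question}\n\n"
        "Solve this step by step. For each step, state the Input knowledge "
        "you are using, the Action you perform on it, and the resulting Output.\n\n"
    )
    skeleton = "".join(
        IAO_STEP.format(n=i + 1, input_knowledge="...", action="...", output="...")
        for i in range(num_steps)
    )
    return header + skeleton + "\nFinal answer:"

prompt = build_iao_prompt("If a train travels 60 km in 45 minutes, what is its speed in km/h?")
```

Because each step labels its input knowledge separately from the action applied to it, a reader can check every step's factual premise independently of the inference drawn from it.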


Table as Thought: Exploring Structured Thoughts in LLM Reasoning

Sun, Zhenjie, Deng, Naihao, Yu, Haofei, You, Jiaxuan

arXiv.org Artificial Intelligence

Large language models' reasoning abilities benefit from methods that organize their thought processes, such as chain-of-thought prompting, which employs a sequential structure to guide the reasoning process step-by-step. However, existing approaches focus primarily on organizing the sequence of thoughts, leaving structure in individual thought steps underexplored. To address this gap, we propose Table as Thought, a framework inspired by cognitive neuroscience theories on human thought. Table as Thought organizes reasoning within a tabular schema, where rows represent sequential thought steps and columns capture critical constraints and contextual information to enhance reasoning. The reasoning process iteratively populates the table until self-verification ensures completeness and correctness. Our experiments show that Table as Thought excels in planning tasks and demonstrates a strong potential for enhancing LLM performance in mathematical reasoning compared to unstructured thought baselines. This work provides a novel exploration of refining thought representation within LLMs, paving the way for advancements in reasoning and AI cognition.
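The tabular schema above (rows as sequential thought steps, columns as constraints and context, iterative filling until self-verification) can be sketched as follows. The column names, the toy planning task, and the verification rule are assumptions for illustration, not the paper's actual schema.

```python
# Illustrative sketch of a Table-as-Thought-style reasoning table: rows are
# sequential thought steps, columns hold constraints and context. In the real
# system, `propose` and `verify` would be LLM calls; here they are stand-ins.
from __future__ import annotations
from typing import Callable, Optional

def populate_table(propose: Callable[[list[dict]], Optional[dict]],
                   verify: Callable[[list[dict]], bool],
                   max_steps: int = 10) -> list[dict]:
    """Iteratively add rows until the table self-verifies as complete."""
    table: list[dict] = []
    for _ in range(max_steps):
        row = propose(table)          # next thought step, given the table so far
        if row is None:
            break
        table.append(row)
        if verify(table):             # self-verification for completeness
            break
    return table

# Toy stand-in: plan a trip in three steps under a budget constraint.
steps = iter([
    {"step": 1, "constraint": "budget <= 100", "context": "start", "thought": "book train (40)"},
    {"step": 2, "constraint": "budget <= 100", "context": "spent 40", "thought": "book hotel (50)"},
    {"step": 3, "constraint": "budget <= 100", "context": "spent 90", "thought": "done"},
])
table = populate_table(propose=lambda t: next(steps, None),
                       verify=lambda t: t[-1]["thought"] == "done")
```

Keeping the constraint in its own column means every row can be checked against it directly, rather than hoping the constraint survives inside free-form chain-of-thought text.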


Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

Sun, Hao, Hüyük, Alihan, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective of query dependency in such optimization and elucidate two ensuing challenges that impede the successful and economical design of prompt optimization techniques. One primary issue is the absence of an effective method to evaluate prompts during inference when the golden answer is unavailable. Concurrently, learning via interactions with the LLMs to navigate the expansive natural language prompting space proves to be resource-intensive. To address this, we introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data. Such data exists as by-products when diverse prompts are benchmarked on open-accessible datasets. With Prompt-OIRL, the query-dependent prompt optimization objective is achieved by first learning an offline reward model. This model can evaluate any query-prompt pairs without accessing LLMs. Subsequently, a best-of-N strategy is deployed to recommend the optimal prompt. Our experimental evaluations across various LLM scales and arithmetic reasoning datasets underscore both the efficacy and economic viability of the proposed approach.
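The selection step described in the abstract (a learned offline reward model scores query-prompt pairs without calling the LLM, then best-of-N recommends the top-scoring prompt) can be sketched as below. The reward model here is a trivial hand-written stand-in, not Prompt-OIRL's learned model.

```python
# Sketch of query-dependent best-of-N prompt selection: score (query, prompt)
# pairs with an offline reward model, no LLM call needed, and pick the argmax.

def reward_model(query: str, prompt: str) -> float:
    """Stand-in for the learned offline reward model r(query, prompt)."""
    # Toy heuristic: prefer stepwise prompts, and calculation checks for arithmetic.
    score = 0.0
    if "step by step" in prompt:
        score += 1.0
    if "arithmetic" in query and "check" in prompt:
        score += 0.5
    return score

def best_of_n(query: str, candidate_prompts: list[str]) -> str:
    """Best-of-N: recommend the prompt the reward model scores highest."""
    return max(candidate_prompts, key=lambda p: reward_model(query, p))

candidates = [
    "Answer directly.",
    "Let's think step by step.",
    "Let's think step by step and check each calculation.",
]
chosen = best_of_n("arithmetic: 17 * 24 = ?", candidates)
```

Because scoring needs only the reward model, candidate prompts can be ranked per query at negligible cost, which is the economic advantage the abstract emphasizes.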


Afghans say they know little about US killing of al-Qaeda leader

Al Jazeera

Kabul, Afghanistan – The news of the killing of al-Qaeda chief Ayman al-Zawahiri slowly made its way through the Afghan capital. For many Afghans, it came as a complete surprise. The announcement by the United States of a "precision" drone attack that killed the elusive 71-year-old al-Qaeda leader came in Kabul in the early hours of Tuesday. As the day advanced, more details started to trickle in. However, in a sign of the growing fears over the freedom of speech under a Taliban government, many city residents seemed hesitant to talk about the killing of al-Zawahiri, who had a reward of $25m on his head for the 9/11 attacks.


Stop Doing Fragile Research

@machinelearnbot

Here's a story familiar to anyone who does research in data science or machine learning: (1) you have a brand-new idea for a method to analyze data; (2) you want to test it, so you start by generating a random dataset or finding a dataset online; (3) you apply your method to the data, but the results are unimpressive; (4) you introduce a hyperparameter into your method so that you can fine-tune it, until (5) the method eventually starts producing gorgeous results. However, in taking these steps, you have developed a fragile method, one that is sensitive to the choice of dataset and customized hyperparameters. Rather than developing a more general and robust method, you have made the problem easier.
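One simple guard against the fragility described above is to evaluate the method over several independently generated datasets and a grid of hyperparameter values, reporting the spread rather than the single best run. The data generator and "method" below are toy stand-ins for illustration.

```python
# Robustness check sketch: a method whose score swings wildly across random
# datasets for a given hyperparameter setting is fragile at that setting.
import random
import statistics

def make_dataset(seed: int, n: int = 200) -> list:
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def method_score(data: list, alpha: float) -> float:
    """Toy 'method': score depends on the data and a tunable hyperparameter."""
    return statistics.fmean(data) * alpha + 1.0 / (1.0 + alpha)

def robustness_report(alphas: list, seeds: range) -> dict:
    """Std-dev of scores across datasets, for each hyperparameter setting."""
    return {
        a: statistics.pstdev(method_score(make_dataset(s), a) for s in seeds)
        for a in alphas
    }

report = robustness_report(alphas=[0.1, 1.0, 10.0], seeds=range(10))
# Large spread at a setting = the gorgeous result may be a dataset artifact.
```

Reporting variance across datasets makes the fragility visible before publication, instead of leaving it for readers to discover.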

It's Our Fault That AI Thinks White Names Are More 'Pleasant' Than Black Names

#artificialintelligence

We all know that hiring managers can be racist when choosing the "right" (read: white) candidate for a job, but what about computers? If you have a name like Ebony or Jamal at the top of your resume, new research suggests that some algorithms will make less "pleasant" associations with your moniker than if you are named Emily or Matt. Machines are increasingly being used to make all kinds of important decisions, like who gets what kind of health insurance coverage, or which convicts are most likely to reoffend. The idea is that computers, unlike people, can't be racist, but we're increasingly learning that they do in fact take after their makers. As just one example, ProPublica reported in May that an algorithm used by officials in Florida systematically rated white offenders as lower risk of committing a future crime than Black offenders.
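The "pleasant association" finding mentioned above rests on measuring how close word vectors for names sit to vectors for pleasant versus unpleasant attribute words. A minimal sketch of that kind of association test is below; the 3-dimensional vectors are entirely made up for illustration, where real studies use trained word embeddings.

```python
# Toy association test in the spirit of embedding-bias studies: compare a
# name vector's mean cosine similarity to "pleasant" vs. "unpleasant" words.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(name_vec, pleasant, unpleasant):
    """Mean similarity to pleasant words minus mean similarity to unpleasant."""
    def mean_sim(vecs):
        return sum(cosine(name_vec, v) for v in vecs) / len(vecs)
    return mean_sim(pleasant) - mean_sim(unpleasant)

# Fabricated toy embedding space.
pleasant = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.1)]
unpleasant = [(0.0, 0.9, 0.3), (0.1, 0.8, 0.4)]
bias_gap = association((0.7, 0.3, 0.1), pleasant, unpleasant)
# A positive gap means the name vector leans toward the "pleasant" set.
```

Running this test over many names drawn from different demographic groups is how researchers quantify the associative bias the article describes.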