liam
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
Wu, Yuheng, Xie, Jianwen, Zhang, Denghui, Xu, Zhaozhuo
Theory-of-Mind (ToM) tasks pose a unique challenge for large language models (LLMs), which often lack the capability for dynamic logical reasoning. In this work, we propose DEL-ToM, a framework that improves verifiable ToM reasoning through inference-time scaling rather than architectural changes. Our approach decomposes ToM tasks into a sequence of belief updates grounded in Dynamic Epistemic Logic (DEL), enabling structured and verifiable dynamic logical reasoning. We use data generated automatically via a DEL simulator to train a verifier, which we call the Process Belief Model (PBM), to score each belief update step. During inference, the PBM evaluates candidate belief traces from the LLM and selects the highest-scoring one. This allows LLMs to allocate extra inference-time compute to yield more transparent reasoning. Experiments across model scales and benchmarks show that DEL-ToM consistently improves performance, demonstrating that verifiable belief supervision significantly enhances LLMs' ToM capabilities without retraining. Code is available at https://github.com/joel-wu/DEL-ToM.
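The selection procedure the abstract describes is essentially best-of-N search guided by a step-level verifier. Below is a minimal Python sketch of that loop under stated assumptions: `generate_belief_traces` and `score_step` are hypothetical stand-ins for the LLM sampler and the Process Belief Model (PBM), not the released DEL-ToM code.

```python
# Minimal sketch of DEL-ToM-style inference-time selection.
# `generate_belief_traces` and `score_step` are hypothetical placeholders
# for the LLM sampler and the PBM verifier described in the abstract.
from typing import List

def generate_belief_traces(story: str, n: int) -> List[List[str]]:
    """Hypothetical: sample n candidate belief-update traces from an LLM."""
    return [[f"belief update {i} of trace {k}" for i in range(3)] for k in range(n)]

def score_step(step: str) -> float:
    """Hypothetical PBM: return a [0, 1] plausibility score for one belief update."""
    return 0.5  # placeholder score

def select_best_trace(story: str, n_candidates: int = 8) -> List[str]:
    traces = generate_belief_traces(story, n_candidates)
    # Aggregate per-step PBM scores for each trace, then keep the highest-scoring one.
    scored = [(sum(score_step(s) for s in t) / len(t), t) for t in traces]
    return max(scored, key=lambda x: x[0])[1]

if __name__ == "__main__":
    print(select_best_trace("Sally puts the marble in the basket; Anne moves it."))
```

Spending more inference-time compute here simply means raising `n_candidates`: more sampled traces give the verifier more chances to find a fully consistent belief chain.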
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
a03caec56cd82478bf197475b48c05f9-Supplemental.pdf
Algorithm 1 shows the pseudocode of LIAM.

Algorithm 1: Pseudocode of LIAM
for m = 1, ..., M episodes do
    Reset the hidden state of the encoder LSTM
    Sample E fixed policies from Π
    Create E parallel environments and gather initial observations

The fixed policies in predator-prey consist of a combination of heuristic and pretrained policies. First we created four heuristic policies: (i) going after the prey, (ii) going after one of the predators, (iii) going after the agent (predator or prey) that is closest, and (iv) going after the predator that is closest. CARL has access to the trajectories of all the other agents in the environment during training, but only to the local trajectory during execution. To extract such representations, we use self-supervised learning based on recent advances in contrastive learning [Oord et al., 2018; He et al., 2020; Chen et al., 2020a,b]. During training, given a batch of episode trajectories, we construct the positive and negative pairs following Equation (4) and minimise the InfoNCE loss [Oord et al., 2018]. Following the work of Chung et al. [2015], we can write the lower bound on the log-evidence of the … We train LIAM-VAE similarly to LIAM.
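The excerpt mentions building positive and negative pairs from trajectory embeddings and minimising the InfoNCE loss. The PyTorch sketch below shows a standard InfoNCE objective, assuming the positive key for each query sits on the diagonal of the similarity matrix; the batch size, embedding dimension, and temperature are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(queries: torch.Tensor, keys: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over a batch: keys[i] is the positive for queries[i]; all other keys are negatives."""
    q = F.normalize(queries, dim=-1)
    k = F.normalize(keys, dim=-1)
    logits = q @ k.t() / temperature      # (B, B) similarity matrix
    labels = torch.arange(q.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Example: 32 trajectory embeddings of dimension 128 (random stand-ins)
loss = info_nce_loss(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```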
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Exploring Next Token Prediction in Theory of Mind (ToM) Tasks: Comparative Experiments with GPT-2 and LLaMA-2 AI Models
Yadav, Pavan, Khandalkar, Nikhil, Shinde, Krishna, Ramegowda, Lokesh B., Das, Rajarshi
Language models have made significant progress in generating coherent text and predicting next tokens based on input prompts. This study compares the next-token prediction performance of two well-known models: OpenAI's GPT-2 and Meta's Llama-2-7b-chat-hf on Theory of Mind (ToM) tasks. To evaluate their capabilities, we built a dataset from 10 short stories sourced from the Explore ToM Dataset. We enhanced these stories by programmatically inserting additional sentences (infills) using GPT-4, creating variations that introduce different levels of contextual complexity. This setup enables analysis of how increasing context affects model performance. We tested both models under four temperature settings (0.01, 0.5, 1.0, 2.0) and evaluated their ability to predict the next token across three reasoning levels. Zero-order reasoning involves tracking the state, either current (ground truth) or past (memory). First-order reasoning concerns understanding another's mental state (e.g., "Does Anne know the apple is salted?"). Second-order reasoning adds recursion (e.g., "Does Anne think that Charles knows the apple is salted?"). Our results show that adding more infill sentences slightly reduces prediction accuracy, as added context increases complexity and ambiguity. Llama-2 consistently outperforms GPT-2 in prediction accuracy, especially at lower temperatures, demonstrating greater confidence in selecting the most probable token. As reasoning complexity rises, model responses diverge more. Notably, GPT-2 and Llama-2 display greater variability in predictions during first- and second-order reasoning tasks. These findings illustrate how model architecture, temperature, and contextual complexity influence next-token prediction, contributing to a better understanding of the strengths and limitations of current language models.
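The evaluation described here amounts to reading off the model's next-token distribution under temperature scaling. A minimal sketch with Hugging Face transformers follows; the `gpt2` checkpoint and the prompt are illustrative placeholders, and the paper's Llama-2 runs would substitute `meta-llama/Llama-2-7b-chat-hf`.

```python
# Sketch: temperature-scaled next-token prediction with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def next_token_distribution(model_name: str, prompt: str, temperature: float):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]            # logits for the next token
    probs = torch.softmax(logits / temperature, dim=-1)   # temperature scaling
    top = torch.topk(probs, k=5)
    return [(tokenizer.decode(int(i)), p.item()) for i, p in zip(top.indices, top.values)]

# Illustrative prompt; the paper uses stories from the Explore ToM Dataset.
for t in (0.01, 0.5, 1.0, 2.0):
    print(t, next_token_distribution("gpt2", "Anne thinks the apple is", t))
```

Lower temperatures sharpen the distribution toward the single most probable token, which is why the comparison at 0.01 most directly reflects each model's top prediction.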
- North America > United States (0.14)
- Asia > India > West Bengal > Kolkata (0.05)
- Asia > India > Karnataka > Bengaluru (0.04)
LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps
Wang, Yihao, Memmesheimer, Raphael, Behnke, Sven
The availability of large language models and open-vocabulary object perception methods enables more flexibility for domestic service robots. The large variability of domestic tasks can be addressed without implementing each task individually by providing the robot with a task description along with appropriate environment information. In this work, we propose LIAM -- an end-to-end model that predicts action transcripts based on language, image, action, and map inputs. Language and image inputs are encoded with a CLIP backbone, for which we designed two pre-training tasks to fine-tune its weights and pre-align the latent spaces. We evaluate our method on the ALFRED dataset, a simulator-generated benchmark for domestic tasks. Our results demonstrate the importance of pre-aligning embedding spaces from different modalities and the efficacy of incorporating semantic maps.
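As a rough illustration of the CLIP-based encoding the abstract mentions, the sketch below embeds an instruction and an observation image with a public CLIP checkpoint and compares the two embeddings. The model id, dummy image, and cosine-similarity check are assumptions for illustration only, not LIAM's actual pre-training tasks or fusion head.

```python
# Sketch: encoding language and image inputs with a CLIP backbone (assumed checkpoint).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

instruction = "put the mug in the microwave"
image = Image.new("RGB", (224, 224))  # stand-in for an ALFRED observation frame

inputs = processor(text=[instruction], images=[image], return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    image_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])

# A pre-aligned latent space should place matching text and image embeddings close together.
print(torch.cosine_similarity(text_emb, image_emb).item())
```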
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
'We got bored waiting for Oasis to re-form': AIsis, the band fronted by an AI Liam Gallagher
Before you do anything else with your day, you need to listen to this. A new "lost" Oasis album has been released, from the period between their third album, 1997's Be Here Now, and their fourth, 2000's Standing on the Shoulder of Giants. It was created by AI – or at least, it's an AI Liam Gallagher doing its best "hellooooos" and "sun-shiiiines" over a real band. But the eight songs, including Out of My Mind, Coming of Age and Forever, are practically indistinguishable from the real thing, with some seriously catchy melodies that give every post-What's the Story album – not to mention the whole of Liam and Noel's solo catalogues – a run for their money. How do you get a computer to sing like Liam?
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Data validation in Python: a look into Pandera and Great Expectations
Liam studied an MSci in Physics at University College London, which included modules on Statistical Data Analysis, High Performance Computing, Practical Physics and Computing. This led to his dissertation exploring the use of machine learning techniques for analysing LHC particle collision data. Before joining endjin, Liam had a keen interest in data science and engineering, and did a number of related internships. However, since joining endjin he has developed a much broader set of interests, including DevOps and more general software engineering. He is currently exploring those interests and finding his feet in the tech space.
How The New AI ChatGPT Can Help Leaders Make Time To Be Human
In the last week, I've been experimenting with the hot new version of ChatGPT to discover how it might conserve a leader's scarcest resource: time. When OpenAI launched the AI chatbot at the end of November, it instantly attracted millions of users, with breathless predictions of its potential to disrupt business models and jobs. It certainly promises to deliver on a prediction I made in 2019 in my book The Human Edge, which explores the skills needed in a world of artificial intelligence and digitization. I forecasted: "…AI can offer us more free time by automating the stupid stuff we currently have to do, thereby reducing our cognitive burden." The prize is clear for leaders.