
Collaborating Authors

carol


The Obliging Apocalypse of "Pluribus"

The New Yorker

The new sci-fi drama from Vince Gilligan posits an end-of-humanity scenario that everyone other than its protagonist can agree on. Even before her fellow-humans' contamination, Carol didn't seem to have much use for them. On the night that the world as we know it is destroyed, a novelist named Carol Sturka (played by Rhea Seehorn) sees cars and planes veer off course, an emergency room full of convulsing bodies, and her city, Albuquerque, on fire. The President dies under mysterious circumstances, and, more devastatingly for Carol, so does her live-in partner, Helen (Miriam Shor). Then, in less than an hour, the apocalypse cleans up after itself.


CARoL: Context-aware Adaptation for Robot Learning

Hu, Zechen, Xu, Tong, Xiao, Xuesu, Wang, Xuan

arXiv.org Artificial Intelligence

Using Reinforcement Learning (RL) to learn new robotic tasks from scratch is often inefficient. Leveraging prior knowledge has the potential to significantly enhance learning efficiency, but it raises two critical challenges: how to determine the relevancy of existing knowledge and how to adaptively integrate it into learning a new task. CARoL incorporates context awareness by analyzing state transitions in system dynamics to identify similarities between the new task and prior knowledge. It then uses these identified similarities to prioritize and adapt specific knowledge pieces for the new task. Additionally, CARoL is broadly applicable across policy-based, value-based, and actor-critic RL algorithms. In simulation, on the CarRacing and LunarLander environments, CARoL demonstrates faster convergence and higher rewards when learning policies for new tasks. In real-world experiments, we show that CARoL enables a ground vehicle to quickly and efficiently adapt policies learned in simulation to smoothly traverse real-world off-road terrain. In recent years, Reinforcement Learning (RL) approaches have achieved remarkable success in advanced robotic control and complex task learning in dynamic environments, enabling applications across various domains, such as autonomous navigation [36, 38], manipulation [28, 42], and human-robot interaction [23]. Despite these advancements, RL methods are typically computationally demanding, as they rely on repeated trial-and-error exploration to discover high-reward outcomes. Knowledge fusion [2] and adaptation [24, 35] offer promising ways to address this inefficiency: they leverage knowledge (such as a learned control policy or an approximated value function) from previously explored tasks to accelerate training on new tasks, eliminating the need to train from scratch for every scenario. For example, consider a vehicle navigating highly complex off-road terrain as shown in Figure 1. If the vehicle has undergone extensive training in several existing environments, it should ideally be capable of adapting to a new type of terrain by utilizing previously learned knowledge.
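The context-aware weighting the abstract describes can be illustrated with a toy sketch: compare the transition dynamics of the new task against each prior task and weight prior knowledge accordingly. This is not the authors' code; `transition_signature`, the terrain names, and the softmax weighting are illustrative assumptions.

```python
import math

def transition_signature(transitions):
    """Average state change per step -- a crude stand-in for system dynamics.
    transitions: list of (state, action, next_state) with scalar states."""
    deltas = [ns - s for s, _, ns in transitions]
    return sum(deltas) / len(deltas)

def context_weights(new_transitions, prior_tasks):
    """Weight each prior task by how closely its dynamics match the new task
    (softmax over negative signature distance)."""
    target = transition_signature(new_transitions)
    scores = [-abs(transition_signature(t) - target) for t in prior_tasks.values()]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(prior_tasks, exps)}

# Two prior "terrains": one with dynamics similar to the new task, one not.
prior = {
    "gravel": [(0.0, 1, 0.9), (1.0, 1, 1.9)],   # state delta ~ 0.9 per step
    "ice":    [(0.0, 1, 0.1), (1.0, 1, 1.1)],   # state delta ~ 0.1 per step
}
new_task = [(0.0, 1, 0.8), (1.0, 1, 1.8)]        # delta ~ 0.8, closer to gravel
w = context_weights(new_task, prior)
assert w["gravel"] > w["ice"]
```

The higher-weighted prior task would then contribute more when adapting its policy or value function to the new terrain.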


Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents

Ye, Ye

arXiv.org Artificial Intelligence

Large Language Models (LLMs) falter in multi-step interactions -- often hallucinating, repeating actions, or misinterpreting user corrections -- due to reliance on linear, unstructured context. This fragility stems from the lack of persistent memory to track evolving goals and task dependencies, undermining trust in autonomous agents. We introduce the Task Memory Engine (TME), a modular memory controller that transforms existing LLMs into robust, revision-aware agents without fine-tuning. TME implements a spatial memory framework that replaces flat context with graph-based structures to support consistent, multi-turn reasoning. Departing from linear concatenation and ReAct-style prompting, TME builds a dynamic task graph -- either a tree or directed acyclic graph (DAG) -- to map user inputs to subtasks, align them with prior context, and enable dependency-tracked revisions. Its Task Representation and Intent Management (TRIM) component models task semantics and user intent to ensure accurate interpretation. Across four multi-turn scenarios -- trip planning, cooking, meeting scheduling, and shopping-cart editing -- TME eliminates 100% of hallucinations and misinterpretations in three tasks, and reduces hallucinations by 66.7% and misinterpretations by 83.3% across 27 user turns, outperforming ReAct. TME's modular design supports plug-and-play deployment and domain-specific customization, adaptable to both personal assistants and enterprise automation. We release TME's codebase, benchmarks, and components as open-source resources, enabling researchers to develop reliable LLM agents. TME's scalable architecture addresses a critical gap in agent performance across complex, interactive settings.
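A minimal sketch of a dependency-tracked task graph in the spirit of what the abstract describes (hypothetical class and task names, not the released TME codebase): revising one subtask flags everything downstream for re-checking.

```python
class TaskNode:
    def __init__(self, name):
        self.name = name
        self.parents = []      # subtasks this node depends on
        self.children = []     # subtasks that depend on this node
        self.revised = False

class TaskGraph:
    """Toy dependency-tracked task DAG: revising a node flags its descendants."""
    def __init__(self):
        self.nodes = {}

    def add(self, name, depends_on=()):
        node = TaskNode(name)
        for dep in depends_on:
            parent = self.nodes[dep]
            node.parents.append(parent)
            parent.children.append(node)
        self.nodes[name] = node
        return node

    def revise(self, name):
        """Mark a subtask as revised and propagate downstream."""
        stack = [self.nodes[name]]
        while stack:
            node = stack.pop()
            node.revised = True
            stack.extend(node.children)

g = TaskGraph()
g.add("plan trip")
g.add("book flight", depends_on=["plan trip"])
g.add("book hotel", depends_on=["plan trip"])
g.add("pack", depends_on=["book flight", "book hotel"])
g.revise("book flight")             # user changes the flight mid-conversation
assert g.nodes["pack"].revised      # downstream subtask needs re-checking
assert not g.nodes["book hotel"].revised
```

Compared with flat context concatenation, the graph makes it explicit which earlier conclusions a user correction invalidates and which remain untouched.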


The arrogant ex-soldier who turned into a triple killer

BBC News

Former soldier Kyle Clifford raped and murdered Louise Hunt, and killed her sister Hannah and mother Carol in attacks described by police as "barbaric". What happened and what has emerged since? Days before the attacks, Louise had ended an 18-month relationship with Clifford. She told Clifford, whom she had met through a dating app, it was "sucking the life out of me". Louise's family did not like the way Clifford treated her, finding him disrespectful, arrogant, rude and "odd". He had hidden relationships with other women from Louise, and went on a dating site moments after receiving the message ending theirs.


Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction

Sandwar, Vivaan, Jain, Bhav, Thangaraj, Rishan, Garg, Ishaan, Lam, Michael, Zhu, Kevin

arXiv.org Artificial Intelligence

Debate is a commonly used form of human communication suited to problem-solving because of its efficiency. Debate fundamentally allows multiple viewpoints to be brought up in problem-solving, and for complex problems, each viewpoint opens a new path for problem-solving. In this work, we apply this concept to LLM decision-making by proposing town hall-style debate prompting (THDP), a prompting method that splits a language model into multiple personas that debate one another to reach a conclusion. Our experimental pipeline varies both the number of personas and the personality types of each persona to find the optimum town hall size and personality for benchmark performance as measured by ZebraLogic bench, a reasoning-intensive benchmark characterized by both multiple-choice and fill-in-the-blank questions. Our experimental results demonstrate that a town hall size of 5 personas with LLM-determined personality types performs optimally on ZebraLogic, achieving a 13% improvement over one-shot CoT baselines in per-cell accuracy with GPT-4o, a 9% increase in puzzle accuracy with Claude 3.5 Sonnet, and an improvement in hard puzzle accuracy from 10% to 15%.
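The debate-then-vote loop can be sketched as follows, with a deterministic stub standing in for the real LLM call. The function names, the round structure, and the majority-vote aggregation are illustrative assumptions, not the paper's pipeline.

```python
def town_hall_debate(question, personas, answer_fn, rounds=2):
    """Each persona answers in turn while seeing the running transcript;
    the final round's answers are aggregated by majority vote.
    answer_fn(persona, question, transcript) stands in for an LLM call."""
    transcript = []
    for _ in range(rounds):
        for persona in personas:
            transcript.append((persona, answer_fn(persona, question, transcript)))
    votes = {}
    for _, answer in transcript[-len(personas):]:   # final round only
        votes[answer] = votes.get(answer, 0) + 1
    return max(votes, key=votes.get)

# Stub "LLM": the skeptic defers to the majority view once it has seen one.
def stub_llm(persona, question, transcript):
    if persona == "skeptic" and transcript:
        answers = [a for _, a in transcript]
        return max(set(answers), key=answers.count)
    return "blue" if persona != "skeptic" else "red"

result = town_hall_debate("What color is the house?",
                          ["optimist", "analyst", "skeptic"], stub_llm)
assert result == "blue"
```

Because later speakers see earlier answers, a dissenting persona can be argued around before the vote, which is the behavior the debate framing is meant to elicit.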


Class-Aware Contrastive Optimization for Imbalanced Text Classification

Khvatskii, Grigorii, Moniz, Nuno, Doan, Khoa, Chawla, Nitesh V

arXiv.org Artificial Intelligence

The unique characteristics of text data make classification tasks a complex problem. Advances in unsupervised and semi-supervised learning and autoencoder architectures have addressed several challenges. However, they still struggle with imbalanced text classification tasks, a common scenario in real-world applications, and tend to produce embeddings with unfavorable properties, such as class overlap. In this paper, we show that leveraging class-aware contrastive optimization combined with denoising autoencoders can successfully tackle imbalanced text classification tasks, achieving better performance than the current state-of-the-art. Concretely, our proposal combines reconstruction loss with contrastive class separation in the embedding space, allowing a better balance between the truthfulness of the generated embeddings and the model's ability to separate different classes. Compared with an extensive set of traditional and state-of-the-art competing methods, our proposal demonstrates a notable increase in performance across a wide variety of text datasets.
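A toy version of a combined objective of this kind, assuming mean-squared error for both terms (the function name, margin, and weighting are illustrative, not the paper's exact loss): same-class embedding pairs are pulled together, different-class pairs pushed at least a margin apart, on top of the reconstruction term.

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def class_aware_loss(embeddings, reconstructions, inputs, labels,
                     margin=1.0, alpha=0.5):
    """Reconstruction loss plus a contrastive term: attract same-class
    embeddings, repel different-class embeddings up to `margin`."""
    recon = sum(mse(r, x) for r, x in zip(reconstructions, inputs)) / len(inputs)
    contrast, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = mse(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                contrast += d                      # attract
            else:
                contrast += max(0.0, margin - d)   # repel up to the margin
            pairs += 1
    return recon + alpha * contrast / pairs

inputs = [[1.0], [1.1], [5.0]]
labels = [0, 0, 1]
separated   = [[0.0], [0.1], [3.0]]   # embeddings with class separation
overlapping = [[0.0], [0.1], [0.2]]   # embeddings with class overlap
loss_sep = class_aware_loss(separated, inputs, inputs, labels)
loss_ovl = class_aware_loss(overlapping, inputs, inputs, labels)
assert loss_sep < loss_ovl
```

With perfect reconstructions, the loss is driven entirely by the contrastive term, so the overlapping embedding is penalized more -- exactly the class-overlap property the abstract says plain autoencoders fail to avoid.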


The Quest to Build a Telescope on the Moon

The New Yorker

A few months ago, I flew to Houston to visit a small startup called Lunar Resources, which aspires to build the largest telescope in the solar system--not on Earth but on the far side of the moon. Houston is nicknamed Space City; on the ride from the airport, I passed the ballpark where the Astros play, and, outside a McDonald's on East NASA Parkway, I saw a giant sculpture of an astronaut holding French fries. I found Lunar Resources in a boxy building where the company leases square footage from the aerospace contractor Lockheed Martin. Elliot Carol, the C.E.O. and co-founder of Lunar Resources, is thirty-three, with a cherubic face and curly hair speckled with gray. Although he grew up in Connecticut and previously worked as a hedge-fund manager, he was wearing black cowboy boots.


First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Aoki, Yoichi, Kudo, Keito, Kuribayashi, Tatsuki, Sone, Shusaku, Taniguchi, Masaya, Sakaguchi, Keisuke, Inui, Kentaro

arXiv.org Artificial Intelligence

Multi-step reasoning is widely adopted in the community to improve the performance of language models (LMs). We report on the systematic strategy that LMs use in this process. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning, when more steps are required to reach an answer. Conversely, as LMs progress closer to the final answer, their reliance on heuristics decreases. This suggests that LMs track only a limited number of future steps and dynamically combine heuristic strategies with logical ones in tasks involving multi-step reasoning.

[Figure 1: Illustration of the systematic strategy we discovered in language models (LMs). When the goal is distant from the current state in a multi-step reasoning process, the models tend to rely on heuristics, such as superficial overlap, which can lead them in the wrong direction. In contrast, when the goal is within a limited distance, the models are more likely to take rational actions.]
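The lexical-overlap heuristic the abstract refers to can be demonstrated with a toy reasoner that picks the next premise purely by word overlap with the goal (all names and premises here are hypothetical, not the paper's tasks):

```python
def lexical_overlap(a, b):
    """Fraction of words in `a` that also appear in `b`."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa)

def heuristic_pick(goal, premises):
    """A purely heuristic reasoner: take the premise sharing the most words
    with the goal, regardless of whether it is logically useful."""
    return max(premises, key=lambda p: lexical_overlap(p, goal))

goal = "is the red block on the table"
premises = [
    "the red block is in the box",      # high word overlap, but a dead end
    "the box sits on the table",        # the logically needed step
]
assert heuristic_pick(goal, premises) == premises[0]
```

The heuristic picks the dead-end premise because it shares more surface vocabulary with the goal, which is the failure mode the abstract says dominates when the answer is still many steps away.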


Bayesian Inverse Contextual Reasoning for Heterogeneous Semantics-Native Communication

Seo, Hyowoon, Kang, Yoonseong, Bennis, Mehdi, Choi, Wan

arXiv.org Artificial Intelligence

This work deals with the heterogeneous semantic-native communication (SNC) problem. When agents do not share the same communication context, the effectiveness of contextual reasoning (CR) is compromised, calling for agents to infer other agents' context. This article proposes a novel framework for solving the inverse problem of CR in SNC using two Bayesian inference methods, namely Bayesian inverse CR (iCR) and Bayesian inverse linearized CR (iLCR). The first proposed method, Bayesian iCR, utilizes Markov Chain Monte Carlo (MCMC) sampling to infer the agent's context but is computationally expensive. To address this issue, a Bayesian iLCR method is leveraged which obtains a linearized CR (LCR) model by training a linear neural network. Experimental results show that the Bayesian iLCR method requires less computation and achieves higher inference accuracy compared to Bayesian iCR. Additionally, heterogeneous SNC based on the context obtained through the Bayesian iLCR method shows better communication effectiveness than that of Bayesian iCR. Overall, this work provides valuable insights and methods to improve the effectiveness of SNC in situations where agents have different contexts.
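A toy Metropolis-Hastings sampler illustrating the MCMC-based inverse idea of inferring another agent's context from its messages. The scalar Gaussian model, the flat prior, and all names are illustrative assumptions, not the paper's setup.

```python
import math
import random

def metropolis_context(observed, n_samples=5000, seed=0):
    """Infer a scalar 'context' c from messages modeled as c plus unit
    Gaussian noise, via random-walk Metropolis-Hastings with a flat prior."""
    rng = random.Random(seed)

    def log_post(c):   # log posterior up to a constant
        return -sum((x - c) ** 2 for x in observed) / 2.0

    c, samples = 0.0, []
    for _ in range(n_samples):
        prop = c + rng.gauss(0.0, 0.5)       # random-walk proposal
        if math.log(rng.random()) < log_post(prop) - log_post(c):
            c = prop                          # accept
        samples.append(c)
    burn = n_samples // 5                     # discard burn-in
    return sum(samples[burn:]) / len(samples[burn:])

messages = [2.1, 1.9, 2.2, 2.0, 1.8]          # sent under true context ~ 2.0
estimate = metropolis_context(messages)
assert abs(estimate - 2.0) < 0.5
```

The thousands of likelihood evaluations per inference are the computational cost the abstract attributes to iCR; the iLCR variant sidesteps them by fitting a linearized model once instead.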


AI-Written Books: Can Artificial Intelligence Write a Novel?

#artificialintelligence

AI-written books are now an incoming reality. But can they write the next great bestseller? AI is already writing music, creating pictures for graphic novels, and winning art competitions, beating humans. One of the first experimental AI-written novels turned up as early as 2017. Called 1 the Road, it was an experiment by Ross Goodwin.