 memento


Learning Correlated Reward Models: Statistical Barriers and Opportunities

arXiv.org Machine Learning

Random Utility Models (RUMs) are a classical framework for modeling user preferences and play a key role in reward modeling for Reinforcement Learning from Human Feedback (RLHF). However, a crucial shortcoming of many of these techniques is the Independence of Irrelevant Alternatives (IIA) assumption, which collapses all human preferences to a universal underlying utility function, yielding a coarse approximation of the range of human preferences. On the other hand, statistical and computational guarantees for models avoiding this assumption are scarce. In this paper, we investigate the statistical and computational challenges of learning a correlated probit model, a fundamental RUM that avoids the IIA assumption. First, we establish that the classical data collection paradigm of pairwise preference data is fundamentally insufficient to learn correlational information, explaining the lack of statistical and computational guarantees in this setting. Next, we demonstrate that best-of-three preference data provably overcomes these shortcomings, and devise a statistically and computationally efficient estimator with near-optimal performance. These results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences. Finally, we validate these theoretical guarantees on several real-world datasets, demonstrating improved personalization of human preferences.
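The gap between pairwise and best-of-three data can be illustrated with a toy calculation. The sketch below is not the paper's estimator or proof. It assumes three items whose utilities are jointly Gaussian with zero mean, and it uses the standard bivariate-normal orthant formula P(X > 0, Y > 0) = 1/4 + arcsin(rho) / (2 pi). The function names and the two covariance matrices are hypothetical choices made for illustration.

```python
import math

def pairwise_prob(mu, cov, i, j):
    """P(U_i > U_j) when the utilities U are jointly Gaussian N(mu, cov)."""
    var = cov[i][i] + cov[j][j] - 2 * cov[i][j]
    z = (mu[i] - mu[j]) / math.sqrt(var)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def best_of_three_prob_zero_mean(cov, i):
    """P(item i has the largest utility among 3 items), zero-mean case only.

    Uses the orthant formula for the two differences U_i - U_j and U_i - U_k.
    """
    j, k = [x for x in range(3) if x != i]
    var_j = cov[i][i] + cov[j][j] - 2 * cov[i][j]
    var_k = cov[i][i] + cov[k][k] - 2 * cov[i][k]
    cov_jk = cov[i][i] - cov[i][j] - cov[i][k] + cov[j][k]
    rho = cov_jk / math.sqrt(var_j * var_k)
    rho = max(-1.0, min(1.0, rho))  # guard against rounding just outside [-1, 1]
    return 0.25 + math.asin(rho) / (2 * math.pi)

# Hypothetical example: same zero means, two different correlation structures.
ZERO_MEAN = [0.0, 0.0, 0.0]
INDEPENDENT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
CORRELATED = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.9], [0.0, 0.9, 1.0]]

if __name__ == "__main__":
    for name, cov in [("independent", INDEPENDENT), ("correlated", CORRELATED)]:
        pairs = [pairwise_prob(ZERO_MEAN, cov, a, b) for a, b in [(0, 1), (0, 2), (1, 2)]]
        top = best_of_three_prob_zero_mean(cov, 0)
        print(name, "pairwise:", pairs, "P(item 0 is best of three):", round(top, 3))
```

In this toy setting every pairwise probability is 0.5 under both covariance matrices, so pairwise data cannot tell them apart. The best-of-three probability for item 0 is about 0.333 in the independent case and about 0.45 when items 1 and 2 are strongly correlated, so three-way comparisons do carry information about the correlation.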


Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

arXiv.org Artificial Intelligence

In this paper, we introduce a novel learning paradigm for Adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting as Memento, which attains top-1 on GAIA validation (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7% to 9.6% absolute points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/Memento.
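The write and read operations on a non-parametric episodic memory can be sketched roughly as follows. This is a toy illustration and not the code from the linked repository: the `Case` and `CaseMemory` names are hypothetical, and a simple token-overlap (Jaccard) score stands in for the neural case-selection policy described in the abstract.

```python
from dataclasses import dataclass

@dataclass
class Case:
    task: str      # description of a past task
    plan: str      # what the agent did
    reward: float  # feedback from the environment

class CaseMemory:
    """Toy episodic memory: append on write, rank by similarity on read."""

    def __init__(self):
        self.cases = []

    def write(self, task, plan, reward):
        # "Memory rewriting" is reduced here to appending a new case.
        self.cases.append(Case(task, plan, reward))

    def read(self, task, k=2):
        # Jaccard overlap of lowercase tokens (an assumption, not a learned policy).
        query = set(task.lower().split())

        def score(case):
            tokens = set(case.task.lower().split())
            union = query | tokens
            similarity = len(query & tokens) / len(union) if union else 0.0
            # Ties in similarity are broken in favour of higher reward.
            return (similarity, case.reward)

        return sorted(self.cases, key=score, reverse=True)[:k]

if __name__ == "__main__":
    memory = CaseMemory()
    memory.write("capital of France", "search the web, then verify", 1.0)
    memory.write("sum a list of numbers", "run a short script", 1.0)
    memory.write("capital of Peru", "answer from memory", 0.0)
    for case in memory.read("capital of Spain", k=2):
        print(case)
```

Retrieved cases would then be placed in the agent's prompt, so behaviour changes as the memory grows while the LLM weights stay fixed.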


Memento: Note-Taking for Your Future Self

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at reasoning-only tasks, but struggle when reasoning must be tightly coupled with retrieval, as in multi-hop question answering. To overcome these limitations, we introduce a prompting strategy that first decomposes a complex question into smaller steps, then dynamically constructs a database of facts using LLMs, and finally pieces these facts together to solve the question. We show how this three-stage strategy, which we call Memento, can boost the performance of existing prompting strategies across diverse settings. On the 9-step PhantomWiki benchmark, Memento doubles the performance of chain-of-thought (CoT) when all information is provided in context. On the open-domain version of 2WikiMultiHopQA, CoT-RAG with Memento improves over vanilla CoT-RAG by more than 20 F1 percentage points and over the multi-hop RAG baseline, IRCoT, by more than 13 F1 percentage points. On the challenging MuSiQue dataset, Memento improves ReAct by more than 3 F1 percentage points, demonstrating its utility in agentic settings.


Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

arXiv.org Artificial Intelligence

Combinatorial Optimization is crucial to numerous real-world applications, yet still presents challenges due to its (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete within industrial solvers. Existing learned methods still lack the ability to adapt to specific instances and fully leverage the available computational budget. The current best methods either rely on a collection of pre-trained policies, or on data-inefficient fine-tuning; hence failing to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an RL approach that leverages memory to improve the adaptation of neural solvers at inference time. MEMENTO enables updating the action distribution dynamically based on the outcome of previous decisions. We validate its effectiveness on benchmark problems, in particular Traveling Salesman and Capacitated Vehicle Routing, demonstrating it can successfully be combined with standard methods to boost their performance under a given budget, both in and out-of-distribution, improving their performance on all 12 evaluated tasks.


Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a variety of visual-language tasks. However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated. To address this challenge, this paper introduces Mementos, a new benchmark designed to assess MLLMs' sequential image reasoning abilities. Mementos features 4,761 diverse image sequences with varying lengths. We also employ a GPT-4 assisted method to evaluate MLLM reasoning performance. Through a careful evaluation of nine recent MLLMs on Mementos, including GPT-4V and Gemini, we find that they struggle to accurately describe dynamic information about given image sequences, often leading to hallucinations/misrepresentations of objects and their corresponding behaviors. Our quantitative analysis and case studies identify three key factors impacting MLLMs' sequential image reasoning: the correlation between object and behavioral hallucinations, the influence of cooccurring behaviors, and the compounding impact of behavioral hallucinations. Our dataset is available at https://github.com/umd-huang-lab/Mementos.


Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments

arXiv.org Artificial Intelligence

Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This leaves researchers forced to spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focussing on their project. To simplify the process, in this paper, we introduce Memento, a Python package that is designed to aid researchers and data scientists in the efficient management and execution of computationally intensive experiments. Memento has the capacity to streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to concurrently run experiments across multiple threads.
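The general idea of a configuration matrix run across threads can be sketched with the Python standard library alone. This is not the Memento package's API: `expand_matrix`, `run_experiment`, and `run_all` are hypothetical names, and the experiment body is a placeholder.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def expand_matrix(matrix):
    """Turn {"param": [values, ...]} into one config dict per combination."""
    keys = list(matrix)
    combos = itertools.product(*(matrix[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]

def run_experiment(config):
    """Placeholder experiment: a real one would train and evaluate a model."""
    score = config["learning_rate"] * config["epochs"]
    return {"config": config, "score": score}

def run_all(matrix, max_workers=4):
    """Run every configuration concurrently; results keep the config order."""
    configs = expand_matrix(matrix)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_experiment, configs))

if __name__ == "__main__":
    matrix = {"learning_rate": [0.1, 0.01], "epochs": [1, 2, 3]}
    for result in run_all(matrix):
        print(result)
```

Caching and checkpointing, which the abstract also mentions, are left out of this sketch.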


Artificial intelligence makes some progress, but robots still can't match humans

AITopics Original Links

When you call your bank, the robot on the other end doesn't want you to communicate using your touch-tone keypad anymore. No, it insists that you just speak to it, sometimes even adding, "You can use a wide variety of words." Your car is trying to emasculate you by taking over the parallel parking duties. And computers have long since drained all the fun out of chess. Fortunately, most robots aren't the complicated emotional beings that star in movies, and we're still pretty good at identifying android impostors.


On the Ranch with the Creators of "Westworld"

The New Yorker

My day job, in lieu of teaching creative writing like a normal person, is writing scripts for blockbuster video games. Last summer, while I watched a play-through of the then-unreleased Gears of War 4, for which I was the lead writer, something odd happened. The game's story called for a massive plane crash, out of which a single robot, operatically aflame, was intended to stride toward the player. Within the game's fiction, robots have hitherto opposed the player, but we wanted this particular burning robot to pose no immediate threat. The game programmers had thus switched off the hostility driven by the robot's artificial intelligence, allowing the player to walk past the hapless robot or shoot it. Most of us on the development team, I think, hoped our game's future players wouldn't shoot. Just ahead of the encounter we placed what is referred to, in game design, as a frontgate--a kind of contrived environmental blockage intended to prevent players from rushing too far ahead, which can mess up loading times.


The Latest "Westworld" Reveal Shows It's No "Game of Thrones"

The New Yorker

As the deviously puzzling first half of HBO's "Westworld" has unfolded, sleuths on fan sites and reddit threads have spun elaborate theories about what is really going on in the futuristic, Wild West-themed amusement park of the title. We know that the park is an adult playground where human "guests" can carry out their most sadistic fantasies on the bodies of the grounds' life-like robot "hosts." We know that each day, after being raped, murdered, and otherwise violated for the pleasures of their guests, the robots are refurbished by Westworld staff, their memories wiped clean--but that, by a glitch in the system (or by some secret design), hosts like the obedient and good-hearted Dolores (Evan Rachel Wood) and the sharp-tongued bordello owner Maeve (Thandie Newton) are beginning to piece together their traumatic pasts. But there are so many essential things that we don't yet understand. Who was Arnold, the park's mysterious co-creator, who died somewhere within Westworld's borders and whose ghost seems to be haunting his android creations?