Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate that Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings, on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
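The "k steps forward, 1 step back" loop described above can be sketched in a few lines. This is a minimal illustration with plain SGD as the inner optimizer on a toy quadratic; the step sizes, interpolation factor, and objective are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def lookahead_sgd(grad, w0, alpha=0.5, k=5, inner_lr=0.1, outer_steps=20):
    """Minimal Lookahead sketch wrapping plain SGD as the inner optimizer.

    Slow weights phi are copied into fast weights theta, which take k
    inner-optimizer steps ("k steps forward"); phi is then pulled a
    fraction alpha of the way toward theta ("1 step back").
    """
    phi = np.asarray(w0, dtype=float)      # slow weights
    for _ in range(outer_steps):
        theta = phi.copy()                 # fast weights start at the slow weights
        for _ in range(k):                 # k steps forward with the inner optimizer
            theta -= inner_lr * grad(theta)
        phi += alpha * (theta - phi)       # interpolate back toward the fast weights
    return phi

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = lookahead_sgd(lambda w: w, w0=[3.0, -2.0])
```

Because the outer update only interpolates stored weight vectors, any inner optimizer (Adam, SGD with momentum, etc.) can be dropped into the inner loop unchanged.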


CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning

Wang, Jun, Vorobeychik, Yevgeniy, Kantaros, Yiannis

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliability remains a challenge, especially in long-horizon tasks, since they often produce overconfident yet wrong outputs. Conformal Prediction (CP) has been leveraged to address this issue by wrapping LLM outputs into prediction sets that contain the correct action with a user-defined confidence. When the prediction set is a singleton, the planner executes that action; otherwise, it requests help from a user. This has led to LLM-based planners that can ensure plan correctness with a user-defined probability. However, as LLMs are trained in an uncertainty-agnostic manner, without awareness of prediction sets, they tend to produce unnecessarily large sets, particularly at higher confidence levels, resulting in frequent human interventions that limit autonomous deployment. To address this, we introduce CoFineLLM (Conformal Finetuning for LLMs), the first CP-aware fine-tuning framework for LLM-based planners that explicitly reduces prediction-set size and, in turn, the need for user interventions. We evaluate our approach on multiple language-instructed robot planning problems and show consistent improvements over uncertainty-aware and uncertainty-agnostic fine-tuning baselines in terms of prediction-set size and help rate. Finally, we demonstrate the robustness of our method to out-of-distribution scenarios in hardware experiments.
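The CP mechanism the abstract relies on can be illustrated with standard split conformal prediction. This is a generic sketch, not the paper's exact recipe: a threshold is calibrated on held-out examples, and at deployment every candidate action whose score clears the threshold joins the prediction set; the calibration probabilities and action names below are made up for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, alpha=0.1):
    """Split conformal calibration: cal_probs holds the model probability
    assigned to the *true* action on each calibration example. Returns the
    nonconformity threshold qhat for a target miscoverage rate alpha."""
    scores = 1.0 - np.asarray(cal_probs)        # nonconformity: 1 - p(true action)
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n      # finite-sample corrected quantile
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(action_probs, qhat):
    """All candidate actions whose nonconformity score is within the threshold."""
    return [a for a, p in action_probs.items() if 1.0 - p <= qhat]

# Hypothetical calibration data and a single planning step
qhat = conformal_threshold([0.9, 0.8, 0.95, 0.85, 0.7, 0.92, 0.88, 0.75, 0.93, 0.6])
step = prediction_set({"pick_up": 0.85, "move_left": 0.10, "wait": 0.05}, qhat)
# singleton set -> execute the action; otherwise request help from the user
```

The size of `step` is exactly the quantity the paper's fine-tuning targets: a sharper model concentrates probability on the correct action, so fewer candidates clear the threshold and the help rate drops.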


EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving

Dou, Shihan, Zhang, Ming, Huang, Chenhao, Chen, Jiayi, Chen, Feng, Liu, Shichun, Liu, Yan, Liu, Chenxiao, Zhong, Cheng, Zhang, Zongzhang, Gui, Tao, Xin, Chao, Wei, Chengzhi, Yan, Lin, Wu, Yonghui, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence

We introduce EvaLearn, a pioneering benchmark designed to evaluate large language models (LLMs) on their learning capability and efficiency in challenging tasks, a critical yet underexplored aspect of model potential. EvaLearn contains 648 challenging problems across six task types, grouped into 182 sequences, each sequence dedicated to one task type. Diverging from most existing benchmarks that evaluate models in parallel, EvaLearn requires models to solve problems sequentially, allowing them to leverage the experience gained from previous solutions. EvaLearn provides five comprehensive automated metrics to evaluate models and quantify their learning capability and efficiency. We extensively benchmark nine frontier models and observe varied performance profiles: some models, such as Claude-3.7-sonnet, start with moderate initial performance but exhibit strong learning ability, while others struggle to benefit from experience and may even show negative transfer. Moreover, we investigate model performance under two learning settings and find that instance-level rubrics and teacher-model feedback further facilitate model learning. Importantly, we observe that current LLMs with stronger static abilities do not show a clear advantage in learning capability across all tasks, highlighting that EvaLearn evaluates a new dimension of model performance. We hope EvaLearn provides a novel evaluation perspective for assessing LLM potential and understanding the gap between models and human capabilities, promoting the development of deeper and more dynamic evaluation approaches. All datasets, the automatic evaluation framework, and the results studied in this paper are available at the GitHub repository.
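The core idea of sequential evaluation can be made concrete with a deliberately crude metric (this is an illustration, not one of EvaLearn's five actual metrics): compare accuracy on the early and late portions of a sequence solved in order, so that improvement attributable to accumulated experience becomes visible.

```python
def learning_gain(results):
    """Toy sequential-learning signal: accuracy on the second half of a
    problem sequence minus accuracy on the first half. A positive value
    suggests the model benefits from experience; a negative value would
    hint at negative transfer."""
    half = len(results) // 2
    early = sum(results[:half]) / half
    late = sum(results[half:]) / (len(results) - half)
    return late - early

# 1 = solved, 0 = failed, listed in the order the problems were attempted
gain = learning_gain([0, 0, 1, 0, 1, 1, 1, 1])
```

A model with strong static ability but flat per-position accuracy would score near zero here, which is precisely the distinction the benchmark is designed to surface.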


In the time of tariffs, Nvidia and AMD cut unusual deals with Trump

The Guardian

Donald Trump announced this week that two US chipmakers would tithe 15% of their revenue from sales in China to the US government. Paying for the licence to sell to Chinese customers represents an unprecedented deal. The chipmakers Nvidia and AMD have agreed to give the US government 15% of their revenue from advanced chips sold to China in return for export licences to the key market. The arrangement will see Nvidia give 15% of its revenue from Chinese sales of its H20 chips, and AMD 15% of its revenue from Chinese sales of its MI308 chips, according to reports citing US officials.


OpenAI says latest ChatGPT upgrade is big step forward but still can't do humans' jobs

The Guardian

OpenAI has claimed to have taken a "significant step" towards artificial general intelligence (AGI) with the launch of its latest upgrade to ChatGPT, but has admitted there are still "many things" missing in its quest to create a system able to do humans' jobs. The startup said its GPT-5 model, the underlying technology that will power its breakthrough AI chatbot, represents a big upgrade on its predecessors in areas such as coding and creative writing – and is also a lot less sycophantic. It said the upgrade was being made available to all of ChatGPT's 700 million weekly users immediately. Sam Altman, OpenAI's chief executive, called the model a "significant step forward" to achieving the theoretical state of AGI, which the startup defines as a highly autonomous system that outperforms humans at most economically valuable work – or, in other words, can do their jobs. However, Altman admitted GPT-5 had not reached that goal yet.


Enhancing Decision-Making of Large Language Models via Actor-Critic

Dong, Heng, Duan, Kefei, Zhang, Chongjie

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved remarkable advancements in natural language processing tasks, yet they encounter challenges in complex decision-making scenarios that require long-term reasoning and alignment with high-level objectives. Existing methods either rely on short-term auto-regressive action generation or face limitations in accurately simulating rollouts and assessing outcomes, leading to sub-optimal decisions. This paper introduces a novel LLM-based Actor-Critic framework, termed LAC, that effectively improves LLM policies with long-term action evaluations in a principled and scalable way. Our approach addresses two key challenges: (1) extracting robust action evaluations by computing Q-values via token logits associated with positive/negative outcomes, enhanced by future trajectory rollouts and reasoning; and (2) enabling efficient policy improvement through a gradient-free mechanism. Experiments across diverse environments -- including high-level decision-making (ALFWorld), low-level action spaces (BabyAI-Text), and large action spaces (WebShop) -- demonstrate the framework's generality and superiority over state-of-the-art methods. Notably, our approach achieves competitive performance using 7B/8B parameter LLMs, even outperforming baseline methods employing GPT-4 in complex tasks. These results underscore the potential of integrating structured policy optimization with LLMs' intrinsic knowledge to advance decision-making capabilities in multi-step environments.
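The first challenge above, reading an action value out of token logits, can be sketched simply. This is an illustrative rendering of the idea rather than the paper's exact formulation: the logits of a positive-outcome token and a negative-outcome token are normalized against each other, and the resulting probability of the positive token serves as a Q-value estimate for ranking candidate actions. The action names and logit values below are hypothetical.

```python
import math

def q_from_logits(pos_logit, neg_logit):
    """Value estimate from token logits (illustrative): softmax over a
    positive-outcome token vs. a negative-outcome token, returning the
    probability mass on the positive token."""
    return math.exp(pos_logit) / (math.exp(pos_logit) + math.exp(neg_logit))

# Hypothetical (positive, negative) outcome logits per candidate action,
# as an LLM might score them after reasoning about future rollouts
candidates = {"open fridge": (2.1, -0.3), "go to bedroom": (0.2, 1.5)}
values = {a: q_from_logits(p, n) for a, (p, n) in candidates.items()}
best = max(values, key=values.get)   # act greedily on the estimated values
```

Because the ranking comes from logits the model already emits, no value network has to be trained, which is what makes the gradient-free policy improvement in the paper feasible.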


Reviews: Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

Update: I have read the authors' response and have kept my score. Please note that in DeVries and Taylor '17, 'ResNet-18' is not truly the ResNet-18 model (it consists of 4 stages and has more than an order of magnitude more parameters than the original ResNet-18 due to wider channels). This should be made clear in the paper in order not to cause more confusion in the community. Originality: Medium/High. The proposed algorithm is considerably different from recently proposed methods for deep learning, which gravitate towards adaptive gradient methods. It has some similarities to variance reduction algorithms with inner and outer loops; however, Lookahead has a very simple outer-loop structure and is easy to implement.


Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting

Aissi, Mohamed Salim, Romac, Clement, Carta, Thomas, Lamprier, Sylvain, Oudeyer, Pierre-Yves, Sigaud, Olivier, Soulier, Laure, Thome, Nicolas

arXiv.org Artificial Intelligence

Reinforcement learning (RL) is a promising approach for aligning large language models' (LLMs) knowledge with sequential decision-making tasks. However, few studies have thoroughly investigated the impact of fine-tuning LLM agents with RL in a specific environment on their capabilities. In this paper, we propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment. Our findings reveal that the performance of LLMs degrades when they face prompt formulations different from those used during the RL training phase. We then analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose a contrastive loss to mitigate this sensitivity and improve the robustness and generalization capabilities of LLMs.
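The contrastive mitigation mentioned above can be sketched with a standard InfoNCE-style loss; this is an assumption about the general shape of such a loss, not the paper's exact objective. The idea: the hidden representation of a prompt (anchor) is pulled toward that of a paraphrase of the same task (positive) and pushed away from representations of unrelated prompts (negatives), so the policy becomes less sensitive to surface formulation.

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style sketch over prompt representations: cross-entropy
    that asks the anchor to be most similar to its paraphrase (the
    positive) among all candidates."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at index 0

# Synthetic stand-ins for prompt embeddings
rng = np.random.default_rng(0)
a = rng.normal(size=8)
loss_close = contrastive_loss(a, a + 0.01 * rng.normal(size=8),
                              [rng.normal(size=8) for _ in range(4)])
```

In practice the anchor and positive would be the model's representations of two paraphrases of the same instruction, and this term would be added to the RL objective during fine-tuning.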


The future of travel? For hyperloop, it's one step forward, two steps back

Al Jazeera

Taipei, Taiwan – Imagine boarding a train that glides above the ground at supersonic speeds. Speeding through an airless tube using powerful electromagnets, passengers could travel from San Francisco to Los Angeles, London to Paris, or Basra to Baghdad in less than an hour. The train would be potentially greener than existing modes of transportation, too, using electricity that could be drawn from renewable energy sources. While it may sound like the stuff of science fiction, scientists and engineers in multiple countries are working on making the concept of the so-called hyperloop a reality. Hyperloop proponents, who include tech billionaire Elon Musk, have announced a series of recent breakthroughs in progressing the technology, whose development has been plagued by commercial setbacks and doubts about its feasibility.