Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate that Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings, on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
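The "k steps forward, 1 step back" loop described above can be sketched in a few lines. This is a minimal illustration with plain SGD as the inner optimizer on a toy quadratic; the step sizes, interpolation factor, and objective are illustrative choices, not the paper's experimental settings.

```python
import numpy as np

def lookahead_sgd(grad, w0, alpha=0.5, k=5, inner_lr=0.1, outer_steps=20):
    """Minimal Lookahead sketch wrapping plain SGD as the inner optimizer.

    Slow weights phi are copied into fast weights theta, which take k
    inner-optimizer steps ("k steps forward"); phi is then pulled a
    fraction alpha of the way toward theta ("1 step back").
    """
    phi = np.asarray(w0, dtype=float)      # slow weights
    for _ in range(outer_steps):
        theta = phi.copy()                 # fast weights start at the slow weights
        for _ in range(k):                 # k steps forward with the inner optimizer
            theta -= inner_lr * grad(theta)
        phi += alpha * (theta - phi)       # interpolate back toward the fast weights
    return phi

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = lookahead_sgd(lambda w: w, w0=[3.0, -2.0])
```

Because the outer update only interpolates stored weight vectors, any inner optimizer (Adam, SGD with momentum, etc.) can be dropped into the inner loop unchanged.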


CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning

Wang, Jun, Vorobeychik, Yevgeniy, Kantaros, Yiannis

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliability remains a challenge, especially in long-horizon tasks, since they often produce overconfident yet wrong outputs. Conformal Prediction (CP) has been leveraged to address this issue by wrapping LLM outputs into prediction sets that contain the correct action with a user-defined confidence. When the prediction set is a singleton, the planner executes that action; otherwise, it requests help from a user. This has led to LLM-based planners that can ensure plan correctness with a user-defined probability. However, as LLMs are trained in an uncertainty-agnostic manner, without awareness of prediction sets, they tend to produce unnecessarily large sets, particularly at higher confidence levels, resulting in frequent human interventions that limit autonomous deployment. To address this, we introduce CoFineLLM (Conformal Finetuning for LLMs), the first CP-aware fine-tuning framework for LLM-based planners that explicitly reduces prediction-set size and, in turn, the need for user interventions. We evaluate our approach on multiple language-instructed robot planning problems and show consistent improvements over uncertainty-aware and uncertainty-agnostic fine-tuning baselines in terms of prediction-set size and help rate. Finally, we demonstrate the robustness of our method to out-of-distribution scenarios in hardware experiments.
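The CP mechanism the abstract relies on can be illustrated with standard split conformal prediction. This is a generic sketch, not the paper's exact recipe: a threshold is calibrated on held-out examples, and at deployment every candidate action whose score clears the threshold joins the prediction set; the calibration probabilities and action names below are made up for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, alpha=0.1):
    """Split conformal calibration: cal_probs holds the model probability
    assigned to the *true* action on each calibration example. Returns the
    nonconformity threshold qhat for a target miscoverage rate alpha."""
    scores = 1.0 - np.asarray(cal_probs)        # nonconformity: 1 - p(true action)
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n      # finite-sample corrected quantile
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(action_probs, qhat):
    """All candidate actions whose nonconformity score is within the threshold."""
    return [a for a, p in action_probs.items() if 1.0 - p <= qhat]

# Hypothetical calibration data and a single planning step
qhat = conformal_threshold([0.9, 0.8, 0.95, 0.85, 0.7, 0.92, 0.88, 0.75, 0.93, 0.6])
step = prediction_set({"pick_up": 0.85, "move_left": 0.10, "wait": 0.05}, qhat)
# singleton set -> execute the action; otherwise request help from the user
```

The size of `step` is exactly the quantity the paper's fine-tuning targets: a sharper model concentrates probability on the correct action, so fewer candidates clear the threshold and the help rate drops.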


EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving

Dou, Shihan, Zhang, Ming, Huang, Chenhao, Chen, Jiayi, Chen, Feng, Liu, Shichun, Liu, Yan, Liu, Chenxiao, Zhong, Cheng, Zhang, Zongzhang, Gui, Tao, Xin, Chao, Wei, Chengzhi, Yan, Lin, Wu, Yonghui, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence

We introduce EvaLearn, a pioneering benchmark designed to evaluate large language models (LLMs) on their learning capability and efficiency in challenging tasks, a critical yet underexplored aspect of model potential. EvaLearn contains 648 challenging problems across six task types, grouped into 182 sequences, each sequence dedicated to one task type. Diverging from most existing benchmarks that evaluate models in parallel, EvaLearn requires models to solve problems sequentially, allowing them to leverage the experience gained from previous solutions. EvaLearn provides five comprehensive automated metrics to evaluate models and quantify their learning capability and efficiency. We extensively benchmark nine frontier models and observe varied performance profiles: some models, such as Claude-3.7-sonnet, start with moderate initial performance but exhibit strong learning ability, while others struggle to benefit from experience and may even show negative transfer. Moreover, we investigate model performance under two learning settings and find that instance-level rubrics and teacher-model feedback further facilitate model learning. Importantly, we observe that current LLMs with stronger static abilities do not show a clear advantage in learning capability across all tasks, highlighting that EvaLearn evaluates a new dimension of model performance. We hope EvaLearn provides a novel evaluation perspective for assessing LLM potential and understanding the gap between models and human capabilities, promoting the development of deeper and more dynamic evaluation approaches. All datasets, the automatic evaluation framework, and the results studied in this paper are available at the GitHub repository.
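The core idea of sequential evaluation can be made concrete with a deliberately crude metric (this is an illustration, not one of EvaLearn's five actual metrics): compare accuracy on the early and late portions of a sequence solved in order, so that improvement attributable to accumulated experience becomes visible.

```python
def learning_gain(results):
    """Toy sequential-learning signal: accuracy on the second half of a
    problem sequence minus accuracy on the first half. A positive value
    suggests the model benefits from experience; a negative value would
    hint at negative transfer."""
    half = len(results) // 2
    early = sum(results[:half]) / half
    late = sum(results[half:]) / (len(results) - half)
    return late - early

# 1 = solved, 0 = failed, listed in the order the problems were attempted
gain = learning_gain([0, 0, 1, 0, 1, 1, 1, 1])
```

A model with strong static ability but flat per-position accuracy would score near zero here, which is precisely the distinction the benchmark is designed to surface.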


In the time of tariffs, Nvidia and AMD cut unusual deals with Trump

The Guardian

Donald Trump announced this week that two US chipmakers would tithe 15% of their revenue from sales in China to the US government. Paying for the licence to sell to Chinese customers represents an unprecedented deal. The chipmakers Nvidia and AMD have agreed to give the US government 15% of their revenue from advanced chips sold to China in return for export licences to the key market. The arrangement will see Nvidia give 15% of its revenue from Chinese sales of its H20 chips, and AMD 15% of its revenue from Chinese sales of its MI308 chips, according to reports citing US officials.


OpenAI says latest ChatGPT upgrade is big step forward but still can't do humans' jobs

The Guardian

OpenAI has claimed to have taken a "significant step" towards artificial general intelligence (AGI) with the launch of its latest upgrade to ChatGPT, but has admitted there are still "many things" missing in its quest to create a system able to do humans' jobs. The startup said its GPT-5 model, the underlying technology that will power its breakthrough AI chatbot, represents a big upgrade on its predecessors in areas such as coding and creative writing – and is also a lot less sycophantic. It said the upgrade was being made available to all of ChatGPT's 700 million weekly users immediately. Sam Altman, OpenAI's chief executive, called the model a "significant step forward" to achieving the theoretical state of AGI, which the startup defines as a highly autonomous system that outperforms humans at most economically valuable work – or, in other words, can do their jobs. However, Altman admitted GPT-5 had not reached that goal yet.


Enhancing Decision-Making of Large Language Models via Actor-Critic

Dong, Heng, Duan, Kefei, Zhang, Chongjie

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved remarkable advancements in natural language processing tasks, yet they encounter challenges in complex decision-making scenarios that require long-term reasoning and alignment with high-level objectives. Existing methods either rely on short-term auto-regressive action generation or face limitations in accurately simulating rollouts and assessing outcomes, leading to sub-optimal decisions. This paper introduces a novel LLM-based Actor-Critic framework, termed LAC, that effectively improves LLM policies with long-term action evaluations in a principled and scalable way. Our approach addresses two key challenges: (1) extracting robust action evaluations by computing Q-values via token logits associated with positive/negative outcomes, enhanced by future trajectory rollouts and reasoning; and (2) enabling efficient policy improvement through a gradient-free mechanism. Experiments across diverse environments -- including high-level decision-making (ALFWorld), low-level action spaces (BabyAI-Text), and large action spaces (WebShop) -- demonstrate the framework's generality and superiority over state-of-the-art methods. Notably, our approach achieves competitive performance using 7B/8B parameter LLMs, even outperforming baseline methods employing GPT-4 in complex tasks. These results underscore the potential of integrating structured policy optimization with LLMs' intrinsic knowledge to advance decision-making capabilities in multi-step environments.
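The first challenge above, reading an action value out of token logits, can be sketched simply. This is an illustrative rendering of the idea rather than the paper's exact formulation: the logits of a positive-outcome token and a negative-outcome token are normalized against each other, and the resulting probability of the positive token serves as a Q-value estimate for ranking candidate actions. The action names and logit values below are hypothetical.

```python
import math

def q_from_logits(pos_logit, neg_logit):
    """Value estimate from token logits (illustrative): softmax over a
    positive-outcome token vs. a negative-outcome token, returning the
    probability mass on the positive token."""
    return math.exp(pos_logit) / (math.exp(pos_logit) + math.exp(neg_logit))

# Hypothetical (positive, negative) outcome logits per candidate action,
# as an LLM might score them after reasoning about future rollouts
candidates = {"open fridge": (2.1, -0.3), "go to bedroom": (0.2, 1.5)}
values = {a: q_from_logits(p, n) for a, (p, n) in candidates.items()}
best = max(values, key=values.get)   # act greedily on the estimated values
```

Because the ranking comes from logits the model already emits, no value network has to be trained, which is what makes the gradient-free policy improvement in the paper feasible.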


Reviews: Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

Update: I have read the authors' response and have kept my score. Please note that in DeVries and Taylor '17, 'ResNet-18' is not truly the ResNet-18 model (it consists of 4 stages and has more than an order of magnitude more parameters than the original ResNet-18 due to wider channels). This should be made clear in the paper in order not to cause more confusion in the community. Originality: Medium/High. The proposed algorithm is considerably different from recently proposed methods for deep learning, which gravitate towards adaptive gradient methods. It has some similarities to variance reduction algorithms with inner and outer loops; however, Lookahead has a very simple outer-loop structure and is easy to implement.


Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting

Aissi, Mohamed Salim, Romac, Clement, Carta, Thomas, Lamprier, Sylvain, Oudeyer, Pierre-Yves, Sigaud, Olivier, Soulier, Laure, Thome, Nicolas

arXiv.org Artificial Intelligence

Reinforcement learning (RL) is a promising approach for aligning large language models' (LLMs) knowledge with sequential decision-making tasks. However, few studies have thoroughly investigated the impact of fine-tuning LLM agents with RL in a specific environment on their capabilities. In this paper, we propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment. Our findings reveal that the performance of LLMs degrades when they face prompt formulations different from those used during the RL training phase. We then analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose a contrastive loss to mitigate this sensitivity and improve the robustness and generalization capabilities of LLMs.
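The contrastive mitigation mentioned above can be sketched with a standard InfoNCE-style loss; this is an assumption about the general shape of such a loss, not the paper's exact objective. The idea: the hidden representation of a prompt (anchor) is pulled toward that of a paraphrase of the same task (positive) and pushed away from representations of unrelated prompts (negatives), so the policy becomes less sensitive to surface formulation.

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style sketch over prompt representations: cross-entropy
    that asks the anchor to be most similar to its paraphrase (the
    positive) among all candidates."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at index 0

# Synthetic stand-ins for prompt embeddings
rng = np.random.default_rng(0)
a = rng.normal(size=8)
loss_close = contrastive_loss(a, a + 0.01 * rng.normal(size=8),
                              [rng.normal(size=8) for _ in range(4)])
```

In practice the anchor and positive would be the model's representations of two paraphrases of the same instruction, and this term would be added to the RL objective during fine-tuning.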


The future of travel? For hyperloop, it's one step forward, two steps back

Al Jazeera

Taipei, Taiwan – Imagine boarding a train that glides above the ground at supersonic speeds. Speeding through an airless tube using powerful electromagnets, passengers could travel from San Francisco to Los Angeles, London to Paris, or Basra to Baghdad in less than an hour. The train would be potentially greener than existing modes of transportation, too, using electricity that could be drawn from renewable energy sources. While it may sound like the stuff of science fiction, scientists and engineers in multiple countries are working on making the concept of the so-called hyperloop a reality. Hyperloop proponents, who include tech billionaire Elon Musk, have announced a series of recent breakthroughs in progressing the technology, whose development has been plagued by commercial setbacks and doubts about its feasibility.