

Step Back


Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
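The abstract's "k steps forward, 1 step back" loop can be sketched in a few lines. This is a minimal illustration of the update rule as described above, not the authors' implementation; the names `alpha`, `k`, and `inner_step` are illustrative.

```python
# Minimal sketch of the Lookahead update rule: run any inner optimizer for k
# "fast" steps, then interpolate the "slow" weights toward the result.

def lookahead(slow_weights, inner_step, k=5, alpha=0.5, outer_steps=3):
    """slow_weights: list of floats (the slow parameters phi).
    inner_step: function fast_weights -> fast_weights, one step of any inner optimizer.
    """
    phi = list(slow_weights)
    for _ in range(outer_steps):
        theta = list(phi)                 # sync fast weights to slow weights
        for _ in range(k):                # k steps forward with the inner optimizer
            theta = inner_step(theta)
        # 1 step back: move slow weights a fraction alpha toward the fast weights
        phi = [p + alpha * (t - p) for p, t in zip(phi, theta)]
    return phi

# Toy inner optimizer: plain SGD on f(w) = sum(w_i^2), whose gradient is 2*w.
sgd = lambda w, lr=0.1: [wi - lr * 2 * wi for wi in w]
final = lookahead([1.0, -2.0], sgd)
```

On this toy quadratic each inner SGD step shrinks the weights by a factor of 0.8, so each outer step contracts the slow weights deterministically; the interpolation is what damps the inner optimizer's variance at negligible extra cost.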


CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning

Wang, Jun, Vorobeychik, Yevgeniy, Kantaros, Yiannis

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have recently emerged as planners for language-instructed agents, generating sequences of actions to accomplish natural language tasks. However, their reliability remains a challenge, especially in long-horizon tasks, since they often produce overconfident yet wrong outputs. Conformal Prediction (CP) has been leveraged to address this issue by wrapping LLM outputs into prediction sets that contain the correct action with a user-defined confidence. When the prediction set is a singleton, the planner executes that action; otherwise, it requests help from a user. This has led to LLM-based planners that can ensure plan correctness with a user-defined probability. However, as LLMs are trained in an uncertainty-agnostic manner, without awareness of prediction sets, they tend to produce unnecessarily large sets, particularly at higher confidence levels, resulting in frequent human interventions that limit autonomous deployment. To address this, we introduce CoFineLLM (Conformal Finetuning for LLMs), the first CP-aware fine-tuning framework for LLM-based planners that explicitly reduces prediction-set size and, in turn, the need for user interventions. We evaluate our approach on multiple language-instructed robot planning problems and show consistent improvements over uncertainty-aware and uncertainty-agnostic finetuning baselines in terms of prediction-set size and help rates. Finally, we demonstrate robustness of our method to out-of-distribution scenarios in hardware experiments.


Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Yang, Xiao-Wen, Zhu, Xuan-Yi, Wei, Wen-Da, Zhang, Ding-Chu, Shao, Jie-Jing, Zhou, Zhi, Guo, Lan-Zhe, Li, Yu-Feng

arXiv.org Artificial Intelligence

The integration of slow-thinking mechanisms into large language models (LLMs) offers a promising way toward achieving Level 2 AGI Reasoners, as exemplified by systems like OpenAI's o1. However, several significant challenges remain, including inefficient overthinking and an overreliance on auxiliary reward models. We point out that these limitations stem from LLMs' inability to internalize the search process, a key component of effective reasoning. A critical step toward addressing this issue is enabling LLMs to autonomously determine when and where to backtrack, a fundamental operation in traditional search algorithms. To this end, we propose a self-backtracking mechanism that equips LLMs with the ability to backtrack during both training and inference. This mechanism enhances not only reasoning ability but also efficiency, by transforming slow-thinking processes into fast-thinking through self-improvement. Empirical evaluations demonstrate that our proposal significantly enhances the reasoning capabilities of LLMs, achieving a performance gain of over 40 percent compared to the optimal-path supervised fine-tuning method. We believe this study introduces a novel and promising pathway for developing more advanced and robust Reasoners.
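The "fundamental operation" the abstract refers to is the classic backtracking step of depth-first search: abandon a dead-end branch and return to the last choice point. The toy problem and names below are illustrative, not the paper's training setup.

```python
# The textbook backtracking loop that, per the abstract, self-backtracking
# teaches an LLM to perform internally: try a branch, and if it dead-ends,
# step back to the previous choice point and try the next alternative.

def backtrack_search(state, goal, expand, path=None):
    """Depth-first search returning a path to `goal`, or None."""
    path = (path or []) + [state]
    if state == goal:
        return path
    for nxt in expand(state):
        result = backtrack_search(nxt, goal, expand, path)
        if result is not None:          # this branch succeeded
            return result
    return None                         # dead end: backtrack to the caller

# Toy search: reach 10 from 1 using moves *2 or +3, never exceeding 10.
expand = lambda n: [m for m in (n * 2, n + 3) if m <= 10]
found = backtrack_search(1, 10, expand)
```

Here the search first explores 1, 2, 4, 8, hits a dead end at 8, backtracks to 4, and succeeds via 7. Deciding *when and where* to take that step back, without an external reward model, is what the proposed mechanism internalizes.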


Reviews: Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing Systems

Update: I have read the author's response and have kept my score. Please note that in DeVries and Taylor '17, 'ResNet-18' is not truly the ResNet-18 model (it consists of 4 stages and has more than an order of magnitude more parameters than the original ResNet-18 due to wider channels). This should be made clear in the paper in order not to cause more confusion in the community. Originality: Medium/High. The proposed algorithm is considerably different from recently proposed methods for deep learning, which gravitate towards adaptive gradient methods. It has some similarities to variance reduction algorithms with inner and outer loops; however, Lookahead has a very simple outer-loop structure and is easy to implement.



The future of travel? For hyperloop, it's one step forward, two steps back

Al Jazeera

Taipei, Taiwan – Imagine boarding a train that glides above the ground at supersonic speeds. Speeding through an airless tube using powerful electromagnets, passengers could travel from San Francisco to Los Angeles, London to Paris, or Basra to Baghdad in less than an hour. The train would be potentially greener than existing modes of transportation, too, using electricity that could be drawn from renewable energy sources. While it may sound like the stuff of science fiction, scientists and engineers in multiple countries are working on making the concept of the so-called hyperloop a reality. Hyperloop proponents, who include tech billionaire Elon Musk, have announced a series of recent breakthroughs in progressing the technology, whose development has been plagued by commercial setbacks and doubts about its feasibility.


Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

Zheng, Huaixiu Steven, Mishra, Swaroop, Chen, Xinyun, Cheng, Heng-Tze, Chi, Ed H., Le, Quoc V., Zhou, Denny

arXiv.org Artificial Intelligence

We present Step-Back Prompting, a simple prompting technique that enables LLMs to perform abstraction, deriving high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide the reasoning steps, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L models and observe substantial performance gains on a wide range of challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU Physics and Chemistry by 7% and 11%, TimeQA by 27%, and MuSiQue by 7%.
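The two-stage flow the abstract describes — first elicit an abstraction, then reason from it — can be sketched as a pair of prompts. The prompt wording and the `ask_llm` callable below are assumptions for illustration, not the paper's templates; a stub stands in for a real model so the sketch runs offline.

```python
# Sketch of the two-stage Step-Back flow: (1) ask for the underlying
# concept or first principle, (2) answer the original question using it.

STEP_BACK = ("Here is a question: {q}\n"
             "What underlying concept or first principle does it involve? "
             "State it in general terms.")
REASON = ("Question: {q}\n"
          "Relevant principle: {principle}\n"
          "Using the principle, answer the question step by step.")

def step_back_answer(question, ask_llm):
    principle = ask_llm(STEP_BACK.format(q=question))           # stage 1: abstract
    return ask_llm(REASON.format(q=question, principle=principle))  # stage 2: reason

# Stub model so the sketch runs without any API; replies are hard-coded.
def fake_llm(prompt):
    if prompt.startswith("Here is a question"):
        return "Ideal gas law: PV = nRT"
    return "Pressure doubles (reasoning grounded in PV = nRT)"

answer = step_back_answer(
    "What happens to pressure if temperature doubles at fixed V and n?", fake_llm)
```

The point of the detour is that the second prompt reasons from a retrieved principle rather than directly from the surface details of the question.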


Taking a Step Back with KCal: Multi-Class Kernel-Based Calibration for Deep Neural Networks

Lin, Zhen, Trivedi, Shubhendu, Sun, Jimeng

arXiv.org Machine Learning

Deep neural network (DNN) classifiers are often overconfident, producing miscalibrated class probabilities. Most existing calibration methods either lack theoretical guarantees for producing calibrated outputs or reduce the classification accuracy in the process. This paper proposes a new Kernel-based calibration method called KCal. Unlike other calibration procedures, KCal does not operate directly on the logits or softmax outputs of the DNN. Instead, it uses the penultimate-layer latent embedding to train a metric space in a supervised manner. In effect, KCal amounts to a supervised dimensionality reduction of the neural network embedding, and generates a prediction using kernel density estimation on a holdout calibration set. We first analyze KCal theoretically, showing that it enjoys a provable asymptotic calibration guarantee. Then, through extensive experiments, we confirm that KCal consistently outperforms existing calibration methods in terms of both the classification accuracy and the (confidence and class-wise) calibration error.
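The prediction step the abstract describes — class probabilities from kernel density estimation over a holdout calibration set of embeddings — can be sketched directly. A fixed Gaussian kernel on raw embeddings stands in for KCal's learned metric space; all names, embeddings, and numbers here are illustrative.

```python
# Sketch of KDE-based class probabilities on penultimate-layer embeddings:
# weight each holdout example by a Gaussian kernel of its distance to the
# query embedding, then normalize the per-class weight totals.
import math

def kde_class_probs(x, holdout, bandwidth=1.0):
    """x: query embedding; holdout: list of (embedding, label) pairs.
    Returns {label: probability} summing to 1."""
    weights = {}
    for emb, label in holdout:
        d2 = sum((a - b) ** 2 for a, b in zip(x, emb))
        w = math.exp(-d2 / (2 * bandwidth ** 2))   # Gaussian kernel weight
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}

# Toy holdout set of 2-D embeddings with labels.
holdout = [([0.0, 0.0], "cat"), ([0.1, 0.0], "cat"), ([3.0, 3.0], "dog")]
probs = kde_class_probs([0.05, 0.0], holdout)
```

Because the probabilities come from densities on the calibration set rather than from the network's logits, overconfident softmax outputs never enter the estimate; KCal additionally learns the metric in which these distances are computed.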


Lack of diversity in AI development causes serious real-life harm for people of color

#artificialintelligence

Every time you ask Alexa to turn on your lights or play a song, you're using AI. But AI is also put to work in more serious ways, like facial recognition software by law enforcement. Some critics say there's a troubling lack of diversity among those who create the programs, and that is causing serious harm for people of color. We're joined now by Angle Bush. ANGLE BUSH: Thank you for having me.


The secret to AI success? Focusing on data preparation

#artificialintelligence

Datasets are essential to AI models. They provide the truth by which we train AI models and measure a model's success. Engineers often look to the AI model as the key to delivering highly accurate results, but in reality it is often the data that determines an AI model's success. Data flows through every step of the AI workflow, from model training to deployment, and the way it is prepared can be the main driver of accuracy when designing robust AI models. Engineers can use these five tips to improve their data preparation process and drive success when developing a complete AI system.