Goto

Collaborating Authors

 Large Language Model




Weak-to-StrongSearch: AlignLargeLanguageModelsvia SearchingoverSmallLanguageModels

Neural Information Processing Systems

Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduceweak-to-strong search, framing the alignment of a large language model as a test-time greedy search to maximize the log-probability difference between small tuned and untuned models while sampling from the frozen large model. This method serves both as (1) a compute-efficient model up-scaling strategy that avoids directly tuning the large model and as (2) an instance of weak-to-strong generalization thatenhances astrong model with weak test-time guidance.



LanguageModelsareFew-ShotLearners

Neural Information Processing Systems

Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous nonsparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks andfew-shot demonstrations specified purelyviatextinteraction withthemodel.



TaskBench: BenchmarkingLargeLanguage ModelsforTaskAutomation

Neural Information Processing Systems

To address this, we introduceTASKBENCH, a comprehensive framework to evaluate the capability of LLMs in task automation. Specifically, task automation can be divided into three critical stages: task decomposition, tool selection, and parameter prediction. To tackle the complexities inherent in these stages, we introduce the concept of Tool Graph to represent decomposed tasksandadoptaback-instruct method togenerate high-quality userinstructions. We propose TASKEVAL, a multi-faceted evaluation methodology that assesses LLMperformance across thesethreestages.




ForecastPFN: Synthetically-Trained Zero-Shot Forecasting

Neural Information Processing Systems

The vast majority of time-series forecasting approaches require a substantial training dataset. However, many real-life forecasting applications have very little initial observations, sometimes just 40 or fewer. Thus, the applicability of most forecasting methods is restricted in data-sparse commercial applications.