cot
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.92)
- Overview (0.92)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Asia > Singapore (0.04)
- Asia > China > Hunan Province (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (6 more...)
- Research Report > Experimental Study (0.93)
- Overview (0.67)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Asia > Middle East > Jordan (0.04)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- Africa (0.04)
Training Chain-of-Thought via Latent-Variable Inference
Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a chain-of-thought (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training set. Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers; these rationales are expensive to produce by hand. Instead, we propose a fine-tuning strategy that tries to maximize the \emph{marginal} log-likelihood of generating a correct answer using CoT prompting, approximately averaging over all possible rationales. The core challenge is sampling from the posterior over rationales conditioned on the correct answer; we address it using a simple Markov-chain Monte Carlo (MCMC) expectation-maximization (EM) algorithm inspired by the self-taught reasoner (STaR), memoized wake-sleep, Markovian score climbing, and persistent contrastive divergence. This algorithm also admits a novel control-variate technique that drives the variance of our gradient estimates to zero as the model improves. Applying our technique to GSM8K and the tasks in BIG-Bench Hard, we find that this MCMC-EM fine-tuning technique typically improves the model's accuracy on held-out examples more than STaR or prompt-tuning with or without CoT.