Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Yuda Song, Hanlin Zhang, Carson Eisenach, Sham Kakade, Dean Foster, Udaya Ghai
arXiv.org Artificial Intelligence
While synthetic data, often generated by LLMs, offers a valuable complement to human-generated data, its misuse can harm performance. Bertrand et al. (2023) and Gerstgrasser et al. (2024) showed that self-training on model-generated data can lead to performance degradation. To mitigate this, incorporating a "reliable" verifier to label data has shown promise in preventing such collapse (Gillman et al., 2024). A straightforward verification mechanism is to train a reward model on human-annotated data to assess the quality of synthetic data (Lightman et al., 2023; Wang et al., 2024a). However, this approach can be prohibitively expensive and may offer little signal in domains where models exhibit super-human performance. An alternative is to use a stronger model for annotation (Chang et al., 2023; Havrilla et al., 2024), but this becomes infeasible when the model is already at the frontier of current capabilities. A promising solution is therefore to have the model label its own generations. Motivated by the intuition that "verification is easier than generation," one can hypothesize that the model may act as a better-than-random verifier of its own outputs, enabling self-improvement (Zelikman et al., 2022).
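The self-improvement pattern the abstract alludes to can be summarized in pseudocode. Below is a minimal sketch of one STaR-style round (Zelikman et al., 2022): generate, self-verify, and fine-tune only on accepted samples. The `generate`, `self_verify`, and `fine_tune` interfaces are hypothetical placeholders for illustration, not an API from the paper.

```python
def self_improvement_round(model, prompts, num_samples=4):
    """One hypothetical round of self-improvement via self-verification."""
    accepted = []
    for prompt in prompts:
        # Sample several candidate generations per prompt.
        candidates = [model.generate(prompt) for _ in range(num_samples)]
        for cand in candidates:
            # Use the model itself as a verifier of its own outputs.
            # The hypothesis: this check is better than random, so the
            # filtered dataset is, on average, higher quality.
            if model.self_verify(prompt, cand):
                accepted.append((prompt, cand))
    # Fine-tune only on self-verified generations; unverified synthetic
    # data is discarded to avoid the performance collapse described above.
    return model.fine_tune(accepted)
```

The key design choice in this pattern is the filtering step: the verifier need not be perfect, only better than random, for iterated rounds to improve data quality rather than degrade it.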
Dec 3, 2024