Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy
arXiv.org Artificial Intelligence
Large language models demonstrate remarkable capabilities after learning from enormous volumes of text data (Anil et al., 2023; Hoffmann et al., 2022; OpenAI, 2023). Yet, reinforcement learning from human feedback (RLHF) greatly improves their behavior even after only tens of thousands of interactions (Bai et al., 2022; Glaese et al., 2022; Ouyang et al., 2022; Stiennon et al., 2020). The uptake of chatbots affords opportunities to gather increasing volumes of human feedback, with each engagement eliciting expressions of satisfaction or preference (OpenAI, 2022). It is natural to wonder what new capabilities may emerge with this growing source of data. Superhuman ingenuity remains an alluring possibility. With increasing volumes, more can be inferred from human feedback.
February 1, 2024