Efficient RL for optimizing conversation level outcomes with an LLM-based tutor
Hyunji Nam, Omer Gottesman, Amy Zhang, Dean Foster, Emma Brunskill, Lyle Ungar
–arXiv.org Artificial Intelligence
Large language models (LLMs) built on existing reinforcement learning with human feedback (RLHF) frameworks typically optimize responses based on immediate, turn-level human preferences. However, this approach falls short in multi-turn dialogue settings such as online math tutoring. We propose a method to enhance LLM-based tutors by representing the dialogue history with a lower-dimensional latent state representation of the student and optimizing a long-term policy that determines high-level actions based on that latent state. The goal is to better align the tutor's behavior with the long-term objective of guiding the student toward solving a target math problem on their own. Our model is lightweight, requiring fewer computational resources than prior work that trains the tutor policy end-to-end to directly output the tutor's next utterance. Our experimental results demonstrate that these modifications lead to improved long-term outcomes compared to prompting in LLM-simulated tutoring tasks.
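The pipeline the abstract describes, compressing dialogue history into a low-dimensional latent student state and then choosing a high-level tutoring action with a long-horizon policy, can be sketched as follows. This is an illustrative toy, not the authors' implementation: the feature names, action set, and heuristic rules are all assumptions standing in for learned components.

```python
# Hypothetical sketch: latent student state + high-level action policy.
# In the paper both the encoder and the policy would be learned; here they
# are hand-written stand-ins to show the interface, not the actual method.
from dataclasses import dataclass

@dataclass
class LatentStudentState:
    # Illustrative low-dimensional features summarizing the dialogue so far.
    progress: float     # estimated fraction of the problem solved, in [0, 1]
    frustration: float  # estimated frustration level, in [0, 1]

# Assumed high-level action space (the paper's concrete actions may differ).
HIGH_LEVEL_ACTIONS = ["give_hint", "ask_guiding_question", "encourage", "confirm_step"]

def encode_history(turns: list[str]) -> LatentStudentState:
    """Toy encoder: a learned mapping from dialogue history to a latent
    state would go here; we use crude keyword heuristics instead."""
    text = " ".join(turns).lower()
    progress = min(1.0, sum(kw in text for kw in ("so", "then", "equals")) / 3)
    frustration = min(1.0, text.count("stuck") / 2 + text.count("confused") / 2)
    return LatentStudentState(progress, frustration)

def policy(state: LatentStudentState) -> str:
    """Toy long-horizon policy over high-level actions; a trained RL policy
    optimizing conversation-level outcomes would replace these rules."""
    if state.frustration > 0.5:
        return "encourage"
    if state.progress < 0.3:
        return "ask_guiding_question"
    return "confirm_step"

# The chosen high-level action would then condition the LLM's next utterance,
# keeping the expensive LLM out of the RL training loop.
action = policy(encode_history(["I'm stuck on this equation", "I'm confused"]))
```

Because the policy operates on a small latent state rather than raw text, it can be trained cheaply and swapped in without fine-tuning the underlying LLM, which matches the paper's stated efficiency motivation.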
Jul-23-2025
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Education
- Educational Setting > K-12 Education (0.49)
- Educational Technology (0.46)