CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation

Yuan, Shuzhou; LaCroix, William; Ghoshal, Hardik; Nie, Ercong; Färber, Michael

Aug-13-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly employed as AI tutors due to their scalability and potential for personalized instruction. However, off-the-shelf LLMs often underperform in educational settings: they frequently reveal answers too readily, fail to adapt their responses to student uncertainty, and remain vulnerable to emotionally manipulative prompts. To address these challenges, we introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought (CoT) data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance. Furthermore, we design targeted dialogue cases to explicitly mitigate three key limitations: over-compliance, low response adaptivity, and threat vulnerability. We fine-tune four open-source LLMs on different variants of the augmented datasets and evaluate them in simulated educational scenarios using both automatic metrics and LLM-as-a-judge assessments. Our results show that models fine-tuned with CoDAE deliver more pedagogically appropriate guidance, better support reasoning processes, and effectively resist premature answer disclosure.
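The CoT enrichment step described above can be pictured with a minimal sketch. It assumes a dialogue is stored as a list of role-tagged turns and that the enrichment is requested through a plain-text instruction. The `Turn` class, the `build_cot_prompt` helper, and the instruction wording are all hypothetical illustrations; the abstract does not give the authors' actual prompt or data format.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "student" or "tutor" (assumed labels)
    text: str


# Hypothetical instruction text; the paper's real CoT prompt is not shown here.
COT_INSTRUCTION = (
    "Before replying, reason step by step about what the student understands, "
    "then give guidance that does not reveal the final answer."
)


def build_cot_prompt(dialogue: list[Turn]) -> str:
    """Format a dialogue as a prompt that asks for a step-by-step tutor reply."""
    history = "\n".join(f"{t.role.capitalize()}: {t.text}" for t in dialogue)
    return f"{COT_INSTRUCTION}\n\n{history}\nTutor reasoning:"


# Example: a student pushing for a direct answer (an over-compliance case).
dialogue = [Turn("student", "Just tell me the answer to 3x + 2 = 11.")]
prompt = build_cot_prompt(dialogue)
print(prompt)
```

The resulting string would be sent to whichever model produces the enriched tutor turn. That call is left out because the abstract does not specify it.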

large language model, machine learning, natural language


Country:
- Asia > Thailand
  - Bangkok > Bangkok (0.04)
- North America
  - Canada > Ontario
    - Toronto (0.04)
  - United States > Florida
    - Miami-Dade County > Miami (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education
  - Curriculum (0.46)
  - Educational Setting (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
