Training Reinforcement Learning Agents and Humans With Difficulty-Conditioned Generators

Sidney Tio, Jimmy Ho, Pradeep Varakantham

arXiv.org Artificial Intelligence 

We introduce the Parameterized Environment Response Model (PERM), a method for training both Reinforcement Learning (RL) agents and human learners in parameterized environments by directly modeling difficulty and ability. Inspired by Item Response Theory (IRT), PERM aligns environment difficulty with individual ability, creating a curriculum grounded in the Zone of Proximal Development. Notably, PERM operates without real-time RL updates and allows for offline training, ensuring its adaptability across diverse students. We present a two-stage training process that capitalizes on PERM's adaptability, and demonstrate its effectiveness in training RL agents and humans in an empirical study.

Figure 1: Overview of the proposed two-stage process. In Stage 1, the IRT-based Parameterized Environment Response Model (PERM) observes a Reinforcement Learning (RL) agent as it trains in a given environment with randomized levels. During this stage, PERM learns to accurately infer both student ability and level difficulty. In Stage 2, once trained, PERM is deployed to train both artificial and human students. It achieves this by inferring their current ability and providing suitable training levels within the same domain.
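The core IRT idea behind PERM, aligning a learner's ability with level difficulty, can be sketched with a simple Rasch (1PL) response model. The function names and the target success probability below are illustrative assumptions, not the paper's actual model, which is a learned neural parameterization:

```python
import math

def success_probability(ability, difficulty):
    # Rasch (1PL) model: P(success) = sigmoid(ability - difficulty).
    # Higher ability relative to difficulty means higher success odds.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def select_level(ability, candidate_difficulties, target_p=0.6):
    # Pick the level whose predicted success probability is closest to a
    # target (here 0.6, an illustrative choice) -- a simple stand-in for
    # a Zone-of-Proximal-Development-aligned curriculum: challenging,
    # but within reach of the student's current ability.
    return min(candidate_difficulties,
               key=lambda d: abs(success_probability(ability, d) - target_p))

# For a student of ability 0.0 choosing among difficulties -2..2,
# the selected level sits near (slightly below) the student's ability.
chosen = select_level(0.0, [-2, -1, 0, 1, 2])
```

In this toy version, inferring ability from observed outcomes and proposing the next level are the analogues of PERM's Stage 2 role; the paper's model instead infers both quantities jointly from an RL agent's training trajectories.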