Teacher Forcing Recovers Reward Functions for Text Generation

Jan-18-2023–arXiv.org Artificial Intelligence

Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.
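The abstract does not state the reward formula, so the following is only a minimal sketch of how a step-wise reward might be read off a teacher-forced model. It assumes that the model's logits are treated as soft Q-values and that the value of a state is the log-sum-exp of its logits. The function names, the `gamma` parameter, and the zero terminal value are hypothetical choices for illustration, not taken from the paper.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def stepwise_rewards(logits, tokens, gamma=1.0):
    """Illustrative step-wise reward from a teacher-forced model (assumed form).

    logits[t] -- list of vocabulary logits the model produced at step t
    tokens[t] -- index of the token actually emitted at step t

    Assumption: logits act as soft Q-values, so
        reward_t = logits[t][tokens[t]] - gamma * logsumexp(logits[t + 1]),
    with the value after the final step taken to be 0.
    """
    rewards = []
    for t, tok in enumerate(tokens):
        q = logits[t][tok]
        # Soft value of the next state; 0.0 after the last token (assumption).
        v_next = logsumexp(logits[t + 1]) if t + 1 < len(tokens) else 0.0
        rewards.append(q - gamma * v_next)
    return rewards


# Toy usage: a two-token sequence over a two-word vocabulary.
toy_logits = [[0.0, 0.0], [math.log(3.0), 0.0]]
toy_tokens = [0, 1]
toy_rewards = stepwise_rewards(toy_logits, toy_tokens)
```

Under these assumptions every generated token receives its own reward, in contrast to the sparse, sequence-level rewards the abstract describes as a limitation of earlier work.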

machine learning, natural language, reinforcement learning

Country:
- Oceania > Australia (0.04)
- North America > Canada
  - Alberta (0.14)
- Europe > Romania
  - Sud-Est Development Region > Tulcea County > Tulcea (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Learning Graphical Models (0.93)
    - Reinforcement Learning (0.90)
    - Neural Networks > Deep Learning (0.46)
    - Statistical Learning > Regression (0.34)
