Reward Engineering for Generating Semi-structured Explanation
Jiuzhou Han, Wray Buntine, Ehsan Shareghi
arXiv.org Artificial Intelligence
A semi-structured explanation depicts the implicit reasoning process of a reasoner through an explicit representation. Such an explanation highlights how the information available in a specific query is utilised and supplemented with information the reasoner produces from its internal weights in order to generate an answer. Despite recent improvements in the generative capabilities of language models (LMs), producing structured explanations that verify a model's true reasoning capabilities remains a challenge. The issue is particularly pronounced for moderately sized LMs (e.g., FLAN-T5-XXL). In this work, we first underscore the limitations of supervised fine-tuning (SFT) in tackling this challenge, and then introduce a carefully crafted reward engineering method in reinforcement learning (RL) that better addresses the problem. We investigate multiple reward aggregation methods and provide a detailed discussion that sheds light on the promising potential of RL for future research. On two semi-structured explanation generation benchmarks (ExplaGraph and COPA-SSE), our proposed method achieves new state-of-the-art results.
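The abstract mentions comparing several reward aggregation methods but does not name them. As a hedged illustration only, the sketch below shows three generic ways to collapse several per-sample reward signals into the single scalar an RL fine-tuning loop consumes: a weighted arithmetic mean, a weighted geometric mean, and the minimum. The reward names (`structure`, `semantic`), the weights, and the function names are hypothetical placeholders, not the paper's actual reward components or aggregation choices.

```python
import math
from typing import Mapping


def weighted_mean(rewards: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted arithmetic mean: a strong component can offset a weak one."""
    total = sum(weights[k] for k in rewards)
    return sum(weights[k] * rewards[k] for k in rewards) / total


def weighted_geometric_mean(
    rewards: Mapping[str, float], weights: Mapping[str, float], eps: float = 1e-8
) -> float:
    """Weighted geometric mean: any near-zero component drags the total toward zero.

    Assumes rewards lie in [0, 1]; eps guards against log(0).
    """
    total = sum(weights[k] for k in rewards)
    log_sum = sum(weights[k] * math.log(max(rewards[k], eps)) for k in rewards)
    return math.exp(log_sum / total)


def min_reward(rewards: Mapping[str, float]) -> float:
    """Most conservative option: the sample is scored by its worst component."""
    return min(rewards.values())


if __name__ == "__main__":
    # Hypothetical rewards for one generated explanation graph.
    rewards = {"structure": 1.0, "semantic": 0.5}
    weights = {"structure": 1.0, "semantic": 1.0}
    print(weighted_mean(rewards, weights))
    print(weighted_geometric_mean(rewards, weights))
    print(min_reward(rewards))
```

The choice matters for structured outputs: under the arithmetic mean a structurally invalid explanation can still earn a decent reward if its wording is close to the reference, whereas the geometric mean and the minimum penalise any single failing component heavily.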
Jan-23-2024
- Genre:
- Research Report (0.82)