Reward Engineering for Generating Semi-structured Explanation

Han, Jiuzhou, Buntine, Wray, Shareghi, Ehsan

Jan-23-2024–arXiv.org Artificial Intelligence

A semi-structured explanation depicts the implicit process of a reasoner in an explicit representation. Such an explanation highlights how the information available in a specific query is utilised and supplemented with information the reasoner produces from its internal weights to generate an answer. Despite recent improvements in the generative capabilities of language models, producing structured explanations that verify a model's true reasoning capabilities remains a challenge. This issue is particularly pronounced for not-so-large LMs (e.g., FLAN-T5-XXL). In this work, we first underscore the limitations of supervised fine-tuning (SFT) in tackling this challenge, and then introduce a carefully crafted reward engineering method in reinforcement learning (RL) to better address this problem. We investigate multiple reward aggregation methods and provide a detailed discussion that sheds light on the potential of RL for future research. On two semi-structured explanation generation benchmarks (ExplaGraph and COPA-SSE), our proposed method achieves new state-of-the-art results.

explanation, large language model, machine learning

Country:
- North America
  - Dominican Republic (0.04)
  - United States > California
    - San Diego County > San Diego (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Ireland (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine (0.96)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Cognitive Science > Problem Solving (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)