SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning

Chen, Guoxin, Tang, Kexin, Yang, Chao, Ye, Fuying, Qiao, Yu, Qian, Yiming

Jan-24-2024–arXiv.org Artificial Intelligence

Elucidating the reasoning process with structured explanations from question to answer is fundamentally crucial, as it significantly enhances the interpretability and trustworthiness of question-answering (QA) systems. However, structured explanations demand models to perform intricate structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning, ignoring logical dependencies between steps. Meanwhile, existing reinforcement learning (RL)-based methods overlook the structured relationships, impeding RL's potential in structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between states. We also introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank, a 4.4% average improvement on STREET benchmark, and exhibiting outstanding efficiency and cross-dataset generalization performance.

hemisphere, hypothesis, reasoning, (16 more...)

arXiv.org Artificial Intelligence

Jan-24-2024

arXiv.org PDF

Add feedback

Country:
- Europe > Netherlands (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report > Promising Solution (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Cognitive Science > Problem Solving (0.89)
  - Natural Language
    - Explanation & Argumentation (0.70)
    - Large Language Model (0.70)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (0.47)