Natural Response Generation for Chinese Reading Comprehension

Chen, Nuo, Li, Hongguang, Bao, Yinan, Wang, Baoyuan, Li, Jia

Oct-9-2023–arXiv.org Artificial Intelligence

Machine reading comprehension (MRC) is an important area of conversational agents and has drawn considerable attention. However, current MRC benchmarks have a notable limitation: the labeled answers are mostly either spans extracted from the target corpus or choices among given candidates, ignoring the naturalness of high-quality responses. As a result, MRC models trained on these datasets cannot generate human-like responses in real QA scenarios. To this end, we construct a new dataset called Penguin to promote MRC research, providing a training and test bed for natural response generation in real scenarios. Concretely, Penguin consists of 200k training examples with high-quality, fluent, and well-informed responses. Penguin is the first relatively large-scale benchmark for natural response generation in Chinese MRC. To address the challenges in Penguin, we develop two strong baselines: an end-to-end framework and a two-stage framework. We further design Prompt-BART, which fine-tunes pre-trained generative language models with a mixture of prefix prompts on Penguin. Extensive experiments validate the effectiveness of this design.
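The abstract does not state which prefix prompts Prompt-BART uses or how the mixture is formed. As a rough illustration only, the sketch below assumes that each prompt is a plain-text prefix and that "mixture" means sampling one prefix per training example before feeding (source, target) pairs to a sequence-to-sequence model such as BART. The templates, the `MRCExample` class, and the helper functions are all hypothetical and are not the authors' implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical prefix templates: the actual Prompt-BART prompts are not given in the abstract.
PREFIX_TEMPLATES = [
    "Answer the question in a complete sentence: ",
    "Read the passage and respond naturally: ",
    "Question answering: ",
]


@dataclass
class MRCExample:
    passage: str
    question: str
    response: str  # natural, fluent target response (not just an extracted span)


def build_prompted_input(example: MRCExample, prefix: str) -> str:
    """Concatenate a prefix prompt, the question, and the passage into one source string."""
    return f"{prefix}question: {example.question} passage: {example.passage}"


def mix_prefix_prompts(examples, templates, seed=0):
    """Pair each example with one randomly sampled prefix (an assumed form of prompt mixing).

    Returns (source_text, target_text) pairs suitable for seq2seq fine-tuning.
    """
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    pairs = []
    for ex in examples:
        prefix = rng.choice(templates)
        pairs.append((build_prompted_input(ex, prefix), ex.response))
    return pairs


examples = [
    MRCExample(
        passage="Penguins live mostly in the Southern Hemisphere.",
        question="Where do penguins live?",
        response="Penguins live mostly in the Southern Hemisphere.",
    )
]
pairs = mix_prefix_prompts(examples, PREFIX_TEMPLATES)
```

The resulting pairs would then be tokenized and passed to a generative model for fine-tuning. Sampling a prefix per example, rather than fixing one, exposes the model to several task phrasings, which is one plausible reading of "a mixture of prefix prompts".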

computational linguistic, dataset, penguin

Country:
- North America
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
    - Indiana > Howard County
      - Greentown (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe > Spain
  - Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > China
  - Beijing > Beijing (0.04)
  - Zhejiang Province > Hangzhou (0.04)
  - Hong Kong (0.04)
  - Chongqing Province > Chongqing (0.04)
  - Guangdong Province
    - Shenzhen (0.05)
    - Guangzhou (0.04)

Genre:
- Research Report (0.82)

Industry:
- Leisure & Entertainment (0.47)
- Education > Assessment & Standards
  - Student Performance (0.62)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Chatbot (0.49)
    - Large Language Model (0.47)
