EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training

Hao Zhou, Pei Ke, Zheng Zhang, Yuxian Gu, Yinhe Zheng, Chujie Zheng, Yida Wang, Chen Henry Wu, Hao Sun, Xiaocong Yang, Bosi Wen, Xiaoyan Zhu, Minlie Huang, Jie Tang

arXiv.org Artificial Intelligence 

Although pre-trained language models have remarkably enhanced the generation ability of dialogue systems, open-domain Chinese dialogue systems are still limited by the scale of available dialogue data and model size compared with their English counterparts. In this paper, we propose EVA, a Chinese dialogue system built on the largest Chinese pre-trained dialogue model, with 2.8B parameters. To build this model, we collect the largest Chinese dialogue dataset, named WDC-Dialogue, from various public social media. This dataset contains 1.4B context-response pairs and serves as the pre-training corpus of EVA. Extensive automatic and human evaluations show that EVA outperforms other Chinese pre-trained dialogue models, especially in multi-turn human-bot conversations.
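The abstract describes pre-training on context-response pairs derived from social-media dialogues. A common way to obtain such pairs is to slide over each multi-turn dialogue, treating every turn as the response to all preceding turns. The sketch below illustrates this construction; the separator token and function name are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed construction, not the paper's exact pipeline):
# each turn i >= 1 of a dialogue becomes the response to turns 0..i-1.

SEP = " [SEP] "  # assumed turn separator; the actual tokenization may differ


def make_context_response_pairs(dialogue):
    """For an n-turn dialogue, yield n-1 (context, response) pairs."""
    pairs = []
    for i in range(1, len(dialogue)):
        context = SEP.join(dialogue[:i])  # all turns before turn i
        response = dialogue[i]            # turn i is the target response
        pairs.append((context, response))
    return pairs


dialogue = ["你好", "你好，今天过得怎么样？", "挺好的，谢谢！"]
for ctx, resp in make_context_response_pairs(dialogue):
    print(ctx, "->", resp)
```

Under this scheme a dialogue of n turns yields n-1 training pairs, which is one plausible way a corpus of multi-turn conversations can expand into 1.4B context-response pairs.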