Conversational Intelligence Challenge: Accelerating Research with Crowd Science and Open Source
Development of conversational systems is one of the most challenging tasks in natural language processing, and it is especially hard in the case of open-domain dialogue. The main factors that hinder progress in this area are the lack of training data and the difficulty of automatic evaluation. Thus, reliably evaluating the quality of such models requires time-consuming and expensive human evaluation. We tackle these problems by organizing the Conversational Intelligence Challenge (ConvAI), an open competition of dialogue systems. Our goals are threefold: to work out a good design for human evaluation of open-domain dialogue, to grow an open-source code base for conversational systems, and to harvest and publish new datasets.
The Second Conversational Intelligence Challenge (ConvAI2)
Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston
We describe the setting and results of the ConvAI2 NeurIPS competition, which aims to further the state of the art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best-performing models on this task, but (ii) to improve performance on multi-turn conversations with humans, future systems must go beyond single-word metrics like perplexity and measure performance across sequences of utterances (conversations) in terms of repetition, consistency, and balance of dialogue acts (e.g. ...).
The Conversational Intelligence Challenge aims at finding approaches to creating high-quality dialogue agents capable of meaningful open-domain conversation. Today, progress in the field is significantly hampered by the absence of established benchmark tasks for non-goal-oriented dialogue systems (chatbots) and of solid evaluation criteria for automatic assessment of dialogue quality. The aim of this competition was therefore to establish a concrete scenario for testing chatbots that aim to engage humans, and to become a standard evaluation tool that makes such systems directly comparable. This includes open-source datasets, evaluation code (both automatic evaluations and code to run the human evaluation on Mechanical Turk), model baselines, and the winning model itself. Taking into account the results of the previous edition, this year we improved the task, the evaluation process, and the human conversationalists' experience. We did this in part by making the setup simpler for the competitors, and in part by making the conversations more engaging for humans. We provided a dataset from the beginning, Persona-Chat, whose training set consists of conversations between crowdworkers who were randomly paired and asked to act the part of a provided persona (randomly assigned, and created by another set of crowdworkers). The paired workers were asked to chat naturally and to get to know each other during the conversation. This produces interesting and engaging conversations that learning agents can try to mimic.