Jones, Gareth
QAScore -- An Unsupervised Unreferenced Metric for the Question Generation Evaluation
Ji, Tianbo, Lyu, Chenyang, Jones, Gareth, Zhou, Liting, Graham, Yvette
Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has substantially improved the quality of automatically generated questions, especially compared to traditional approaches that employ manually crafted heuristics. However, the metrics commonly applied in QG evaluations have been criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric, called QAScore, that has the potential to provide a better mechanism for evaluating QG systems. Instead of fine-tuning a language model to maximize its correlation with human judgements, QAScore evaluates a question by computing the cross entropy according to the probability that the language model can correctly generate the masked words in the answer to that question. Furthermore, we conduct a new crowd-sourced human evaluation experiment for QG to investigate how well QAScore and other metrics correlate with human judgements. Experiments show that QAScore obtains a stronger correlation with the results of our proposed human evaluation method than existing word-overlap-based metrics such as BLEU and ROUGE, as well as the existing pretrained-model-based metric BERTScore.
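The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how per-word probabilities are aggregated, so summing log-probabilities is an assumption, and `masked_answer_score`, `TokenProb`, and `toy_probs` are hypothetical names. A real system would obtain each probability from a masked language model conditioned on the passage and the question; here a fixed table stands in for the model.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: given the position of a masked answer token and the
# token itself, return the model's probability of that token in context.
TokenProb = Callable[[int, str], float]


def masked_answer_score(answer_tokens: Sequence[str], token_prob: TokenProb) -> float:
    """Sum the log-probabilities of the answer tokens, each masked in turn.

    Assumption: the score is the negative cross entropy summed over answer
    tokens, so values are <= 0 and closer to 0 is better.
    """
    if not answer_tokens:
        raise ValueError("answer_tokens must not be empty")
    total = 0.0
    for index, token in enumerate(answer_tokens):
        p = token_prob(index, token)
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability for {token!r} must be in (0, 1]")
        total += math.log(p)
    return total


def toy_probs(table: List[float]) -> TokenProb:
    """Stand-in for a language model: look probabilities up by position."""
    return lambda index, token: table[index]


# A question that makes the answer easy to recover should score higher.
good = masked_answer_score(["Dublin", "City"], toy_probs([0.8, 0.5]))
poor = masked_answer_score(["Dublin", "City"], toy_probs([0.2, 0.1]))
```

Under these assumptions, `good` is greater than `poor`, reflecting that the first question lets the stand-in model recover the answer words with higher probability.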
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations
Han, Lifeng, Jones, Gareth, Smeaton, Alan
In this work, we present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs). The MWEs include verbal MWEs (vMWEs), defined in the PARSEME shared task as expressions that have a verb as their head. The annotated vMWEs are also manually aligned, both bilingually and multilingually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus, followed by human post-editing and annotation of target MWEs. Strict quality control was applied to limit errors: each MT output sentence first received manual post-editing and annotation, and then a second manual quality check. One of our findings during corpus preparation is that accurate translation of MWEs presents challenges to MT systems. To facilitate further MT research, we present a categorisation of the error types encountered by MT systems in performing MWE-related translation. To acquire a broader view of MT issues, we selected four popular state-of-the-art MT models for comparison, namely Microsoft Bing Translator, GoogleMT, Baidu Fanyi, and DeepL MT. Because of the noise removal, translation post-editing, and MWE annotation by human professionals, we believe our AlphaMWE dataset will be an asset for cross-lingual and multilingual research, such as MT and information extraction. Our multilingual corpora are available as open access at github.com/poethan/AlphaMWE.
2003 AAAI Spring Symposium Series
Abecker, Andreas, Antonsson, Erik K., Callaway, Charles B., Dignum, Virginia, Doherty, Patrick, Elst, Ludger van, Freed, Michael, Freedman, Reva, Guesgen, Hans, Jones, Gareth, Koza, John, Kortenkamp, David, Maybury, Mark, McCarthy, John, Mitra, Debasis, Renz, Jochen, Schreckenghost, Debra, Williams, Mary-Anne
The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University's Department of Computer Science, presented the 2003 Spring Symposium Series, Monday through Wednesday, 24-26 March 2003, at Stanford University. The titles of the eight symposia were Agent-Mediated Knowledge Management, Computational Synthesis: From Basic Building Blocks to High-Level Functions, Foundations and Applications of Spatiotemporal Reasoning (FASTR), Human Interaction with Autonomous Systems in Complex Environments, Intelligent Multimedia Knowledge Management, Logical Formalization of Commonsense Reasoning, Natural Language Generation in Spoken and Written Dialogue, and New Directions in Question-Answering Motivation.