rewrite sentence
Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision
Fan Jiang, Tom Drummond, Trevor Cohn
Cross-lingual open domain question answering (CLQA) is a complex problem, comprising cross-lingual retrieval from a multilingual knowledge base, followed by answer generation in the query language. Both steps are usually tackled by separate models, requiring substantial annotated datasets and, typically, auxiliary resources such as machine translation systems to bridge between languages. In this paper, we show that CLQA can be addressed using a single encoder-decoder model. To effectively train this model, we propose a self-supervised method based on exploiting the cross-lingual link structure within Wikipedia. We demonstrate how linked Wikipedia pages can be used to synthesise supervisory signals for cross-lingual retrieval, through a form of cloze query, and generate more natural questions to supervise answer generation. Together, we show our approach, CLASS, outperforms comparable methods in both supervised and zero-shot language adaptation settings, including those using machine translation.
Microsoft's AI rewrites sentences based on context
Ever heard of context modeling? It defines how contextual data is structured and maintained, and it plays a pivotal role in open domain conversation. That's why researchers at Microsoft recently investigated a novel approach that involves rewriting the last utterance in a dialogue turn (i.e., a series of utterances) by considering context history. In a preprint paper detailing their work ("Unsupervised Context Rewriting for Open Domain Conversation"), they claim empirical results show it outperforms state-of-the-art baselines in terms of rewriting quality and multi-turn response generation. As the researchers explain, conversation context raises challenges that do not arise in sentence modeling, including topic transitions, coreferences (e.g., he, him, she, it, they), and long-term dependencies.