Zhang, Wei-Nan (Harbin Institute of Technology) | Ming, Zhao-Yan (Digipen Institute of Technology) | Zhang, Yu (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology) | Chua, Tat-Seng (National University of Singapore)
Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries. One of the main difficulties lies in the word mismatch between queries and candidate questions. Existing solutions try to expand the queries at word level, but they usually fail to consider concept level enrichment. In this paper, we explore a pivot language translation based approach to derive the paraphrases of key concepts. We further propose a unified question retrieval model which integrates the keyconcepts and their paraphrases for the query question. Experimental results demonstrate that the paraphrase enhanced retrieval model significantly outperforms the state-of-the-art models in question retrieval.
The Q&A community has become an important way for people to access knowledge and information from the Internet. However, the existing translation based on models does not consider the query specific semantics when assigning weights to query terms in question retrieval. So we improve the term weighting model based on the traditional topic translation model and further considering the quality characteristics of question and answer pairs, this paper proposes a communitybased question retrieval method that combines question and answer on quality and question relevance (T2LM+). We have also proposed a question retrieval method based on convolutional neural networks. The results show that Compared with the relatively advanced methods, the two methods proposed in this paper increase MAP by 4.91% and 6.31%.
This article investigates multilingual evidence retrieval and claim verification as a step to combat global disinformation, a first effort of this kind, to the best of our knowledge. A 400 example mixed language English-Romanian dataset is created for cross-lingual transfer learning evaluation. We make code, datasets, and trained models available upon publication.
Search systems control the exposure of ranked content to searchers. In many cases, creators value not only the exposure of their content but, moreover, an understanding of the specific searches where the content is surfaced. The problem of identifying which queries expose a given piece of content in the ranking results is an important and relatively under-explored search transparency challenge. Exposing queries are useful for quantifying various issues of search bias, privacy, data protection, security, and search engine optimization. Exact identification of exposing queries in a given system is computationally expensive, especially in dynamic contexts such as web search. In quest of a more lightweight solution, we explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems: dense dual-encoder models and traditional BM25 models. We then propose how this approach can be improved through metric learning over the retrieval embedding space. We further derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
Neural Networks trained with gradient descent are known to be susceptible to catastrophic forgetting caused by parameter shift during the training process. In the context of Neural Machine Translation (NMT) this results in poor performance on heterogeneous datasets and on sub-tasks like rare phrase translation. On the other hand, non-parametric approaches are immune to forgetting, perfectly complementing the generalization ability of NMT. However, attempts to combine non-parametric or retrieval based approaches with NMT have only been successful on narrow domains, possibly due to over-reliance on sentence level retrieval. We propose a novel n-gram level retrieval approach that relies on local phrase level similarities, allowing us to retrieve neighbors that are useful for translation even when overall sentence similarity is low. We complement this with an expressive neural network, allowing our model to extract information from the noisy retrieved context. We evaluate our semi-parametric NMT approach on a heterogeneous dataset composed of WMT, IWSLT, JRC-Acquis and OpenSubtitles, and demonstrate gains on all 4 evaluation sets. The semi-parametric nature of our approach opens the door for non-parametric domain adaptation, demonstrating strong inference-time adaptation performance on new domains without the need for any parameter updates.