Wang, Sirui
DABERT: Dual Attention Enhanced BERT for Semantic Matching
Wang, Sirui, Liang, Di, Song, Jian, Li, Yuntao, Wu, Wei
Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still lack the ability to capture subtle differences: minor noise such as word additions, deletions, or modifications can flip their predictions. To alleviate this problem, we propose a novel Dual Attention Enhanced BERT (DABERT) that strengthens BERT's ability to capture fine-grained differences in sentence pairs. DABERT comprises (1) a Dual Attention module, which measures soft word matches through a new dual-channel alignment mechanism that models affinity and difference attention, and (2) an Adaptive Fusion module, which uses attention to learn how to aggregate the difference and affinity features and generates a vector describing the matching details of the sentence pair. We conduct extensive experiments on well-studied semantic matching and robustness test datasets, and the results show the effectiveness of the proposed method.
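To make the dual-channel idea concrete, here is a minimal PyTorch sketch, not the authors' DABERT implementation: one channel aligns tokens by affinity (dot-product cross-attention), an assumed second channel aligns them by the magnitude of their element-wise differences, and a sigmoid gate stands in for the Adaptive Fusion module. All layer names and the exact scoring functions are illustrative assumptions.

```python
# Minimal sketch of dual-channel alignment (illustrative, not the DABERT code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionSketch(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.affinity_proj = nn.Linear(hidden, hidden)
        self.difference_proj = nn.Linear(hidden, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: [batch, len_a, hidden], b: [batch, len_b, hidden] token representations
        # Affinity channel: standard dot-product cross-attention from A to B.
        aff_scores = torch.matmul(self.affinity_proj(a), b.transpose(1, 2))
        aff_aligned = torch.matmul(F.softmax(aff_scores, dim=-1), b)

        # Difference channel (assumed form): score each token pair by how much
        # their vectors disagree, so attention concentrates on conflicting tokens.
        pair_diff = self.difference_proj(a).unsqueeze(2) - b.unsqueeze(1)  # [B, La, Lb, H]
        diff_scores = pair_diff.abs().sum(dim=-1)
        diff_aligned = torch.matmul(F.softmax(diff_scores, dim=-1), b)

        # Adaptive fusion (assumed form): a sigmoid gate mixes the two views.
        g = torch.sigmoid(self.gate(torch.cat([aff_aligned, diff_aligned], dim=-1)))
        return g * aff_aligned + (1.0 - g) * diff_aligned

# Toy usage with BERT-sized tensors: the output follows the shape of sentence A.
layer = DualAttentionSketch(hidden=768)
fused = layer(torch.randn(2, 12, 768), torch.randn(2, 9, 768))  # [2, 12, 768]
```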
Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts
Xue, Chao, Liang, Di, Wang, Sirui, Wu, Wei, Zhang, Jing
Transformer-based pre-trained models have achieved great improvements in semantic matching. However, existing models still lack the ability to capture subtle differences: the modification, addition, and deletion of words in sentence pairs can make it difficult for a model to predict their relationship. To alleviate this problem, we propose a novel Dual Path Modeling Framework that enhances the model's ability to perceive subtle differences in sentence pairs by separately modeling affinity and difference semantics. Based on this framework, we design the Dual Path Modeling Network (DPM-Net) to recognize semantic relations. We conduct extensive experiments on 10 well-studied semantic matching and robustness test datasets, and the results show that our method achieves consistent improvements over the baselines.
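As an illustration only (the DPM-Net architecture itself is not reproduced here), the two paths can be thought of as an affinity feature and a difference feature computed from pooled sentence vectors and fed to a relation classifier; the layer sizes and feature composition below are assumptions.

```python
import torch
import torch.nn as nn

class DualPathHead(nn.Module):
    """Illustrative two-path relation classifier; sizes and features are assumed."""
    def __init__(self, hidden: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, a_pooled: torch.Tensor, b_pooled: torch.Tensor) -> torch.Tensor:
        affinity = a_pooled * b_pooled            # what the two sentences share
        difference = (a_pooled - b_pooled).abs()  # where the two sentences conflict
        features = torch.cat([a_pooled, b_pooled, affinity, difference], dim=-1)
        return self.classifier(features)
```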
Time-aware Multiway Adaptive Fusion Network for Temporal Knowledge Graph Question Answering
Liu, Yonghao, Liang, Di, Fang, Fang, Wang, Sirui, Wu, Wei, Jiang, Rui
Knowledge graphs (KGs) have received increasing attention due to their wide applications in natural language processing. However, their use for temporal question answering (QA) has not been well explored. Most existing methods are developed on top of pre-trained language models, which may not be capable of learning \emph{temporal-specific} representations of entities for the temporal KGQA task. To alleviate this problem, we propose a novel \textbf{T}ime-aware \textbf{M}ultiway \textbf{A}daptive (\textbf{TMA}) fusion network, inspired by the step-by-step reasoning behavior of humans. For each given question, TMA first extracts the relevant concepts from the KG and then feeds them into a multiway adaptive module to produce a \emph{temporal-specific} representation of the question. This representation can be combined with the pre-trained KG embedding to generate the final prediction. Empirical results verify that the proposed model outperforms state-of-the-art models on the benchmark dataset. Notably, the Hits@1 and Hits@10 results of TMA on the complex questions of the CronQuestions dataset improve by 24\% and 10\% in absolute terms over the best-performing baseline. Furthermore, we show that TMA, with its adaptive fusion mechanism, can provide interpretability by analyzing the proportion of information in the question representations.
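A minimal sketch of the adaptive-fusion step, under the assumption that the question vector attends over several KG-derived feature "ways" (e.g., entity, relation, and timestamp features) and that the resulting mixture weights are what the interpretability analysis inspects; the real TMA network is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiwayAdaptiveFusionSketch(nn.Module):
    """Illustrative fusion of a question vector with several KG-derived features."""
    def __init__(self, hidden: int):
        super().__init__()
        self.query = nn.Linear(hidden, hidden)
        self.key = nn.Linear(hidden, hidden)

    def forward(self, question: torch.Tensor, ways: torch.Tensor):
        # question: [batch, hidden]; ways: [batch, n_ways, hidden]
        scores = torch.einsum("bh,bnh->bn", self.query(question), self.key(ways))
        weights = F.softmax(scores, dim=-1)               # proportion assigned to each way
        fused = torch.einsum("bn,bnh->bh", weights, ways)
        # Returning the weights makes the per-way proportions inspectable.
        return fused + question, weights
```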
Pay More Attention to History: A Context Modeling Strategy for Conversational Text-to-SQL
Li, Yuntao, Zhang, Hanchu, Li, Yutian, Wang, Sirui, Wu, Wei, Zhang, Yan
Conversational text-to-SQL aims to convert multi-turn natural language queries into their corresponding SQL representations. One of the most intractable problems in conversational text-to-SQL is modeling the semantics of multi-turn queries and gathering the information required for the current query. This paper shows that explicitly modeling the semantic changes introduced by each turn, together with a summarization of the whole context, improves the conversion of conversational queries into SQL. In particular, we propose two conversational modeling tasks, at the turn grain and the conversation grain, which simply serve as auxiliary training tasks to help with multi-turn conversational semantic parsing. We conduct empirical studies and achieve new state-of-the-art results on a large-scale open-domain conversational text-to-SQL dataset. The results demonstrate that the proposed mechanism significantly improves the performance of multi-turn semantic parsing.
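Since the two context-modeling tasks are described as auxiliary training objectives, the training loss can be sketched as a simple weighted sum; the weights and argument names below are hypothetical, not the paper's actual configuration.

```python
import torch

def multitask_loss(parsing_loss: torch.Tensor,
                   turn_grain_loss: torch.Tensor,
                   conversation_grain_loss: torch.Tensor,
                   alpha: float = 0.1, beta: float = 0.1) -> torch.Tensor:
    """Main text-to-SQL loss plus the two auxiliary context-modeling losses.

    alpha and beta are illustrative weights; the paper may use a different scheme.
    """
    return parsing_loss + alpha * turn_grain_loss + beta * conversation_grain_loss
```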
ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
Yan, Yuanmeng, Li, Rumei, Wang, Sirui, Zhang, Fuzheng, Wu, Wei, Xu, Weiran
Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Although BERT-based pre-trained language models achieve high performance on many downstream tasks, the natively derived sentence representations have been shown to collapse and thus perform poorly on semantic textual similarity (STS) tasks. In this paper, we present ConSERT, a Contrastive Framework for Self-Supervised Sentence Representation Transfer, which adopts contrastive learning to fine-tune BERT in an unsupervised and effective way. By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations and makes them more applicable to downstream tasks. Experiments on STS datasets demonstrate that ConSERT achieves an 8\% relative improvement over the previous state of the art, comparable even to the supervised SBERT-NLI. When NLI supervision is further incorporated, we achieve new state-of-the-art performance on the STS tasks. Moreover, ConSERT obtains comparable results with only 1000 training samples available, showing its robustness in data-scarcity scenarios.
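The unsupervised fine-tuning objective can be illustrated with a standard NT-Xent contrastive loss over two augmented views of the same batch of sentences; the temperature and the way the views are produced are assumptions here, not ConSERT's exact settings.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: [batch, hidden] embeddings of two augmented views of the same sentences."""
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)   # [2B, H], unit-normalized
    sim = torch.matmul(z, z.t()) / temperature            # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-pairs
    # For row i, the positive example is the other view of the same sentence.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)]).to(z.device)
    return F.cross_entropy(sim, targets)
```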
Leveraging Historical Interaction Data for Improving Conversational Recommender System
Zhou, Kun, Zhao, Wayne Xin, Wang, Hui, Wang, Sirui, Zhang, Fuzheng, Wang, Zhongyuan, Wen, Ji-Rong
Recently, the conversational recommender system (CRS) has become an emerging and practical research topic. Most existing CRS methods focus on learning effective preference representations for users from conversation data alone. In contrast, we take a new perspective and leverage historical interaction data to improve CRS. For this purpose, we propose a novel pre-training approach that integrates the item-based preference sequence (from historical interaction data) and the attribute-based preference sequence (from conversation data). We carefully design two pre-training tasks ...
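As a rough sketch of combining the two preference signals (the paper's actual pre-training tasks and encoders are not reproduced here), one could encode the item sequence and the attribute sequence separately and fuse the pooled states into a single user-preference vector for item scoring; every module and dimension below is an assumption.

```python
import torch
import torch.nn as nn

class PreferenceFusionSketch(nn.Module):
    """Illustrative fusion of item-based and attribute-based preference sequences."""
    def __init__(self, hidden: int, n_items: int, n_attrs: int):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, hidden)
        self.attr_emb = nn.Embedding(n_attrs, hidden)
        self.item_encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.attr_encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, item_seq: torch.Tensor, attr_seq: torch.Tensor) -> torch.Tensor:
        # item_seq: [batch, len_i] item ids from historical interactions
        # attr_seq: [batch, len_a] attribute ids mentioned in the conversation
        _, item_state = self.item_encoder(self.item_emb(item_seq))
        _, attr_state = self.attr_encoder(self.attr_emb(attr_seq))
        user = self.fuse(torch.cat([item_state[-1], attr_state[-1]], dim=-1))
        return torch.matmul(user, self.item_emb.weight.t())  # scores over all items
```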