He, Shizhu
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks
Weng, Yixuan, Zhu, Minjun, Xia, Fei, Li, Bin, He, Shizhu, Liu, Kang, Zhao, Jun
Language models (LMs) proficiency in handling deterministic symbolic reasoning and rule-based tasks remains limited due to their dependency implicit learning on textual data. To enable fully rule comprehension ability, we explore how to incorporate compiled neural networks (CoNNs) which weight is specially designed into the architecture of LMs, to achieve high accuracy and robust performance. CoNNs are transformer-based neural networks that execute rules through artificially generated attention weights. Our method, which call "Neural Comprehension", by incorporating CoNN modules into the LM, the framework effectively tackles rule-intensive challenges. Our experiments on symbolic reasoning tasks and real-world arithmetic reasoning tasks demonstrate the superior performance of our method compared to existing techniques. Furthermore, our LM achieves flawless execution on symbolic operations tasks, highlighting the potential of our method in enabling LMs to possess true symbolic comprehension capabilities. Our code is publicly available at: https://github.com/WENGSYX/Neural-Comprehension.
Large Language Models Need Holistically Thought in Medical Conversational QA
Weng, Yixuan, Li, Bin, Xia, Fei, Zhu, Minjun, Sun, Bin, He, Shizhu, Liu, Kang, Zhao, Jun
The medical conversational question answering (CQA) system aims at providing a series of professional medical services to improve the efficiency of medical care. Despite the success of large language models (LLMs) in complex reasoning tasks in various fields, such as mathematics, logic, and commonsense QA, they still need to improve with the increased complexity and specialization of the medical field. This is because medical CQA tasks require not only strong medical reasoning, but also the ability to think broadly and deeply. In this paper, to address these challenges in medical CQA tasks that need to be considered and understood in many aspects, we propose the Holistically Thought (HoT) method, which is designed to guide the LLMs to perform the diffused and focused thinking for generating high-quality medical responses. The proposed HoT method has been evaluated through automated and manual assessments in three different medical CQA datasets containing the English and Chinese languages. The extensive experimental results show that our method can produce more correctness, professional, and considerate answers than several state-of-the-art (SOTA) methods, manifesting its effectiveness. Our code in https://github.com/WENGSYX/HoT.
Knowledge Reasoning via Jointly Modeling Knowledge Graphs and Soft Rules
Lan, Yinyu, He, Shizhu, Liu, Kang, Zhao, Jun
Knowledge graphs (KGs) play a crucial role in many applications, such as question answering, but incompleteness is an urgent issue for their broad application. Much research in knowledge graph completion (KGC) has been performed to resolve this issue. The methods of KGC can be classified into two major categories: rule-based reasoning and embedding-based reasoning. The former has high accuracy and good interpretability, but a major challenge is to obtain effective rules on large-scale KGs. The latter has good efficiency and scalability, but it relies heavily on data richness and cannot fully use domain knowledge in the form of logical rules. We propose a novel method that injects rules and learns representations iteratively to take full advantage of rules and embeddings. Specifically, we model the conclusions of rule groundings as 0-1 variables and use a rule confidence regularizer to remove the uncertainty of the conclusions. The proposed approach has the following advantages: 1) It combines the benefits of both rules and knowledge graph embeddings (KGEs) and achieves a good balance between efficiency and scalability. 2) It uses an iterative method to continuously improve KGEs and remove incorrect rule conclusions. Evaluations on two public datasets show that our method outperforms the current state-of-the-art methods, improving performance by 2.7\% and 4.3\% in mean reciprocal rank (MRR).
ReasonChainQA: Text-based Complex Question Answering with Explainable Evidence Chains
Zhu, Minjun, Weng, Yixuan, He, Shizhu, Liu, Kang, Zhao, Jun
The ability of reasoning over evidence has received increasing attention in question answering (QA). Recently, natural language database (NLDB) conducts complex QA in knowledge base with textual evidences rather than structured representations, this task attracts a lot of attention because of the flexibility and richness of textual evidence. However, existing text-based complex question answering datasets fail to provide explicit reasoning process, while it's important for retrieval effectiveness and reasoning interpretability. Therefore, we present a benchmark \textbf{ReasonChainQA} with explanatory and explicit evidence chains. ReasonChainQA consists of two subtasks: answer generation and evidence chains extraction, it also contains higher diversity for multi-hop questions with varying depths, 12 reasoning types and 78 relations. To obtain high-quality textual evidences for answering complex question. Additional experiment on supervised and unsupervised retrieval fully indicates the significance of ReasonChainQA. Dataset and codes will be made publicly available upon accepted.
ADBCMM : Acronym Disambiguation by Building Counterfactuals and Multilingual Mixing
Weng, Yixuan, Xia, Fei, Li, Bin, Huang, Xiusheng, He, Shizhu, Liu, Kang, Zhao, Jun
Scientific documents often contain a large number of acronyms. Disambiguation of these acronyms will help researchers better understand the meaning of vocabulary in the documents. In the past, thanks to large amounts of data from English literature, acronym task was mainly applied in English literature. However, for other low-resource languages, this task is difficult to obtain good performance and receives less attention due to the lack of large amount of annotation data. To address the above issue, this paper proposes an new method for acronym disambiguation, named as ADBCMM, which can significantly improve the performance of low-resource languages by building counterfactuals and multilingual mixing. Specifically, by balancing data bias in low-resource langauge, ADBCMM will able to improve the test performance outside the data set. In SDU@AAAI-22 - Shared Task 2: Acronym Disambiguation, the proposed method won first place in French and Spanish. You can repeat our results here https://github.com/WENGSYX/ADBCMM.
Path-based knowledge reasoning with textual semantic information for medical knowledge graph completion
Lan, Yinyu, He, Shizhu, Zeng, Xiangrong, Liu, Shengping, Liu, Kang, Zhao, Jun
Background Knowledge graphs (KGs), especially medical knowledge graphs, are often significantly incomplete, so it necessitating a demand for medical knowledge graph completion (MedKGC). MedKGC can find new facts based on the exited knowledge in the KGs. The path-based knowledge reasoning algorithm is one of the most important approaches to this task. This type of method has received great attention in recent years because of its high performance and interpretability. In fact, traditional methods such as path ranking algorithm (PRA) take the paths between an entity pair as atomic features. However, the medical KGs are very sparse, which makes it difficult to model effective semantic representation for extremely sparse path features. The sparsity in the medical KGs is mainly reflected in the long-tailed distribution of entities and paths. Previous methods merely consider the context structure in the paths of the knowledge graph and ignore the textual semantics of the symbols in the path. Therefore, their performance cannot be further improved due to the two aspects of entity sparseness and path sparseness. To address the above issues, this paper proposes two novel path-based reasoning methods to solve the sparsity issues of entity and path respectively, which adopts the textual semantic information of entities and paths for MedKGC. By using the pre-trained model BERT, combining the textual semantic representations of the entities and the relationships, we model the task of symbolic reasoning in the medical KG as a numerical computing issue in textual semantic representation.
Large Scaled Relation Extraction With Reinforcement Learning
Zeng, Xiangrong (Institute of Automation, Chinese Academy of Sciences) | He, Shizhu (Institute of Automation, Chinese Academy of Sciences) | Liu, Kang (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Institute of Automation, Chinese Academy of Sciences)
Sentence relation extraction aims to extract relational facts from sentences, which is an important task in natural language processing field. Previous models rely on the manually labeled supervised dataset. However, the human annotation is costly and limits to the number of relation and data size, which is difficult to scale to large domains. In order to conduct largely scaled relation extraction, we utilize an existing knowledge base to heuristically align with texts, which not rely on human annotation and easy to scale. However, using distant supervised data for relation extraction is facing a new challenge: sentences in the distant supervised dataset are not directly labeled and not all sentences that mentioned an entity pair can represent the relation between them. To solve this problem, we propose a novel model with reinforcement learning. The relation of the entity pair is used as distant supervision and guide the training of relation extractor with the help of reinforcement learning method. We conduct two types of experiments on a publicly released dataset. Experiment results demonstrate the effectiveness of the proposed method compared with baseline models, which achieves 13.36\% improvement.
Distant Supervision for Relation Extraction with Sentence-Level Attention and Entity Descriptions
Ji, Guoliang (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences) | Liu, Kang (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences) | He, Shizhu (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences) | Zhao, Jun (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences)
Distant supervision for relation extraction is an efficient method to scale relation extraction to very large corpora which contains thousands of relations. However, the existing approaches have flaws on selecting valid instances and lack of background knowledge about the entities. In this paper, we propose a sentence-level attention model to select the valid instances, which makes full use of the supervision information from knowledge bases. And we extract entity descriptions from Freebase and Wikipedia pages to supplement background knowledge for our task. The background knowledge not only provides more information for predicting relations, but also brings better entity representations for the attention module. We conduct three experiments on a widely used dataset and the experimental results show that our approach outperforms all the baseline systems significantly.
Knowledge Graph Completion with Adaptive Sparse Transfer Matrix
Ji, Guoliang (Institute of Automation, Chinese Academy of Sciences) | Liu, Kang (Institute of Automation, Chinese Academy of Sciences) | He, Shizhu (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Institute of Automation, Chinese Academy of Sciences)
We model knowledge graphs for their completion by encoding each entity and relation into a numerical space. All previous work including Trans(E, H, R, and D) ignore the heterogeneity (some relations link many entity pairs and others do not) and the imbalance (the number of head entities and that of tail entities in a relation could be different) of knowledge graphs. In this paper, we propose a novel approach TranSparse to deal with the two issues. In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations. In experiments, we design structured and unstructured sparse patterns for transfer matrices and analyze their advantages and disadvantages. We evaluate our approach on triplet classification and link prediction tasks. Experimental results show that TranSparse outperforms Trans(E, H, R, and D) significantly, and achieves state-of-the-art performance.
A Joint Model for Question Answering over Multiple Knowledge Bases
Zhang, Yuanzhe (Institute of Automation, Chinese Academy of Sciences) | He, Shizhu (Institute of Automation, Chinese Academy of Sciences) | Liu, Kang (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Institute of Automation, Chinese Academy of Sciences)
As the amount of knowledge bases (KBs) grows rapidly, the problem of question answering (QA) over multiple KBs has drawn more attention. The most significant distinction between multiple KB-QA and single KB-QA is that the former must consider the alignments between KBs. The pipeline strategy first constructs the alignments independently, and then uses the obtained alignments to construct queries. However, alignment construction is not a trivial task, and the introduced noises would be passed on to query construction. By contrast, we notice that alignment construction and query construction are interactive steps, and jointly considering them would be beneficial. To this end, we present a novel joint model based on integer linear programming (ILP), uniting these two procedures into a uniform framework. The experimental results demonstrate that the proposed approach outperforms state-of-the-art systems, and is able to improve the performance of both alignment construction and query construction.