Wei, Zhongyu
Hierarchical Reinforcement Learning for Automatic Disease Diagnosis
Zhong, Cheng, Liao, Kangenbei, Chen, Wei, Liu, Qianlong, Peng, Baolin, Huang, Xuanjing, Peng, Jiajie, Wei, Zhongyu
Motivation: Disease-diagnosis-oriented dialogue systems model the interactive consultation procedure as a Markov Decision Process, and reinforcement learning algorithms are used to solve the problem. Existing approaches usually employ a flat policy structure that treats all symptoms and diseases equally when making actions. This strategy works well in simple scenarios where the action space is small; however, its efficiency is challenged in real environments. Inspired by the offline consultation process, we propose to integrate a two-level hierarchical policy structure into the dialogue system for policy learning. The high-level policy consists of a master model that is responsible for triggering a low-level model; the low-level policy consists of several symptom checkers and a disease classifier. The proposed policy structure is capable of dealing with diagnosis problems involving large numbers of diseases and symptoms. Results: Experimental results on three real-world datasets and a synthetic dataset demonstrate that our hierarchical framework achieves higher accuracy and symptom recall in disease diagnosis compared with existing systems. We construct a benchmark, including datasets and implementations of existing algorithms, to encourage follow-up research. Availability: The code and data are available from https://github.com/FudanDISC/DISCOpen-MedBox-DialoDiagnosis Contact: 21210980124@m.fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
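To make the two-level policy concrete, here is a minimal Python sketch, assuming a master policy that triggers one symptom-checker worker at a time; all class and method names are illustrative stand-ins, not the paper's implementation.

```python
# Minimal sketch of the two-level policy: a master policy triggers one
# low-level worker (symptom checker) at a time; names are hypothetical.

class SymptomChecker:
    """Low-level worker: inquires about symptoms within one disease group."""
    def __init__(self, symptoms):
        self.symptoms = symptoms

    def act(self, known):
        for s in self.symptoms:            # ask the first unexamined symptom
            if s not in known:
                return ("inquire", s)
        return ("pass", None)              # nothing left to ask in this group

class MasterPolicy:
    """High-level policy: decides which worker (or the classifier) to run."""
    def __init__(self, workers):
        self.workers = workers

    def act(self, known):
        # Stand-in for a learned policy: prefer the symptom group that
        # overlaps most with the symptoms confirmed so far.
        overlap = [sum(s in known for s in w.symptoms) for w in self.workers]
        return self.workers[overlap.index(max(overlap))]

master = MasterPolicy([SymptomChecker(["cough", "fever"]),
                       SymptomChecker(["rash", "itching"])])
print(master.act({"cough"}).act({"cough"}))   # ('inquire', 'fever')
```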
DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning
Chen, Wei, Wang, Qiushi, Long, Zefei, Zhang, Xianyin, Lu, Zhongtian, Li, Bingxuan, Wang, Siyuan, Xu, Jiarong, Bai, Xiang, Huang, Xuanjing, Wei, Zhongyu
The financial industry presents unique challenges and opportunities for Natural Language Processing (NLP) models (Huang et al., 2020). Traditional financial NLP models have made progress in various tasks such as news sentiment analysis (Araci, 2019), financial event extraction (Zheng et al., 2019; Yang et al., 2019), financial report generation (Chapman et al., 2022), stock price prediction (Chen et al., 2018) and financial text summarization (La Quatra and Cagliero, 2020). In this paper, we propose a comprehensive approach to build Chinese financial LLMs and present DISC-FinLLM. Our method aims to enhance general LLMs by equipping them with the skills to address typical needs for financial text generation and understanding, meaningful multi-turn conversations on financial topics, and plugin functionality to support financial modeling and knowledge-enhanced services.
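As a rough illustration of the "multiple experts" idea in the title, the sketch below routes a financial query to one of several fine-tuned experts; the expert names and the keyword router are assumptions for illustration, not DISC-FinLLM's actual components.

```python
# Hedged sketch of "multiple experts" dispatch: expert names and the keyword
# router are hypothetical, not DISC-FinLLM's actual components.

EXPERTS = {
    "consulting": lambda q: f"[consulting expert] {q}",
    "nlp_task":   lambda q: f"[financial NLP task expert] {q}",
    "computing":  lambda q: f"[tool-calling expert] {q}",
    "retrieval":  lambda q: f"[knowledge-retrieval expert] {q}",
}

def route(query: str) -> str:
    """Toy keyword router; a real system would learn this dispatch."""
    q = query.lower()
    if any(k in q for k in ("calculate", "ratio", "volatility")):
        return "computing"
    if any(k in q for k in ("extract", "classify", "summarize")):
        return "nlp_task"
    if any(k in q for k in ("regulation", "filing", "report")):
        return "retrieval"
    return "consulting"

query = "Calculate the 30-day volatility of this stock."
print(EXPERTS[route(query)](query))   # dispatched to the tool-calling expert
```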
Valley: Video Assistant with Large Language model Enhanced abilitY
Luo, Ruipu, Zhao, Ziwang, Yang, Min, Dong, Junwei, Li, Da, Lu, Pengcheng, Wang, Tao, Hu, Linmei, Qiu, Minghui, Wei, Zhongyu
Large language models (LLMs), with their remarkable conversational capabilities, have demonstrated impressive performance across various applications and have emerged as formidable AI assistants. This raises an intuitive question: can we harness the power of LLMs to build multimodal AI assistants for visual applications? Recently, several multi-modal models have been developed for this purpose. They typically pre-train an adaptation module to align the semantics of the vision encoder and the language model, followed by fine-tuning on instruction-following data. However, despite the success of this pipeline in image and language understanding, its effectiveness in joint video and language understanding has not been widely explored. In this paper, we aim to develop a novel multi-modal foundation model capable of comprehending video, image, and language within a general framework. To achieve this goal, we introduce Valley, a Video Assistant with Large Language model Enhanced abilitY. Valley consists of an LLM, a temporal modeling module, a visual encoder, and a simple projection module designed to bridge the visual and textual modalities. To empower Valley with video comprehension and instruction-following capabilities, we construct a video instruction dataset and adopt a two-stage tuning procedure to train it. Specifically, we employ ChatGPT to facilitate the construction of task-oriented conversation data covering various tasks, including multi-shot captioning, long video description, action recognition, causal relationship inference, etc. Subsequently, we adopt a pre-training-then-instruction-tuning pipeline to align the visual and textual modalities and improve the instruction-following capability of Valley. Qualitative experiments demonstrate that Valley has the potential to function as a highly effective video assistant that makes complex video understanding scenarios easy.
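The sketch below illustrates the kind of visual-to-text bridge the abstract describes, assuming frame features from a vision encoder are temporally modeled and then projected into the LLM's token-embedding space; the module names and dimensions are hypothetical.

```python
# Hypothetical sketch of the visual-to-text bridge: temporal modeling over
# frame features, then projection into the LLM embedding space.

import torch
import torch.nn as nn

class VideoToLLMBridge(nn.Module):
    def __init__(self, vis_dim=1024, llm_dim=4096):
        super().__init__()
        # Temporal modeling module: lets frame features attend to each other.
        self.temporal = nn.TransformerEncoderLayer(
            d_model=vis_dim, nhead=8, batch_first=True)
        # Simple projection module into the LLM's token-embedding space.
        self.project = nn.Linear(vis_dim, llm_dim)

    def forward(self, frame_feats):        # (batch, n_frames, vis_dim)
        return self.project(self.temporal(frame_feats))

bridge = VideoToLLMBridge()
visual_tokens = bridge(torch.randn(1, 16, 1024))  # 16 frames from the encoder
print(visual_tokens.shape)   # torch.Size([1, 16, 4096]);
                             # prepend these to the text-token embeddings
```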
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services
Yue, Shengbin, Chen, Wei, Wang, Siyuan, Li, Bingxuan, Shen, Chenchen, Liu, Shujun, Zhou, Yuxuan, Xiao, Yao, Yun, Song, Huang, Xuanjing, Wei, Zhongyu
We propose DISC-LawLLM, an intelligent legal system utilizing large language models (LLMs) to provide a wide range of legal services. We adopt legal syllogism prompting strategies to construct supervised fine-tuning datasets in the Chinese judicial domain and fine-tune LLMs with legal reasoning capability. We augment LLMs with a retrieval module to enhance the model's ability to access and utilize external legal knowledge. A comprehensive legal benchmark, DISC-Law-Eval, is presented to evaluate intelligent legal systems along both objective and subjective dimensions. Quantitative and qualitative results on DISC-Law-Eval demonstrate the effectiveness of our system in serving various users across diverse legal scenarios. The detailed resources are available at https://github.com/FudanDISC/DISC-LawLLM.
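A minimal sketch of the retrieval-augmentation step, assuming a toy lexical retriever and an illustrative prompt template rather than DISC-LawLLM's actual retrieval module:

```python
# Hedged sketch of retrieval augmentation: fetch legal texts relevant to a
# query and prepend them to the prompt before calling the LLM.

def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(q & set(doc.lower().split())))
    return ranked[:k]

def build_prompt(query, corpus):
    context = "\n".join(retrieve(query, corpus))
    return f"Relevant provisions:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = ["Article 1: Contracts require mutual consent.",
          "Article 2: Damages follow breach of contract.",
          "Article 3: Property transfers must be registered."]
print(build_prompt("What damages follow a breach of contract?", corpus))
```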
Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making
Luo, Ruipu, Zhang, Jiwen, Wei, Zhongyu
Vision language decision making (VLDM) is a challenging multimodal task. The agent has to understand complex human instructions and complete compositional tasks involving environment navigation and object manipulation. However, the long action sequences involved in VLDM make the task difficult to learn. From an environment perspective, we find that task episodes can be divided into fine-grained units, each containing a navigation phase and an interaction phase. Since the environment within a unit stays unchanged, we propose a novel hybrid-training framework that enables active exploration in the environment and reduces exposure bias. The framework leverages the unit-grained configurations and is model-agnostic. Specifically, we design a Unit-Transformer (UT) with an intrinsic recurrent state that maintains a unit-scale cross-modal memory. Through extensive experiments on the TEACH benchmark, we demonstrate that our proposed framework outperforms existing state-of-the-art methods on all evaluation metrics. Overall, our work introduces a novel approach to tackling the VLDM task by breaking it down into smaller, manageable units and utilizing a hybrid-training framework. By doing so, we provide a more flexible and effective solution for multimodal decision making.
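The unit segmentation can be sketched directly, assuming each action is tagged as navigation or interaction and a unit is a navigation phase followed by an interaction phase:

```python
# Sketch of unit-grained episode segmentation under the stated assumption.

def split_into_units(episode):
    """episode: list of (action, kind) with kind in {'nav', 'interact'}."""
    units, current, seen_interact = [], [], False
    for step in episode:
        kind = step[1]
        if kind == "nav" and seen_interact:   # a new navigation phase starts
            units.append(current)             # close the finished unit
            current, seen_interact = [], False
        current.append(step)
        seen_interact |= (kind == "interact")
    if current:
        units.append(current)
    return units

demo = [("fwd", "nav"), ("left", "nav"), ("pickup", "interact"),
        ("fwd", "nav"), ("place", "interact")]
print([len(u) for u in split_into_units(demo)])   # [3, 2]: two units
```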
Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning
Wang, Siyuan, Wei, Zhongyu, Xu, Jiarong, Li, Taishan, Fan, Zhihao
Recent pre-trained language models (PLMs) equipped with foundational reasoning skills have shown remarkable performance on downstream complex tasks. However, the important skill of structure reasoning has rarely been studied; it involves modeling implicit structure information within the text and performing explicit logical reasoning over it to deduce the conclusion. This paper proposes a unified learning framework that combines explicit structure reasoning and language pre-training to endow PLMs with the structure reasoning skill. It first identifies several elementary structures within contexts to construct structured queries, then performs step-by-step reasoning along the queries to identify the answer entity. The fusion of textual semantics and structure reasoning is achieved by using contextual representations learned by PLMs to initialize the representation space of structures and performing stepwise reasoning in this semantic representation space. Experimental results on four datasets demonstrate that the proposed model achieves significant improvements on complex reasoning tasks involving diverse structures, and shows transferability to downstream tasks with limited training data as well as effectiveness for complex reasoning over the knowledge graph (KG) modality.
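As an illustrative stand-in for stepwise structure reasoning, the sketch below initializes entity embeddings (in the paper these come from PLM contextual representations) and follows a relation path by translation in the representation space, TransE-style; this is a simplified analogue, not the paper's reasoning module.

```python
# Simplified analogue of stepwise reasoning over a structured query
# (TransE-style translation stands in for the paper's reasoning module).

import numpy as np

rng = np.random.default_rng(0)
entity_emb   = {e: rng.normal(size=64) for e in ("A", "B", "C")}
relation_emb = {r: rng.normal(size=64) for r in ("r1", "r2")}

def answer(start, relation_path):
    """Follow the query one relation at a time, then return the nearest entity."""
    q = entity_emb[start].copy()
    for r in relation_path:          # one explicit reasoning step per relation
        q = q + relation_emb[r]      # translate in the representation space
    dists = {e: np.linalg.norm(q - v) for e, v in entity_emb.items()}
    return min(dists, key=dists.get)

print(answer("A", ["r1", "r2"]))     # predicted answer entity
```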
DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization
Gao, Songyang, Dou, Shihan, Liu, Yan, Wang, Xiao, Zhang, Qi, Wei, Zhongyu, Ma, Jin, Shan, Ying
Adversarial training is one of the best-performing methods for improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascent or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure that instead performs adversarial training with only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the input data's probability distribution rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for training and reduces time consumption by up to 70% compared to the current best-performing adversarial training methods. Experiments demonstrate that DSRM considerably improves BERT's resistance to textual adversarial attacks and achieves state-of-the-art robust accuracy on various benchmarks.
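The core idea can be sketched as distributionally robust reweighting: rather than perturbing embeddings, perturb the weight each clean example receives, using the worst-case tilt inside a KL ball (which has a closed form). The temperature below is an illustrative stand-in for the method's actual constraint handling.

```python
# Illustrative distributionally robust objective on one clean batch: the
# worst-case reweighting inside a KL ball is exponential tilting; `tau` is
# a stand-in for DSRM's actual constraint handling.

import numpy as np

def shifted_risk(per_example_losses, tau=1.0):
    """Loss under an adversarially shifted data distribution (no adv. samples)."""
    w = np.exp(per_example_losses / tau)   # tilt mass toward hard examples
    w /= w.sum()                           # keep it a valid distribution
    return float(w @ per_example_losses)

losses = np.array([0.2, 0.4, 2.5])          # per-example losses on clean data
print(shifted_risk(losses), losses.mean())  # robust objective > average loss
```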
Open Set Relation Extraction via Unknown-Aware Training
Zhao, Jun, Zhao, Xin, Zhan, Wenyu, Zhang, Qi, Gui, Tao, Wei, Zhongyu, Chen, Yunwen, Gao, Xiang, Huang, Xuanjing
The existing supervised relation extraction methods have achieved impressive performance in a closed-set setting, where the relations during both training and testing remain the same. In the more realistic open-set setting, unknown relations may appear in the test set. Due to the lack of supervision signals from unknown relations, a well-performing closed-set relation extractor can still confidently misclassify them into known relations. In this paper, we propose an unknown-aware training method that regularizes the model by dynamically synthesizing negative instances. To facilitate a compact decision boundary, "difficult" negative instances are necessary. Inspired by text adversarial attacks, we adaptively apply small but critical perturbations to original training instances, thus synthesizing negative instances that are more likely to be mistaken by the model as known relations. Experimental results show that this method achieves SOTA unknown relation detection without compromising the classification of known relations.
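A simplified sketch of synthesizing one "difficult" negative with an FGSM-style step, nudging an instance until the classifier grows more confident it belongs to a known relation it does not actually hold; epsilon, the toy classifier, and the exact attack direction are assumptions, not the paper's precise procedure.

```python
# FGSM-style synthesis of a difficult negative: make the model *more*
# confident the instance is a known relation, then train it as 'unknown'.

import torch
import torch.nn.functional as F

def synthesize_negative(emb, target_known, classifier, eps=0.05):
    """Nudge `emb` toward higher confidence on `target_known`, a known
    relation it does not actually hold, yielding a difficult negative."""
    emb = emb.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(emb), target_known)
    loss.backward()
    return (emb - eps * emb.grad.sign()).detach()  # descend loss toward target

clf = torch.nn.Linear(32, 5)                       # toy known-relation classifier
neg = synthesize_negative(torch.randn(1, 32), torch.tensor([2]), clf)
print(neg.shape)                                   # torch.Size([1, 32])
```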
RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction
Zhao, Jun, Zhan, Wenyu, Zhao, Xin, Zhang, Qi, Gui, Tao, Wei, Zhongyu, Wang, Junzhe, Peng, Minlong, Sun, Mingming
Semantic matching is a mainstream paradigm of zero-shot relation extraction, which matches a given input with a corresponding label description. The entities in the input should exactly match their hypernyms in the description, while the irrelevant contexts should be ignored during matching. However, general matching methods lack explicit modeling of this matching pattern. In this work, we propose a fine-grained semantic matching method tailored for zero-shot relation extraction. Following the above matching pattern, we decompose the sentence-level similarity score into entity and context matching scores. Due to the lack of explicit annotations for the redundant components, we design a feature distillation module to adaptively identify the relation-irrelevant features and reduce their negative impact on context matching. Experimental results show that our method achieves a higher matching F1 score and an inference speed 10 times faster than the state-of-the-art methods.
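The decomposition itself is easy to sketch: the sentence-level score splits into an entity-matching term (input entity vs. its hypernym in the description) and a context-matching term. The cosine similarities and the mixing weight below are illustrative.

```python
# Illustrative decomposed score: entity matching plus context matching.

import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_score(ent, hyp, ctx, desc_ctx, alpha=0.5):
    """alpha mixes the entity-vs-hypernym and context-vs-description scores."""
    return alpha * cos(ent, hyp) + (1 - alpha) * cos(ctx, desc_ctx)

rng = np.random.default_rng(1)
e, h, c, d = (rng.normal(size=8) for _ in range(4))
print(match_score(e, h, c, d))   # higher means the input fits the description
```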
Actively Supervised Clustering for Open Relation Extraction
Zhao, Jun, Zhang, Yongxin, Zhang, Qi, Gui, Tao, Wei, Zhongyu, Peng, Minlong, Sun, Mingming
Current clustering-based Open Relation Extraction (OpenRE) methods usually adopt a two-stage pipeline: the first stage simultaneously learns relation representations and cluster assignments; the second stage manually labels several instances and thus names the relation for each cluster. However, unsupervised objectives struggle to optimize the model to derive accurate clustering assignments, and the number of clusters has to be supplied in advance. In this paper, we present a novel setting, named actively supervised clustering for OpenRE. Our insight is that clustering learning and relation labeling can be performed alternately, providing the necessary guidance for clustering without a significant increase in human effort. The key to this setting is selecting which instances to label. Instead of using classical active labeling strategies designed for fixed known classes, we propose a new strategy that is applicable to dynamically discovering clusters of unknown relations. Experimental results show that our method is able to discover almost all relational clusters in the data and improves on SOTA methods by 10.3% and 5.2% on two datasets, respectively.
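One round of the alternating loop can be sketched as: cluster, then query a label for the instance least covered by already-labeled points. The selection rule below is a simplified stand-in for the paper's strategy.

```python
# Simplified round of actively supervised clustering: cluster, then pick the
# instance farthest from every already-labeled point for the annotator to name.

import numpy as np
from sklearn.cluster import KMeans

def active_round(X, n_clusters, labeled_idx):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    if labeled_idx:                    # distance to the nearest labeled point
        d = np.linalg.norm(X[:, None] - X[labeled_idx][None], axis=-1).min(axis=1)
    else:
        d = np.full(len(X), np.inf)    # nothing labeled yet: any point works
    d[labeled_idx] = -np.inf           # never re-query a labeled instance
    pick = int(np.argmax(d))
    return labels, pick                # annotator names the relation of X[pick]

X = np.random.default_rng(2).normal(size=(30, 16))
labels, pick = active_round(X, n_clusters=3, labeled_idx=[0, 5])
print(pick)                            # index of the next instance to label
```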