Ding, Xiao
Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation
Gao, Jinglong, Ding, Xiao, Qin, Bing, Liu, Ting
Causal reasoning ability is crucial for numerous NLP applications. Despite the impressive emerging ability of ChatGPT in various NLP tasks, it is unclear how well ChatGPT performs in causal reasoning. In this paper, we conduct the first comprehensive evaluation of ChatGPT's causal reasoning capabilities. Experiments show that ChatGPT is not a good causal reasoner, but a good causal explainer. In addition, ChatGPT suffers from serious hallucination in causal reasoning, possibly due to the reporting biases between causal and non-causal relationships in natural language, as well as ChatGPT's upgrading processes, such as RLHF. The In-Context Learning (ICL) and Chain-of-Thought (CoT) techniques can further exacerbate such causal hallucination. Additionally, the causal reasoning ability of ChatGPT is sensitive to the words used to express the causal concept in prompts, and close-ended prompts perform better than open-ended prompts. For events in sentences, ChatGPT excels at capturing explicit causality rather than implicit causality, and performs better in sentences with lower event density and smaller lexical distance between events. The code is available at https://github.com/ArrogantL/ChatGPT4CausalReasoning.
ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
Yang, Linyao, Chen, Hongyang, Li, Zhao, Ding, Xiao, Wu, Xindong
Recently, ChatGPT, a representative large language model (LLM), has gained considerable attention due to its powerful emergent abilities. Some researchers suggest that LLMs could potentially replace structured knowledge bases like knowledge graphs (KGs) and function as parameterized knowledge bases. However, while LLMs are proficient at learning probabilistic language patterns from large corpora and engaging in conversations with humans, they, like previous smaller pre-trained language models (PLMs), still have difficulty recalling facts while generating knowledge-grounded content. To overcome these limitations, researchers have proposed enhancing data-driven PLMs with knowledge-based KGs to incorporate explicit factual knowledge into PLMs, thus improving their performance in generating texts that require factual knowledge and providing more informed responses to user queries. This paper reviews the studies on enhancing PLMs with KGs, detailing existing knowledge graph enhanced pre-trained language models (KGPLMs) as well as their applications. Inspired by existing studies on KGPLMs, this paper proposes to enhance LLMs with KGs by developing knowledge graph-enhanced large language models (KGLLMs). KGLLMs provide a solution for enhancing LLMs' factual reasoning ability, opening up new avenues for LLM research.
NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing
Wu, Tingting, Ding, Xiao, Tang, Minji, Zhang, Hao, Qin, Bing, Liu, Ting
Large-scale datasets in the real world inevitably involve label noise. Deep models can gradually overfit noisy labels and thus degrade model generalization. To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance. Due to the lack of suitable datasets, previous studies have frequently employed synthetic label noise to mimic real-world label noise. However, synthetic noise is not instance-dependent, so this approximation is not always effective in practice. Recent research has proposed benchmarks for learning with real-world noisy labels. However, the noise sources in these benchmarks may be single or fuzzy, so the benchmarks differ from real-world data with heterogeneous label noise. To tackle these issues, we contribute NoisywikiHow, the largest NLP benchmark built with minimal supervision. Specifically, inspired by human cognition, we explicitly construct multiple sources of label noise to imitate human errors throughout the annotation, replicating real-world noise, whose corruption is affected by both ground-truth labels and instances. Moreover, we provide a variety of noise levels to support controlled experiments on noisy data, enabling us to evaluate LNL methods systematically and comprehensively. After that, we conduct extensive multi-dimensional experiments on a broad range of LNL methods, obtaining new and intriguing findings.
ReCo: Reliable Causal Chain Reasoning via Structural Causal Recurrent Neural Networks
Xiong, Kai, Ding, Xiao, Li, Zhongyang, Du, Li, Qin, Bing, Zheng, Yi, Huai, Baoxing
Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework (ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks (SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performances on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge.
AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor
Ren, Feng, Ding, Xiao, Zheng, Min, Korzinkin, Mikhail, Cai, Xin, Zhu, Wei, Mantsyzov, Alexey, Aliper, Alex, Aladinskiy, Vladimir, Cao, Zhongying, Kong, Shanshan, Long, Xi, Liu, Bonnie Hei Man, Liu, Yingtao, Naumov, Vladimir, Shneyderman, Anastasia, Ozerov, Ivan V., Wang, Ju, Pun, Frank W., Aspuru-Guzik, Alan, Levitt, Michael, Zhavoronkov, Alex
The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered a remarkable breakthrough both in artificial intelligence (AI) applications and in structural biology. Despite their varying confidence levels, these predicted structures could still contribute significantly to structure-based drug design for novel targets, especially those with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines, comprising the biocomputational platform PandaOmics and the generative chemistry platform Chemistry42, to identify a first-in-class hit molecule for a novel target without an experimental structure, from target selection through hit identification, in a cost- and time-efficient manner. PandaOmics provided the targets of interest and Chemistry42 generated the molecules based on the AlphaFold-predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 ± 1.6 μM (n = 4) within 30 days from target selection and after synthesizing only 7 compounds. Based on the available data, a second round of AI-powered compound generation was conducted, through which a more potent hit molecule, ISM042-2-048, was discovered with a Kd value of 210.0 ± 42.4 nM (n = 2), within 30 days and after synthesizing 6 compounds from the discovery of the first hit ISM042-2-001. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and, more importantly, this work is the first demonstration of AlphaFold application in the hit identification process in early drug discovery.
Modeling Event Background for If-Then Commonsense Reasoning Using Context-aware Variational Autoencoder
Du, Li, Ding, Xiao, Liu, Ting, Li, Zhongyang
To facilitate this, Rashkin et al. (2018) build the Event2Mind dataset and Sap et al. (2018) present the Atomic dataset, which mainly focus on nine If-Then reasoning types to describe causes, effects, intents and participant characteristics of events. Together with these datasets, a simple RNN-based encoder-decoder framework is proposed to conduct the If-Then reasoning. However, two challenging problems remain. First, as illustrated in Figure 1, given an event "PersonX finds a new job", there can be multiple plausible feelings of PersonX about that event (such as "needy/stressed out" and "relieved/joyful"). Previous work showed that for the one-to-many problem, conventional RNN-based encoder-decoder models tend to generate generic responses, rather than meaningful and specific answers (Li et al., 2016; Serban et al., 2016). Second, as a commonsense reasoning problem, rich background knowledge is necessary for generating reasonable inferences. For example, as shown in Figure 1, the feeling of PersonX upon the event "PersonX finds a new job" could be multiple. However, given the context "PersonX was fired", the plausible inferences would be narrowed down to "needy" or "stressed out". To better solve these problems, we propose a context-aware variational autoencoder (CWVAE) together with a two-stage training procedure.
Event Representation Learning Enhanced with External Commonsense Knowledge
Ding, Xiao, Liao, Kuo, Liu, Ting, Li, Zhongyang, Duan, Junwen
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. {xding, kliao, tliu, zyli, jwduan}@ir.hit.edu.cn
Abstract. Prior work has proposed effective methods to learn event representations that can capture syntactic and semantic information over text corpus, demonstrating their effectiveness for downstream tasks such as script event prediction. On the other hand, events extracted from raw texts lack commonsense knowledge, such as the intents and emotions of the event participants, which are useful for distinguishing event pairs when there are only subtle differences in their surface realizations. To address this issue, this paper proposes to leverage external commonsense knowledge about the intent and sentiment of the event. Experiments on three event-related tasks, i.e., event similarity, script event prediction and stock market prediction, show that our model obtains much better event embeddings for the tasks, achieving a 78% improvement on the hard similarity task, yielding more precise inferences on subsequent events under given contexts, and better accuracies in predicting the volatilities of the stock market.
1 Introduction. Events are a kind of important objective information of the world. Structuralizing and representing such information as machine-readable knowledge are crucial to artificial intelligence (Li et al., 2018b, 2019). The main idea is to learn distributed representations for structured events (i.e.
Figure 1: Intent and sentiment enhanced event embeddings can distinguish distinct events even with high lexical overlap, and find similar events even with low lexical overlap.
ELG: An Event Logic Graph
Ding, Xiao, Li, Zhongyang, Liu, Ting, Liao, Kuo
The evolution and development of events have their own basic principles, which make events happen sequentially. Therefore, the discovery of such evolutionary patterns among events is of great value for event prediction, decision-making and scenario design of dialog systems. However, conventional knowledge graphs mainly focus on entities and their relations, neglecting real-world events. In this paper, we present a novel type of knowledge base - Event Logic Graph (ELG), which can reveal evolutionary patterns and development logics of real-world events. Specifically, ELG is a directed cyclic graph, whose nodes are events, and whose edges stand for the sequential, causal or hypernym-hyponym (is-a) relations between events. We constructed two domain ELGs: a financial domain ELG, which consists of more than 1.5 million event nodes and more than 1.8 million directed edges, and a travel domain ELG, which consists of about 30 thousand event nodes and more than 234 thousand directed edges. Experimental results show that ELG is effective for the task of script event prediction.
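The graph structure described in the abstract above (event nodes connected by directed, typed edges) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class name `EventLogicGraph`, its methods, the relation label strings, and the example events are all hypothetical.

```python
from collections import defaultdict

# Relation types named in the abstract; the exact label strings are assumptions.
RELATIONS = {"sequential", "causal", "is-a"}


class EventLogicGraph:
    """Hypothetical container: events are nodes, typed directed edges link them."""

    def __init__(self):
        # Maps a source event to a list of (target event, relation) pairs.
        self._out = defaultdict(list)

    def add_edge(self, source, target, relation):
        """Add a directed edge; cycles are allowed, as in a directed cyclic graph."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self._out[source].append((target, relation))

    def successors(self, event, relation=None):
        """Return targets reachable in one step, optionally filtered by relation."""
        return [
            target
            for target, rel in self._out.get(event, [])
            if relation is None or rel == relation
        ]


# Made-up example events, for illustration only.
graph = EventLogicGraph()
graph.add_edge("buy ticket", "board plane", "sequential")
graph.add_edge("flight delayed", "miss connection", "causal")
graph.add_edge("board plane", "buy ticket", "sequential")  # a cycle is permitted
```

Filtering by relation type, as in `graph.successors("flight delayed", relation="causal")`, is one simple way to query a single kind of evolutionary pattern from such a structure.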
Constructing Narrative Event Evolutionary Graph for Script Event Prediction
Li, Zhongyang, Ding, Xiao, Liu, Ting
Script event prediction requires a model to predict the subsequent event given an existing event context. Previous models based on event pairs or event chains cannot make full use of dense event connections, which may limit their capability of event prediction. To remedy this, we propose constructing an event graph to better utilize the event network information for script event prediction. In particular, we first extract narrative event chains from a large news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. NEEG can be seen as a knowledge base that describes event evolutionary principles and patterns. To solve the inference problem on NEEG, we present a scaled graph neural network (SGNN) to model event interactions and learn better event representations. Instead of computing the representations on the whole graph, SGNN processes only the concerned nodes each time, which makes our model feasible for large-scale graphs. By comparing the similarity between input context event representations and candidate event representations, we can choose the most reasonable subsequent event. Experimental results on the widely used New York Times corpus demonstrate that our model significantly outperforms state-of-the-art baseline methods under the standard multiple choice narrative cloze evaluation.
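The final selection step in the abstract above, choosing the candidate whose representation is most similar to the context events, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's scoring function: cosine similarity and averaging over context events are choices made here for simplicity, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_next_event(context_vectors, candidate_vectors):
    """Score each candidate by its mean similarity to the context events.

    Returns the index of the best-scoring candidate and the list of scores.
    """
    scores = [
        sum(cosine(ctx, cand) for ctx in context_vectors) / len(context_vectors)
        for cand in candidate_vectors
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 2-dimensional representations, invented for illustration.
context = [[1.0, 0.0], [1.0, 0.1]]
candidates = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
best_index, candidate_scores = choose_next_event(context, candidates)
```

In a multiple choice narrative cloze setting, `best_index` would be compared against the index of the gold subsequent event.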
Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network
Ding, Xiao (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology) | Duan, Junwen (Harbin Institute of Technology) | Nie, Jian-Yun (University of Montreal)
Social media platforms are often used by people to express their needs and desires. Such data offer great opportunities to identify users’ consumption intention from user-generated content, so that better tailored products or services can be recommended. However, there have been few efforts on mining commercial intents from social media content. In this paper, we investigate the use of social media data to identify consumption intentions for individuals. We develop a Consumption Intention Mining Model (CIMM) based on a convolutional neural network (CNN) for identifying whether the user has a consumption intention. The task is domain-dependent, and training a CNN requires a large number of annotated instances, which may be available only in some domains. Hence, we investigate the possibility of transferring the CNN mid-level sentence representation learned from one domain to another by adding an adaptation layer. To demonstrate the effectiveness of CIMM, we conduct experiments on two domains. Our results show that CIMM offers a powerful paradigm for effectively identifying users’ consumption intention based on their social media data. Moreover, our results also confirm that the CNN learned in one domain can be effectively transferred to another domain. This suggests great potential for our model to significantly increase the effectiveness of product recommendations and targeted advertising.