Du, Mengnan
Shortcut Learning of Large Language Models in Natural Language Understanding
Du, Mengnan, He, Fengxiang, Zou, Na, Tao, Dacheng, Hu, Xia
Large language models (LLMs) have achieved state-of-the-art performance on a range of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction, which significantly limits their generalizability and adversarial robustness. In this paper, we provide a review of recent developments that address the shortcut learning and robustness challenge of LLMs. We first introduce the concept of shortcut learning in language models. We then present methods to identify shortcut learning behavior in language models, characterize its causes, and introduce mitigation solutions. Finally, we discuss key research challenges and potential research directions to advance the field of LLMs.
Mitigating Algorithmic Bias with Limited Annotations
Wang, Guanchu, Du, Mengnan, Liu, Ninghao, Zou, Na, Hu, Xia
Existing work on fairness modeling commonly assumes that sensitive attributes are fully available for all instances, which may not hold in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, a small portion of the training data must be manually annotated to mitigate bias. However, the annotated subset inherits the skewed distribution across sensitive groups from the original dataset, which leads to suboptimal bias mitigation. To tackle this challenge, we propose Active Penalization Of Discrimination (APOD), an interactive framework that guides the limited annotations toward maximally eliminating the effect of algorithmic bias. The proposed APOD integrates discrimination penalization with active instance selection to efficiently utilize the limited annotation budget, and it is theoretically proven to bound the algorithmic bias. Evaluation on five benchmark datasets shows that APOD outperforms state-of-the-art baseline methods under a limited annotation budget and performs comparably to fully annotated bias mitigation, which demonstrates that APOD can benefit real-world applications where sensitive information is limited. The source code of the proposed method is available at: https://anonymous.4open.science/r/APOD-fairness-4C02.
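To make the annotate-then-penalize idea concrete, here is a minimal sketch of an active annotation loop in that spirit. The margin-based selection rule, the demographic-parity reweighting, and the binary sensitive attribute are all illustrative simplifications, not APOD's actual selection criterion or penalty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_bias_mitigation(X, y, budget, annotate_fn):
    """annotate_fn(i) returns the (binary) sensitive attribute of instance i."""
    model = LogisticRegression().fit(X, y)
    s = np.full(len(X), -1)                       # -1 = attribute unknown
    for _ in range(budget):
        # select the instance the current model is least certain about
        margin = np.abs(model.predict_proba(X)[:, 1] - 0.5)
        margin[s >= 0] = np.inf                   # never re-select a row
        idx = int(np.argmin(margin))
        s[idx] = annotate_fn(idx)                 # spend one annotation
        # penalize the group gap measured on the annotated rows
        known, w = s >= 0, np.ones(len(X))
        if len(np.unique(s[known])) == 2:         # both groups observed
            yhat = model.predict(X[known])
            rates = [yhat[s[known] == g].mean() for g in (0, 1)]
            disadv = int(np.argmin(rates))        # lower positive-rate group
            w[known & (s == disadv)] += abs(rates[0] - rates[1])
        model = LogisticRegression().fit(X, y, sample_weight=w)
    return model, s
```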
Accelerating Shapley Explanation via Contributive Cooperator Selection
Wang, Guanchu, Chuang, Yu-Neng, Du, Mengnan, Yang, Fan, Zhou, Quan, Tripathi, Pushkar, Cai, Xuanting, Hu, Xia
Even though the Shapley value provides an effective explanation for a DNN model's prediction, its computation relies on enumerating all possible coalitions of input features, which leads to exponentially growing complexity. To address this problem, we propose SHEAR, a novel method that significantly accelerates Shapley explanation for DNN models by involving only a small number of feature coalitions in the computation. The selection of feature coalitions follows our proposed Shapley chain rule to minimize the absolute error from the ground-truth Shapley values, so that the computation is both efficient and accurate. To demonstrate the effectiveness, we comprehensively evaluate SHEAR across multiple metrics, including the absolute error from the ground-truth Shapley values, the faithfulness of the explanations, and running speed. The experimental results indicate that SHEAR consistently outperforms state-of-the-art baseline methods across the evaluation metrics, demonstrating its potential in real-world applications where computational resources are limited.
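The exponential cost comes from the 2^(n-1) coalitions in the exact Shapley formula, and the standard way to trade accuracy for speed is to evaluate only a few coalitions. The sketch below contrasts the two; it uses plain permutation sampling to pick coalitions, whereas SHEAR's contribution is a principled selection rule (the Shapley chain rule) that is not reproduced here. `value_fn(S)` is assumed to return the model output with all features outside `S` masked.

```python
import itertools, math, random

def exact_shapley(value_fn, n_features, i):
    """Exact Shapley value of feature i: needs 2^(n-1) coalition pairs."""
    others = [j for j in range(n_features) if j != i]
    phi = 0.0
    for r in range(len(others) + 1):
        for S in itertools.combinations(others, r):
            weight = (math.factorial(r) * math.factorial(n_features - r - 1)
                      / math.factorial(n_features))
            phi += weight * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

def sampled_shapley(value_fn, n_features, i, n_samples=64, seed=0):
    """Unbiased permutation-sampling estimate: n_samples marginal
    contributions instead of exhaustive coalition enumeration."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perm = list(range(n_features))
        rng.shuffle(perm)
        S = set(perm[:perm.index(i)])     # coalition preceding feature i
        total += value_fn(S | {i}) - value_fn(S)
    return total / n_samples
```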
Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions
Deng, Huiqi, Zou, Na, Du, Mengnan, Chen, Weifu, Feng, Guocan, Yang, Ziwei, Li, Zheyang, Zhang, Quanshi
Various attribution methods have been developed to estimate the importance of input variables to a DNN's prediction. However, existing attribution methods are often built upon different heuristics, and there remains a lack of a unified theoretical understanding of why these methods are effective and how they are related. To this end, for the first time, we formulate the core mechanisms of fourteen attribution methods, each designed on a different heuristic, within the same mathematical system, namely the system of Taylor interactions. Specifically, we prove that the attribution scores estimated by all fourteen methods can be reformulated as a weighted sum of two types of effects: independent effects of each individual input variable and interaction effects between input variables. The essential difference among the fourteen attribution methods lies mainly in the weights allocated to these effects. Based on the above findings, we propose three principles for a fair allocation of effects to evaluate the faithfulness of the fourteen attribution methods.
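Schematically, the claimed unification can be written as follows; the notation is illustrative, and the paper's exact Taylor-interaction indexing is finer-grained.

```latex
% Expand the output f around a baseline \tilde{x} and group the Taylor terms:
f(x) - f(\tilde{x})
  \;=\; \sum_{i=1}^{n} \phi(i)
  \;+\; \sum_{\substack{S \subseteq \{1,\dots,n\} \\ |S| \ge 2}} I(S)
% phi(i): Taylor terms involving only variable i (independent effect);
% I(S):   cross terms over the variable set S (interaction effect).
% Each attribution method m then scores variable i as
a_i^{(m)} \;=\; w_i^{(m)}\,\phi(i) \;+\; \sum_{S \ni i} w_{i,S}^{(m)}\, I(S)
% so the fourteen methods differ only in the allocation weights w.
```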
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Du, Mengnan, Mukherjee, Subhabrata, Cheng, Yu, Shokouhi, Milad, Hu, Xia, Awadallah, Ahmed Hassan
Recent work has focused on compressing pre-trained language models (PLMs) like BERT, with the major focus on improving in-distribution performance for downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models on out-of-distribution (OOD) data. In this work, we study two popular model compression techniques, knowledge distillation and pruning, and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets, even though they obtain similar performance on in-distribution development sets. Further analysis indicates that the compressed models overfit to shortcut samples and generalize poorly on hard ones. We leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our bias mitigation framework improves the OOD generalization of the compressed models without sacrificing in-distribution task performance.
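A minimal sketch of the idea in PyTorch: weight each sample's distillation loss by the teacher's predictive uncertainty, so that easy (likely shortcut) samples contribute less. The entropy-based weight and the loss mixing below are illustrative assumptions, not the paper's exact regularizer.

```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_distill_loss(student_logits, teacher_logits, labels,
                                      temperature=2.0, alpha=0.5):
    """Per-sample weight = teacher's normalized predictive entropy, so
    confident (likely shortcut) samples are downweighted."""
    t_prob = F.softmax(teacher_logits, dim=-1)
    entropy = -(t_prob * torch.log(t_prob + 1e-12)).sum(-1)
    weight = entropy / torch.log(torch.tensor(float(t_prob.size(-1))))
    # soft-target KD term per sample, with the usual T^2 scaling
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="none").sum(-1) * temperature ** 2
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    return (weight * (alpha * kd + (1 - alpha) * ce)).mean()
```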
Efficient XAI Techniques: A Taxonomic Survey
Chuang, Yu-Neng, Wang, Guanchu, Yang, Fan, Liu, Zirui, Cai, Xuanting, Du, Mengnan, Hu, Xia
Recently, there has been a growing demand for the deployment of Explainable Artificial Intelligence (XAI) algorithms in real-world applications. However, traditional XAI methods typically suffer from high computational complexity, which hinders their deployment in real-time systems that must meet the latency requirements of real-world scenarios. Although many approaches have been proposed to improve the efficiency of XAI methods, a comprehensive understanding of the achievements and challenges is still needed. To this end, in this paper we provide a review of efficient XAI, organized into non-amortized and amortized methods. Efficient non-amortized methods focus on data-centric or model-centric acceleration for each individual instance. In contrast, amortized methods focus on learning a unified distribution of model explanations, following predictive, generative, or reinforcement frameworks, to rapidly derive multiple model explanations. We also analyze the limitations of an efficient XAI pipeline from the perspectives of the training phase, the deployment phase, and the use scenarios. Finally, we summarize the challenges of deploying XAI acceleration methods in real-world scenarios, of overcoming the trade-off between faithfulness and efficiency, and of selecting among different acceleration methods. Despite the advancements in ML, providing transparency in models, particularly in deep neural networks (DNNs), remains a substantial challenge. The lack of transparency can lead to mistrust and skepticism of ML model predictions, such as the black-box driving decisions made by autopilots. Moreover, each instance requires a unique explainer during the derivation of the explanation, and local explanation suffers from extensive computational cost due to the large number of tested instances, where each instance requires massive numbers of permutations to complete its importance scores.
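To illustrate the amortized branch of the taxonomy, here is a minimal sketch of a predictive amortized explainer: a small network distilled from a slow per-instance explainer, so that deployment needs only one forward pass per instance. The architecture and the `slow_explainer_fn` interface are placeholder assumptions, not a method from the survey.

```python
import torch
import torch.nn as nn

class AmortizedExplainer(nn.Module):
    """One forward pass maps an input to a per-feature attribution vector."""
    def __init__(self, n_features, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_features))

    def forward(self, x):
        return self.net(x)

def train_amortized(explainer, slow_explainer_fn, X, epochs=100, lr=1e-3):
    """Distill a slow per-instance explainer (e.g. a sampling-based Shapley
    method) into the fast network; the expensive targets are computed once."""
    targets = slow_explainer_fn(X)                # offline, one time
    opt = torch.optim.Adam(explainer.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(explainer(X), targets)
        loss.backward()
        opt.step()
    return explainer
```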
Mitigating Relational Bias on Knowledge Graphs
Chuang, Yu-Neng, Lai, Kwei-Herng, Tang, Ruixiang, Du, Mengnan, Chang, Chia-Yuan, Zou, Na, Hu, Xia
Knowledge graph data are prevalent in real-world applications, and knowledge graph neural networks (KGNNs) are essential techniques for knowledge graph representation learning. Although KGNNs effectively model the structural information of knowledge graphs, these frameworks can amplify underlying data bias, leading to discrimination against certain groups or individuals in downstream applications. Additionally, since existing debiasing approaches mainly focus on entity-wise bias, eliminating the multi-hop relational bias that pervasively exists in knowledge graphs remains an open question. Eliminating relational bias is very challenging due to the sparsity of the paths that generate the bias and the non-linear proximity structure of knowledge graphs. To tackle these challenges, we propose Fair-KGNN, a KGNN framework that simultaneously alleviates multi-hop bias and preserves entity-to-relation proximity information in knowledge graphs. The proposed framework generalizes to mitigating relational bias for all types of KGNN. We develop two instances of Fair-KGNN built on two state-of-the-art KGNN models, RGCN and CompGCN, to mitigate gender-occupation and nationality-salary bias. Experiments on three benchmark knowledge graph datasets demonstrate that Fair-KGNN effectively mitigates unfairness during representation learning while preserving the predictive performance of the KGNN models.
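One way to picture a relational-bias penalty (an illustrative stand-in, not Fair-KGNN's actual formulation) is to discourage entity embeddings from aligning with the direction of a sensitive relation, so that multi-hop paths through, say, a gender relation cannot steer occupation predictions:

```python
import torch

def relational_bias_penalty(entity_emb, rel_emb, sensitive_rel_id):
    """Mean squared projection of entity embeddings onto the sensitive
    relation's (unit-normalized) embedding direction."""
    d = rel_emb[sensitive_rel_id]
    d = d / d.norm()
    proj = entity_emb @ d              # (n_entities,) signed projections
    return (proj ** 2).mean()

# usage inside a training step (base_loss comes from the KGNN, e.g. an
# RGCN/CompGCN link-prediction loss; lam and GENDER_REL_ID are assumed):
# loss = base_loss + lam * relational_bias_penalty(E, R, GENDER_REL_ID)
```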
Unveiling Project-Specific Bias in Neural Code Models
Li, Zhiming, Li, Yanzhou, Li, Tianlin, Du, Mengnan, Wu, Bozhi, Cao, Yushi, Xie, Xiaofei, Li, Yi, Liu, Yang
Neural code models have brought significant improvements to many software analysis tasks such as type inference and vulnerability detection. Despite their good performance under the common intra-project independent and identically distributed (IID) training and validation setting, we observe that these models usually fail to generalize to the real-world inter-project out-of-distribution (OOD) setting. In this work, we show that this phenomenon is caused by the model relying heavily on project-specific, ungeneralizable tokens, such as self-defined variable and function names, for downstream prediction, and we formulate it as the project-specific bias learning behavior. We propose a measurement, termed Cond-Idf, to interpret this behavior; it combines co-occurrence probability and inverse document frequency to measure how strongly a token is related to a label and how project-specific it is. The analysis indicates that without proper regularization with prior knowledge, models tend to leverage spurious statistical cues for prediction. Based on these observations, we propose a bias mitigation mechanism, Batch Partition Regularization (BPR), that regularizes the model to infer based on proper behavior by leveraging the latent logical relations among samples. Experimental results on two deep code benchmarks indicate that BPR can improve both inter-project OOD generalization and adversarial robustness while not sacrificing accuracy on IID data.
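Following the abstract's description, a Cond-Idf-style score can be sketched as a token's conditional label probability combined with an inverse document frequency computed over projects. The combination rule below (a product) and the data layout are assumptions; the paper's exact formula may differ.

```python
import math
from collections import Counter, defaultdict

def cond_idf(samples):
    """samples: iterable of (tokens, label, project) triples, where tokens
    is the list of token strings for one code sample."""
    token_label = defaultdict(Counter)    # token -> per-label sample counts
    token_samples = Counter()             # token -> #samples containing it
    token_projects = defaultdict(set)     # token -> projects containing it
    projects = set()
    for tokens, label, project in samples:
        projects.add(project)
        for t in set(tokens):
            token_label[t][label] += 1
            token_samples[t] += 1
            token_projects[t].add(project)
    scores = {}
    for t, n in token_samples.items():
        # co-occurrence strength with the token's most frequent label
        p_label = max(token_label[t].values()) / n
        # project-specificness: rare across projects -> high idf
        idf = math.log(len(projects) / len(token_projects[t]))
        scores[t] = p_label * idf         # assumed combination rule
    return scores
```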
Learning Disentangled Representations for Time Series
Li, Yuening, Chen, Zhengzhang, Zha, Daochen, Du, Mengnan, Zhang, Denghui, Chen, Haifeng, Hu, Xia
Time-series representation learning is a fundamental task for time-series analysis. While significant progress has been made toward accurate representations for downstream applications, the learned representations often lack interpretability and do not expose semantic meaning. In contrast to previous efforts on entangled feature spaces, we aim to extract semantic-rich temporal correlations in a latent, interpretable, factorized representation of the data. Motivated by the success of disentangled representation learning in computer vision, we study the possibility of learning semantic-rich time-series representations, which remains underexplored due to three main challenges: 1) the sequential data structure introduces complex temporal correlations and makes the latent representations hard to interpret; 2) sequential models suffer from the KL vanishing problem; and 3) interpretable semantic concepts for time series often rely on multiple factors rather than individual ones. To bridge the gap, we propose Disentangle Time Series (DTS), a novel disentanglement enhancement framework for sequential data. Specifically, to generate hierarchical semantic concepts as the interpretable and disentangled representation of time series, DTS introduces multi-level disentanglement strategies covering both individual latent factors and group semantic segments. We further show theoretically how to alleviate the KL vanishing problem: DTS introduces a mutual information maximization term while keeping a heavier penalty on the total correlation and the dimension-wise KL to preserve the disentanglement property. Experimental results on various real-world benchmark datasets demonstrate that the representations learned by DTS achieve superior performance in downstream applications, with high interpretability of semantic concepts.
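The decomposition behind this design is the standard identity used in total-correlation VAEs; a schematic objective consistent with the abstract is shown below, with the coefficients as illustrative placeholders for DTS's exact weighting.

```latex
% Standard decomposition of the aggregate VAE KL term:
\mathbb{E}_{p(x)}\!\left[ D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big) \right]
  = \underbrace{I_q(x; z)}_{\text{mutual information}}
  + \underbrace{D_{\mathrm{KL}}\Big(q(z)\,\Big\|\,\textstyle\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\textstyle\sum_j D_{\mathrm{KL}}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
% Schematic training objective: keep reconstruction, reward the MI term
% (alpha > 0 fights KL vanishing), and penalize the last two terms more
% heavily (beta, gamma > 1) to preserve disentanglement:
\mathcal{L} = -\,\mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
  - \alpha\, I_q(x; z)
  + \beta\, D_{\mathrm{KL}}\Big(q(z)\,\Big\|\,\textstyle\prod_j q(z_j)\Big)
  + \gamma\, \textstyle\sum_j D_{\mathrm{KL}}\big(q(z_j)\,\|\,p(z_j)\big)
```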
Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution
Deng, Huiqi, Zou, Na, Chen, Weifu, Feng, Guocan, Du, Mengnan, Hu, Xia
Back-propagation-based visualizations have been proposed to interpret deep neural networks (DNNs), and some produce interpretations with good visual quality. However, there are doubts about whether these intuitive visualizations are actually related to the network's decisions. Recent studies have confirmed this suspicion by verifying that almost all of these modified back-propagation visualizations are not faithful to the model's decision-making process. Moreover, these visualizations produce vague "relative importance scores", in which low values are not guaranteed to be irrelevant to the final prediction. Hence, it is highly desirable to develop a novel back-propagation framework that guarantees theoretical faithfulness and produces quantitative attribution scores with a clear interpretation. To achieve this goal, we resort to mutual information theory to generate the interpretations, studying how much information about the output is encoded in each input neuron. The basic idea is to learn a source signal by back-propagation such that the mutual information between input and output is preserved, as much as possible, in the mutual information between the input and the source signal. In addition, we propose a Mutual Information Preserving Inverse Network, termed MIP-IN, in which the parameters of each layer are recursively trained to learn how to invert. During the inversion, the forward ReLU operation is adopted to adapt the general interpretations to the specific input. We then empirically demonstrate that the inverted source signal satisfies the completeness and minimality properties, which are crucial for a faithful interpretation. Furthermore, an empirical study validates the effectiveness of interpretations generated by MIP-IN.
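A heavily simplified sketch of the layer-wise inversion idea: train an inverse module per layer to reconstruct the layer's input from its output (good reconstruction serving as a proxy for preserving mutual information), and reuse the forward ReLU pattern to make the inversion specific to the given input. Shapes, the loss, and the training loop are placeholder assumptions, not MIP-IN's exact recipe.

```python
import torch
import torch.nn as nn

class InverseLayer(nn.Module):
    """Learned inverse of one Linear layer followed by ReLU."""
    def __init__(self, out_dim, in_dim):
        super().__init__()
        self.inv = nn.Linear(out_dim, in_dim)

    def forward(self, signal, forward_activation):
        # gate with the forward ReLU pattern so the inversion adapts
        # to the specific input, as the abstract describes
        mask = (forward_activation > 0).float()
        return self.inv(signal * mask)

def train_inverse(layer, inverse, data, epochs=20, lr=1e-3):
    """layer: an nn.Linear; data: iterable of (batch, in_dim) tensors."""
    opt = torch.optim.Adam(inverse.parameters(), lr=lr)
    for _ in range(epochs):
        for x in data:
            h = torch.relu(layer(x))        # forward pass through the layer
            x_hat = inverse(h, h)           # invert, gated by ReLU pattern
            loss = nn.functional.mse_loss(x_hat, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return inverse
```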