AITopics | Pei, Jian

Collaborating Authors

Pei, Jian

Trustworthy Graph Neural Networks: Aspects, Methods and Trends

Zhang, He, Wu, Bang, Yuan, Xingliang, Pan, Shirui, Tong, Hanghang, Pei, Jian

arXiv.org Artificial Intelligence, Feb-21-2024

Graph neural networks (GNNs) have emerged as a series of competent graph learning methods for diverse real-world scenarios, ranging from daily applications like recommendation systems and question answering to cutting-edge technologies such as drug discovery in life sciences and n-body simulation in astrophysics. However, task performance is not the only requirement for GNNs. Performance-oriented GNNs have exhibited potential adverse effects like vulnerability to adversarial attacks, unexplainable discrimination against disadvantaged groups, or excessive resource consumption in edge computing environments. To avoid these unintentional harms, it is necessary to build competent GNNs characterised by trustworthiness. To this end, we propose a comprehensive roadmap to build trustworthy GNNs from the view of the various computing technologies involved. In this survey, we introduce basic concepts and comprehensively summarise existing efforts for trustworthy GNNs from six aspects, including robustness, explainability, privacy, fairness, accountability, and environmental well-being. Additionally, we highlight the intricate cross-aspect relations between the above six aspects of trustworthy GNNs. Finally, we present a thorough overview of trending directions for facilitating the research and industrialisation of trustworthy GNNs.

artificial intelligence, machine learning, trustworthy graph neural network, (2 more...)

doi: 10.1109/JPROC.2024.3369017

2205.07424

Genre: Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

FairSample: Training Fair and Accurate Graph Convolutional Neural Networks Efficiently

Cong, Zicun, Shi, Baoxu, Li, Shan, Yang, Jaewon, He, Qi, Pei, Jian

arXiv.org Artificial Intelligence, Jan-26-2024

Fairness in Graph Convolutional Neural Networks (GCNs) is an increasingly important concern as GCNs are adopted in many crucial applications. Societal biases against sensitive groups may exist in many real-world graphs, and GCNs trained on those graphs may inherit such biases. In this paper, we adopt the well-known fairness notion of demographic parity and tackle the challenge of training fair and accurate GCNs efficiently. We present an in-depth analysis of how graph structure bias, node attribute bias, and model parameters may affect the demographic parity of GCNs. Our insights lead to FairSample, a framework that jointly mitigates the three types of biases. We employ two intuitive strategies to rectify graph structures. First, we inject edges across nodes that are in different sensitive groups but similar in node features. Second, to enhance model fairness and retain model quality, we develop a learnable neighbor sampling policy using reinforcement learning. To address the bias in node features and model parameters, FairSample is complemented by a regularization objective to optimize fairness.
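The abstract above relies on demographic parity, which can be illustrated with a small, generic sketch. It is not the FairSample method or its code; the function name `demographic_parity_gap`, the binary group encoding (0 and 1), and the toy data are assumptions made only for illustration. It measures how far the positive-prediction rates of two sensitive groups are from each other, where 0 means parity holds exactly.

```python
from typing import Sequence


def demographic_parity_gap(y_pred: Sequence[int], sensitive: Sequence[int]) -> float:
    """Absolute difference in positive-prediction rate between groups 0 and 1.

    y_pred holds binary predictions (0 or 1); sensitive holds the group
    label (0 or 1) of each node. Both encodings are illustrative assumptions.
    """
    if len(y_pred) != len(sensitive):
        raise ValueError("y_pred and sensitive must have the same length")

    rates = []
    for group in (0, 1):
        # Collect predictions for the nodes that belong to this group.
        preds = [p for p, s in zip(y_pred, sensitive) if s == group]
        if not preds:
            raise ValueError(f"no samples in sensitive group {group}")
        rates.append(sum(preds) / len(preds))

    return abs(rates[0] - rates[1])


# Hypothetical toy data: group 0 receives 3 of 4 positives, group 1 receives 1 of 4.
toy_pred = [1, 0, 1, 1, 0, 0, 1, 0]
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]
gap = demographic_parity_gap(toy_pred, toy_group)
```

A fairness regularizer of the kind the abstract mentions would typically penalize a differentiable surrogate of this gap during training; the exact objective used by FairSample is not given in this listing.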

artificial intelligence, machine learning, node, (19 more...)

doi: 10.1109/TKDE.2023.3306378

2401.14702

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

TrustLLM: Trustworthiness in Large Language Models

Sun, Lichao, Huang, Yue, Wang, Haoran, Wu, Siyuan, Zhang, Qihui, Gao, Chujie, Huang, Yixin, Lyu, Wenhan, Zhang, Yixuan, Li, Xiner, Liu, Zhengliang, Liu, Yixin, Wang, Yijue, Zhang, Zhikun, Kailkhura, Bhavya, Xiong, Caiming, Xiao, Chaowei, Li, Chunyuan, Xing, Eric, Huang, Furong, Liu, Hao, Ji, Heng, Wang, Hongyi, Zhang, Huan, Yao, Huaxiu, Kellis, Manolis, Zitnik, Marinka, Jiang, Meng, Bansal, Mohit, Zou, James, Pei, Jian, Liu, Jian, Gao, Jianfeng, Han, Jiawei, Zhao, Jieyu, Tang, Jiliang, Wang, Jindong, Mitchell, John, Shu, Kai, Xu, Kaidi, Chang, Kai-Wei, He, Lifang, Huang, Lifu, Backes, Michael, Gong, Neil Zhenqiang, Yu, Philip S., Chen, Pin-Yu, Gu, Quanquan, Xu, Ran, Ying, Rex, Ji, Shuiwang, Jana, Suman, Chen, Tianlong, Liu, Tianming, Zhou, Tianyi, Wang, William, Li, Xiang, Zhang, Xiangliang, Wang, Xiao, Xie, Xing, Chen, Xun, Wang, Xuyu, Liu, Yan, Ye, Yanfang, Cao, Yinzhi, Chen, Yong, Zhao, Yue

arXiv.org Artificial Intelligence, Jan-25-2024

Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, which consists of over 30 datasets. First, our findings show that, in general, trustworthiness and utility (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Third, some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

large language model, machine learning, natural language, (23 more...)

2401.05561

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > California (0.67)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Personal > Interview (0.92)

Industry:

Media > News (1.00)
Leisure & Entertainment (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(13 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency

Xiao, Zilin, Shou, Linjun, Zhang, Xingyao, Wu, Jie, Gong, Ming, Pei, Jian, Jiang, Daxin

arXiv.org Artificial Intelligence, Nov-6-2023

Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often struggle to capture explicit discourse-level dependencies, resulting in incoherent predictions at the abstract level (e.g., topic or category). We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions. Our method first introduces an unsupervised variational autoencoder (VAE) to extract latent topic vectors of context sentences. This approach not only allows the encoder to handle longer documents more effectively and conserves valuable input space, but also maintains topic-level coherence. Additionally, we incorporate an external category memory, enabling the system to retrieve relevant categories for undecided mentions. By employing step-by-step entity decisions, this design facilitates the modeling of entity-entity interactions, thereby maintaining maximum coherence at the category level. We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points. Our model demonstrates particularly outstanding performance on challenging long-text scenarios.

computational linguistic, machine learning, natural language, (20 more...)

2311.03253

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Industry:

Media (1.00)
Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Instructed Language Models with Retrievers Are Powerful Entity Linkers

Xiao, Zilin, Gong, Ming, Wu, Jie, Zhang, Xingyao, Shou, Linjun, Pei, Jian, Jiang, Daxin

arXiv.org Artificial Intelligence, Nov-6-2023

Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning. Yet their generative nature still makes the generated content prone to hallucinations, and thus unsuitable for entity-centric tasks like entity linking (EL), which require precise entity predictions over a large knowledge base. We present Instructed Generative Entity Linker (INSGENEL), the first approach that enables causal language models to perform entity linking over knowledge bases. We propose several methods to equip language models with EL capability, including (i) a sequence-to-sequence training EL objective with instruction-tuning, and (ii) a novel generative EL framework based on a lightweight potential mention retriever that frees the model from heavy and non-parallelizable decoding, achieving a 4× speedup without compromising linking metrics. INSGENEL outperforms previous generative alternatives by +6.8 F1 points on average, with a large advantage in training data efficiency and training compute consumption. In addition, our carefully engineered in-context learning (ICL) framework for EL still lags behind INSGENEL significantly, reaffirming that the EL task remains a persistent hurdle for general LLMs.

computational linguistic, large language model, machine learning, (18 more...)

2311.0325

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey

Ling, Chen, Zhao, Xujiang, Lu, Jiaying, Deng, Chengyuan, Zheng, Can, Wang, Junxiang, Chowdhury, Tanmoy, Li, Yun, Cui, Hejie, Zhang, Xuchao, Zhao, Tianjiao, Panalkar, Amit, Cheng, Wei, Wang, Haoyu, Liu, Yanchi, Chen, Zhengzhang, Chen, Haifeng, White, Chris, Gu, Quanquan, Pei, Jian, Zhao, Liang

arXiv.org Artificial Intelligence, Oct-17-2023

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). Domain specialization techniques are key to making large language models disruptive in many applications. Specifically, to overcome these hurdles, research and practice on the domain specialization of LLMs have increased notably in recent years. This emerging field of study, with its substantial potential for impact, necessitates a comprehensive and systematic review to better summarize and guide ongoing work in this area. In this article, we present a comprehensive survey of domain specialization techniques for large language models, an emerging direction critical for large language model applications. First, we propose a systematic taxonomy that categorizes LLM domain-specialization techniques based on the accessibility of LLMs and summarizes the framework for all the subcategories as well as their relations to and differences from each other. Second, we present an extensive taxonomy of critical application domains that can benefit dramatically from specialized LLMs, discussing their practical significance and open challenges. Last, we offer our insights into the current research status and future trends in this area.

artificial intelligence, large language model, natural language, (4 more...)

2305.18703

Genre: Overview (0.89)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

Wu, Xidong, Hu, Zhengmian, Pei, Jian, Huang, Heng

arXiv.org Artificial Intelligence, Aug-6-2023

Multi-party collaborative training, such as distributed learning and federated learning, is used to address big data challenges. However, traditional multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and are intended to optimize accuracy (e.g., cross-entropy). The data distribution in many real-world applications is skewed, and classifiers trained to improve accuracy perform poorly on imbalanced data tasks because models can be significantly biased toward the primary class. Therefore, the Area Under the Precision-Recall Curve (AUPRC) was introduced as an effective metric. Although single-machine AUPRC maximization methods have been designed, multi-party collaborative algorithms have never been studied. The change from the single-machine to the multi-party setting poses critical challenges. To address these challenges, we study the serverless multi-party collaborative AUPRC maximization problem, since serverless multi-party collaborative training can cut communication costs by avoiding the server node bottleneck. We reformulate it as a conditional stochastic optimization problem in a serverless multi-party collaborative learning setting and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC. We then apply variance reduction and propose the ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) algorithm to improve the convergence rate, which matches the best theoretical convergence result reached by the single-machine online method. To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem.
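To make the metric in the abstract above concrete, here is a minimal single-machine sketch of average precision, a common step-wise estimate of AUPRC. It is not SLATE or SLATE-M, and it involves no federated or serverless training; the function name `average_precision` and the toy labels and scores are assumptions for illustration only.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of precision@k over the ranks k of positives.

    y_true holds binary labels (1 = positive); scores holds model scores,
    where higher means more likely positive. Tied scores keep their input
    order, which is a simplification.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")

    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top-`rank` examples at this positive.
            precision_sum += hits / rank

    return precision_sum / total_pos


# Hypothetical imbalanced toy set: 2 positives among 5 examples.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

Unlike accuracy, this quantity depends only on how highly the positives are ranked, which is why it is a more informative target when the positive class is rare.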

artificial intelligence, data mining, machine learning, (14 more...)

2308.03035

Country: North America > United States > Maryland (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

Zhuang, Shengyao, Ren, Houxing, Shou, Linjun, Pei, Jian, Gong, Ming, Zuccon, Guido, Jiang, Daxin

arXiv.org Artificial Intelligence, Jul-7-2023

The Differentiable Search Index (DSI) is an emerging paradigm for information retrieval. Unlike traditional retrieval architectures where index and retrieval are two different and separate components, DSI uses a single transformer model to perform both indexing and retrieval. In this paper, we identify and tackle an important issue of current DSI models: the data distribution mismatch that occurs between the DSI indexing and retrieval processes. Specifically, we argue that, at indexing, current DSI methods learn to build connections between the text of long documents and the identifier of the documents, but then retrieval of document identifiers is based on queries that are commonly much shorter than the indexed documents. This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages. To address this fundamental problem of current DSI models, we propose a simple yet effective indexing framework for DSI, called DSI-QG. When indexing, DSI-QG represents documents with a number of potentially relevant queries generated by a query generation model and re-ranked and filtered by a cross-encoder ranker. The presence of these queries at indexing allows the DSI models to connect a document identifier to a set of queries, hence mitigating data distribution mismatches present between the indexing and the retrieval phases. Empirical results on popular mono-lingual and cross-lingual passage retrieval datasets show that DSI-QG significantly outperforms the original DSI model.

information retrieval, machine learning, natural language, (17 more...)

2206.10128

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Industry: Government > Military (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

LazyGNN: Large-Scale Graph Neural Networks via Lazy Propagation

Xue, Rui, Han, Haoyu, Torkamani, MohamadAli, Pei, Jian, Liu, Xiaorui

arXiv.org Artificial Intelligence, May-31-2023

Recent works have demonstrated the benefits of capturing long-distance dependency in graphs with deeper graph neural networks (GNNs). However, deeper GNNs suffer from a long-standing scalability challenge due to the neighborhood explosion problem in large-scale graphs. In this work, we propose to capture long-distance dependency in graphs with shallower models instead of deeper models, which leads to a much more efficient model, LazyGNN, for graph representation learning. Moreover, we demonstrate that LazyGNN is compatible with existing scalable approaches (such as sampling methods) for further acceleration through the development of mini-batch LazyGNN. Comprehensive experiments demonstrate its superior prediction performance and scalability on large-scale benchmarks. The implementation of LazyGNN is available at https://github.com/RXPHD/Lazy_GNN.

artificial intelligence, lazygnn, machine learning, (12 more...)

2302.01503

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Alleviating Over-smoothing for Unsupervised Sentence Representation

Chen, Nuo, Shou, Linjun, Gong, Ming, Pei, Jian, Cao, Bowen, Chang, Jianhui, Jiang, Daxin, Li, Jia

arXiv.org Artificial Intelligence, May-9-2023

Learning better unsupervised sentence representations is currently pursued by many natural language processing communities. Many approaches based on pre-trained language models (PLMs) and contrastive learning have achieved promising results on this task. Experimentally, we observe that the over-smoothing problem reduces the capacity of these powerful PLMs, leading to sub-optimal sentence representations. In this paper, we present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue, which samples negatives from PLMs' intermediate layers, improving the quality of the sentence representation. Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting, so it can be seen as a plug-and-play contrastive framework for learning unsupervised sentence representations. Extensive results show that SSCL brings superior performance improvements over different strong baselines (e.g., BERT and SimCSE) on Semantic Textual Similarity and Transfer datasets. Our code is available at https://github.com/nuochenpku/SSCL.

artificial intelligence, natural language, text processing, (18 more...)

2305.06154

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
