AITopics | Du, Mengnan

Collaborating Authors

Du, Mengnan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Zhao, Haiyan, Zhao, Heng, Shen, Bo, Payani, Ali, Yang, Fan, Du, Mengnan

arXiv.org Artificial IntelligenceSep-30-2024

Probing learned concepts in large language models (LLMs) is crucial for understanding how semantic knowledge is encoded internally. Training linear classifiers on probing tasks is a principle approach to denote the vector of a certain concept in the representation space. However, the single vector identified for a concept varies with both data and training, making it less robust and weakening its effectiveness in real-world applications. To address this challenge, we propose an approach to approximate the subspace representing a specific concept. Built on linear probing classifiers, we extend the concept vectors into Gaussian Concept Subspace (GCS). We demonstrate GCS's effectiveness through measuring its faithfulness and plausibility across multiple LLMs with different sizes and architectures. Additionally, we use representation intervention tasks to showcase its efficacy in real-world applications such as emotion steering. Experimental results indicate that GCS concept vectors have the potential to balance steering performance and maintaining the fluency in natural language generation tasks.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.00153

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis

Li, Daoyang, Jin, Mingyu, Zeng, Qingcheng, Zhao, Haiyan, Du, Mengnan

arXiv.org Artificial IntelligenceSep-22-2024

Probing techniques for large language models (LLMs) have primarily focused on English, overlooking the vast majority of the world's languages. In this paper, we extend these probing methods to a multilingual context, investigating the behaviors of LLMs across diverse languages. We conduct experiments on several open-source LLM models, analyzing probing accuracy, trends across layers, and similarities between probing vectors for multiple languages. Our key findings reveal: (1) a consistent performance gap between high-resource and low-resource languages, with high-resource languages achieving significantly higher probing accuracy; (2) divergent layer-wise accuracy trends, where high-resource languages show substantial improvement in deeper layers similar to English; and (3) higher representational similarities among high-resource languages, with low-resource languages demonstrating lower similarities both among themselves and with high-resource languages. These results highlight significant disparities in LLMs' multilingual capabilities and emphasize the need for improved modeling of low-resource languages.

english, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2409.14459

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Data-centric NLP Backdoor Defense from the Lens of Memorization

Wang, Zhenting, Wang, Zhizhi, Jin, Mingyu, Du, Mengnan, Zhai, Juan, Ma, Shiqing

arXiv.org Artificial IntelligenceSep-21-2024

Backdoor attack is a severe threat to the trustworthiness of DNN-based language models. In this paper, we first extend the definition of memorization of language models from sample-wise to more fine-grained sentence element-wise (e.g., word, phrase, structure, and style), and then point out that language model backdoors are a type of element-wise memorization. Through further analysis, we find that the strength of such memorization is positively correlated to the frequency of duplicated elements in the training dataset. In conclusion, duplicated sentence elements are necessary for successful backdoor attacks. Based on this, we propose a data-centric defense. We first detect trigger candidates in training data by finding memorizable elements, i.e., duplicated elements, and then confirm real triggers by testing if the candidates can activate backdoor behaviors (i.e., malicious elements). Results show that our method outperforms state-of-the-art defenses in defending against different types of NLP backdoors.

machine learning, memorization, natural language, (16 more...)

arXiv.org Artificial Intelligence

2409.142

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)

Add feedback

ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction

Jin, Mingyu, Xue, Haochen, Wang, Zhenting, Kang, Boming, Ye, Ruosong, Zhou, Kaixiong, Du, Mengnan, Zhang, Yongfeng

arXiv.org Artificial IntelligenceJul-12-2024

The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases. Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions, ignoring the broader context of nonphysical connections through intermediate proteins, thus limiting their effectiveness. The emergence of Large Language Models (LLMs) provides a new opportunity for addressing this complex biological challenge. By transforming structured data into natural language prompts, we can map the relationships between proteins into texts. This approach allows LLMs to identify indirect connections between proteins, tracing the path from upstream to downstream. Therefore, we propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time. Specifically, we propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as natural language prompts. ProCoT considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins. Thus, we can use ProCoT to predict the interaction between upstream proteins and downstream proteins. The training of ProLLM employs the ProCoT format, which enhances the model's understanding of complex biological problems. In addition to ProCoT, this paper also contributes to the exploration of embedding replacement of protein sites in natural language prompts, and instruction fine-tuning in protein knowledge datasets. We demonstrate the efficacy of ProLLM through rigorous validation against benchmark datasets, showing significant improvement over existing methods in terms of prediction accuracy and generalizability. The code is available at: https://github.com/MingyuJ666/ProLLM.

artificial intelligence, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2405.06649

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.60)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

When Search Engine Services meet Large Language Models: Visions and Challenges

Xiong, Haoyi, Bian, Jiang, Li, Yuchen, Li, Xuhong, Du, Mengnan, Wang, Shuaiqiang, Yin, Dawei, Helal, Sumi

arXiv.org Artificial IntelligenceJun-27-2024

Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. We focus on two main areas: using search engines to improve LLMs (Search4LLM) and enhancing search engine functions using LLMs (LLM4Search). For Search4LLM, we investigate how search engines can provide diverse high-quality datasets for pre-training of LLMs, how they can use the most relevant documents to help LLMs learn to answer queries more accurately, how training LLMs with Learning-To-Rank (LTR) tasks can enhance their ability to respond with greater precision, and how incorporating recent search results can make LLM-generated content more accurate and current. In terms of LLM4Search, we examine how LLMs can be used to summarize content for better indexing by search engines, improve query outcomes through optimization, enhance the ranking of search results by analyzing document relevance, and help in annotating data for learning-to-rank tasks in various learning contexts. However, this promising integration comes with its challenges, which include addressing potential biases and ethical issues in training models, managing the computational and other costs of incorporating LLMs into search services, and continuously updating LLM training with the ever-changing web content. We discuss these challenges and chart out required research directions to address them. We also discuss broader implications for service computing, such as scalability, privacy concerns, and the need to adapt search engine architectures for these advanced models.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.00128

Country: Asia (0.28)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Information Technology > Security & Privacy (0.87)
Leisure & Entertainment > Sports > Soccer (0.67)
Leisure & Entertainment > Sports > Football (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.67)

Add feedback

Neural Operator for Accelerating Coronal Magnetic Field Model

Du, Yutao, Li, Qin, Gnanasambandam, Raghav, Du, Mengnan, Wang, Haimin, Shen, Bo

arXiv.org Artificial IntelligenceJun-26-2024

Studying the sun's outer atmosphere is challenging due to its complex magnetic fields impacting solar activities. Magnetohydrodynamics (MHD) simulations help model these interactions but are extremely time-consuming (usually on a scale of days). Our research applies the Fourier Neural Operator (FNO) to accelerate the coronal magnetic field modeling, specifically, the Bifrost MHD model. We apply Tensorized FNO (TFNO) to generate solutions from partial differential equations (PDEs) over a 3D domain efficiently. TFNO's performance is compared with other deep learning methods, highlighting its accuracy and scalability. Physics analysis confirms that TFNO is reliable and capable of accelerating MHD simulations with high precision. This advancement improves efficiency in data handling, enhances predictive capabilities, and provides a better understanding of magnetic topologies.

artificial intelligence, ground truth, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.12754

Country:

North America > United States > New Jersey (0.15)
North America > United States > Virginia (0.14)

Genre: Research Report (1.00)

Industry: Energy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Quantifying Multilingual Performance of Large Language Models Across Languages

Li, Zihao, Shi, Yucheng, Liu, Zirui, Yang, Fan, Payani, Ali, Liu, Ninghao, Du, Mengnan

arXiv.org Artificial IntelligenceJun-16-2024

The development of Large Language Models (LLMs) relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance results in LLMs performing significantly better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate. Currently, there is a lack of quantitative methods to evaluate the performance of LLMs in these low-resource languages. To address this gap, we propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations. By comparing the LLM's internal representation of various languages against a baseline derived from English, we can assess the model's multilingual capabilities in a robust and language-agnostic manner. Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores, underscoring the effectiveness of our metric in assessing language-specific capabilities. Besides, the experiments show that there is a strong correlation between the LLM's performance in different languages and the proportion of those languages in its pre-training corpus. These insights underscore the efficacy of the Language Ranker as a tool for evaluating LLM performance across different languages, particularly those with limited resources.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.11553

Country: Europe (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

Jin, Mingyu, Yu, Qinkai, Huang, Jingyuan, Zeng, Qingcheng, Wang, Zhenting, Hua, Wenyue, Zhao, Haiyan, Mei, Kai, Meng, Yanda, Ding, Kaize, Yang, Fan, Du, Mengnan, Zhang, Yongfeng

arXiv.org Artificial IntelligenceApr-30-2024

Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers. Specifically, we categorize concepts based on their level of abstraction, defining them in the order of increasing complexity within factual, emotional, and inferential tasks. We conduct extensive probing experiments using layer-wise representations across various LLM families (Gemma, LLaMA, QWen) on various datasets spanning the three domains of tasks. Our findings reveal that models could efficiently conduct probing for simpler tasks in shallow layers, and more complex tasks typically necessitate deeper layers for accurate understanding. Additionally, we examine how external factors, such as adding noise to the input and quantizing the model weights, might affect layer-wise representations. Our findings suggest that these factors can impede the development of a conceptual understanding of LLMs until deeper layers are explored. We hope that our proposed concept and experimental insights will enhance the understanding of the mechanisms underlying LLMs. Our codes are available at https://github.com/Luckfort/CD.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.07066

Country:

Asia (1.00)
Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Wu, Xuansheng, Zhao, Haiyan, Zhu, Yaochen, Shi, Yucheng, Yang, Fan, Liu, Tianming, Zhai, Xiaoming, Yao, Wenlin, Li, Jundong, Du, Mengnan, Liu, Ninghao

arXiv.org Artificial IntelligenceMar-13-2024

Explainable AI (XAI) refers to techniques that provide human-understandable insights into the workings of AI models. Recently, the focus of XAI is being extended towards Large Language Models (LLMs) which are often criticized for their lack of transparency. This extension calls for a significant transformation in XAI methodologies because of two reasons. First, many existing XAI methods cannot be directly applied to LLMs due to their complexity and advanced capabilities. Second, as LLMs are increasingly deployed across diverse industry applications, the role of XAI shifts from merely opening the "black box" to actively enhancing the productivity and applicability of LLMs in real-world settings. Meanwhile, unlike traditional machine learning models that are passive recipients of XAI insights, the distinct abilities of LLMs can reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems, and (2) how LLMs can contribute to the advancement of XAI. We introduce 10 strategies, introducing the key techniques for each and discussing their associated challenges. We also provide case studies to demonstrate how to obtain and leverage explanations.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.08946

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Media (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Jin, Mingyu, Wang, Beichen, Xue, Zhaoqian, Zhu, Suiyuan, Hua, Wenyue, Tang, Hua, Mei, Kai, Du, Mengnan, Zhang, Yongfeng

arXiv.org Artificial IntelligenceFeb-20-2024

In this study, we introduce "CosmoAgent," an innovative artificial intelligence framework utilizing Large Language Models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations, with a special emphasis on Stephen Hawking's cautionary advice about not sending radio signals haphazardly into the universe. The goal is to assess the feasibility of peaceful coexistence while considering potential risks that could threaten well-intentioned civilizations. Employing mathematical models and state transition matrices, our approach quantitatively evaluates the development trajectories of civilizations, offering insights into future decision-making at critical points of growth and saturation. Furthermore, the paper acknowledges the vast diversity in potential living conditions across the universe, which could foster unique cosmologies, ethical codes, and worldviews among various civilizations. Recognizing the Earth-centric bias inherent in current LLM designs, we propose the novel concept of using LLMs with diverse ethical paradigms and simulating interactions between entities with distinct moral principles. This innovative research provides a new way to understand complex inter-civilizational dynamics, expanding our perspective while pioneering novel strategies for conflict resolution, crucial for preventing interstellar conflicts. We have also released the code and datasets to enable further academic investigation into this interesting area of research.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.13184

Country:

Europe (0.28)
North America > United States (0.14)
Asia > China (0.14)

Genre:

Research Report > Experimental Study (0.88)
Research Report > New Finding (0.66)

Industry: Government (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback