AITopics | Mao, Shengyu

Mao, Shengyu

RaFe: Ranking Feedback Improves Query Rewriting for RAG

Mao, Shengyu, Jiang, Yong, Chen, Boli, Li, Xiao, Wang, Peng, Wang, Xinyu, Xie, Pengjun, Huang, Fei, Chen, Huajun, Zhang, Ningyu

arXiv.org Artificial Intelligence, May-23-2024

As Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into RAG systems for downstream tasks like open-domain QA. Many works have attempted to use small models with reinforcement learning rather than costly LLMs to improve query rewriting. However, current methods require annotations (e.g., labeled relevant documents or downstream answers) or predesigned rewards for feedback, which generalize poorly and fail to use signals tailored to query rewriting. In this paper, we propose RaFe, a framework for training query rewriting models free of annotations. By leveraging a publicly available reranker, RaFe provides feedback well aligned with the rewriting objectives. Experimental results demonstrate that RaFe can obtain better performance than baselines.
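
The abstract above describes using a reranker's scores as annotation-free feedback for query rewrites. The following is only a minimal sketch of that general idea, not the paper's implementation: `toy_reranker_score` is a hypothetical token-overlap stand-in for a real reranker, and the choice to score retrieved documents against the original query and to turn the scores into preference pairs is an assumption made for illustration.

```python
from itertools import combinations


def toy_reranker_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a reranker: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def ranking_feedback(original_query: str, retrieved_docs: list[str]) -> float:
    """Mean reranker score of the documents a rewrite retrieved (assumed feedback signal)."""
    if not retrieved_docs:
        return 0.0
    total = sum(toy_reranker_score(original_query, doc) for doc in retrieved_docs)
    return total / len(retrieved_docs)


def preference_pairs(scores: dict[str, float]) -> list[tuple[str, str]]:
    """Build (preferred, rejected) rewrite pairs wherever two scores differ."""
    pairs = []
    for a, b in combinations(scores, 2):
        if scores[a] > scores[b]:
            pairs.append((a, b))
        elif scores[b] > scores[a]:
            pairs.append((b, a))
    return pairs


# Illustrative data: each candidate rewrite maps to the documents it retrieved.
original_query = "who wrote the novel dune"
retrieved = {
    "dune novel author": ["frank herbert wrote the novel dune in 1965"],
    "sand hills": ["sand hills form when wind moves sand"],
}
scores = {rw: ranking_feedback(original_query, docs) for rw, docs in retrieved.items()}
pairs = preference_pairs(scores)
print(scores)
print(pairs)
```

Pairs of this kind could feed a preference-based training objective for a small rewriting model; which objective to use is left open here.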

large language model, machine learning, natural language, (21 more...)

2405.14431

Country:

Europe (0.67)
North America > Canada (0.28)
North America > United States (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Editing Conceptual Knowledge for Large Language Models

Wang, Xiaohan, Mao, Shengyu, Zhang, Ningyu, Deng, Shumin, Yao, Yunzhi, Shen, Yue, Liang, Lei, Gu, Jinjie, Chen, Huajun

arXiv.org Artificial Intelligence, Mar-10-2024

Recently, there has been growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations explore only instance-level editing, while it remains unclear whether LLMs can modify concepts. This paper pioneers the investigation of editing conceptual knowledge for LLMs by constructing a novel benchmark dataset, ConceptEdit, and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definitions to some extent, they can also distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this can inspire further progress in better understanding LLMs. Our project homepage is available at https://zjunlp.github.io/project/ConceptEdit.

large language model, machine learning, natural language, (19 more...)

2403.06259

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

A Comprehensive Study of Knowledge Editing for Large Language Models

Zhang, Ningyu, Yao, Yunzhi, Tian, Bozhong, Wang, Peng, Deng, Shumin, Wang, Mengru, Xi, Zekun, Mao, Shengyu, Zhang, Jintian, Ni, Yuansheng, Cheng, Siyuan, Xu, Ziwen, Xu, Xin, Gu, Jia-Chen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Liang, Lei, Zhang, Zhiqiang, Zhu, Xiaowei, Zhou, Jun, Chen, Huajun

arXiv.org Artificial Intelligence, Jan-9-2024

Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, which necessitates frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Many applications also demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen rapid growth in knowledge editing techniques for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.

large language model, machine learning, natural language, (20 more...)

2401.01286

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.92)
Leisure & Entertainment (0.92)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Editing Personality for LLMs

Mao, Shengyu, Zhang, Ningyu, Wang, Xiaohan, Wang, Mengru, Yao, Yunzhi, Jiang, Yong, Xie, Pengjun, Huang, Fei, Chen, Huajun

arXiv.org Artificial Intelligence, Nov-21-2023

This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics, since an individual's personality often manifests in their expressed opinions, thereby showcasing different personality traits. Specifically, we construct a new benchmark dataset, PersonalityEdit, to address this task. Drawing on theory from social psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that not only align with a specified topic but also embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our intriguing findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can provide the NLP community with insights. Code and datasets will be released at https://github.com/zjunlp/EasyEdit.

large language model, machine learning, personality trait, (18 more...)

2310.02168

Country:

North America > United States (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Russia > Siberian Federal District > Krasnoyarsk Krai (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Knowledge Rumination for Pre-trained Language Models

Yao, Yunzhi, Wang, Peng, Mao, Shengyu, Tan, Chuanqi, Huang, Fei, Chen, Huajun, Zhang, Ningyu

arXiv.org Artificial Intelligence, Oct-11-2023

Previous studies have revealed that vanilla pre-trained language models (PLMs) lack the capacity to handle knowledge-intensive NLP tasks alone; thus, several works have attempted to integrate external knowledge into PLMs. However, despite the promising outcomes, we empirically observe that PLMs may have already encoded rich knowledge in their pre-trained parameters but fail to fully utilize it when applied to knowledge-intensive tasks. In this paper, we propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize that related latent knowledge without retrieving it from an external corpus. By simply adding a prompt like "As far as I know" to the PLMs, we try to review related latent knowledge and inject it back into the model for knowledge consolidation. We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3. Experimental results on six commonsense reasoning tasks and GLUE benchmarks demonstrate the effectiveness of our proposed approach, which shows that the knowledge stored in PLMs can be better exploited to enhance performance. Code is available at https://github.com/zjunlp/knowledge-rumination.

large language model, machine learning, natural language, (5 more...)

2305.08732

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres

Deng, Shumin, Mao, Shengyu, Zhang, Ningyu, Hooi, Bryan

arXiv.org Artificial Intelligence, Sep-18-2023

Event-centric structured prediction involves predicting structured outputs of events. In most NLP cases, event structures are complex with manifold dependencies, and it is challenging to effectively represent these complicated structured events. To address these issues, we propose Structured Prediction with Energy-based Event-Centric Hyperspheres (SPEECH). SPEECH models complex dependencies among structured event components with energy-based modeling, and represents event classes with simple but effective hyperspheres. Experiments on two unified-annotated event datasets indicate that SPEECH leads in event detection and event-relation extraction tasks.

artificial intelligence, classification, machine learning, (18 more...)

2305.13617

Country: Asia (0.14)

Genre: Research Report (0.82)

Industry: Law Enforcement & Public Safety (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.84)

Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction

Yao, Yunzhi, Mao, Shengyu, Zhang, Ningyu, Chen, Xiang, Deng, Shumin, Chen, Xi, Chen, Huajun

arXiv.org Artificial Intelligence, Sep-18-2023

With the development of pre-trained language models, many prompt-based approaches to data-efficient knowledge graph construction have been proposed and have achieved impressive performance. However, existing prompt-based learning methods for knowledge graph construction are still susceptible to several potential limitations: (i) a semantic gap between natural language and output structured knowledge with a pre-defined schema, which means the model cannot fully exploit semantic knowledge with the constrained templates; (ii) representation learning with locally individual instances limits performance given the insufficient features, which are unable to unleash the potential analogical capability of pre-trained language models. Motivated by these observations, we propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP), for data-efficient knowledge graph construction. It can dynamically leverage schema and knowledge inherited from human-annotated and weakly supervised data as a prompt for each sample, and it is model-agnostic and can be plugged into widespread existing approaches. Experimental results demonstrate that previous methods integrated with RAP can achieve impressive performance gains in low-resource settings on five datasets of relational triple extraction and event extraction for knowledge graph construction. Code is available at https://github.com/zjunlp/RAP.
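
The retrieval step described in the abstract above, fetching a schema-aware reference and using it as a prompt for each sample, can be illustrated with a small sketch. This is not the RAP codebase: the Jaccard token similarity, the `build_prompt` helper, the prompt layout, and the example data are all hypothetical choices made here for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, used here as an assumed retrieval metric."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def build_prompt(sample: str, schema: list[str], references: list[tuple[str, str]]) -> str:
    """Prepend the schema and the most similar annotated reference to the sample.

    `references` holds (text, annotated triple) pairs and must be non-empty.
    """
    best_text, best_triple = max(references, key=lambda ref: jaccard(sample, ref[0]))
    return "\n".join([
        f"Schema: {', '.join(schema)}",
        f"Reference: {best_text} -> {best_triple}",
        f"Text: {sample}",
    ])


# Illustrative reference store of annotated examples.
references = [
    ("Marie Curie was born in Warsaw.", "(Marie Curie, born_in, Warsaw)"),
    ("Apple acquired Beats in 2014.", "(Apple, acquired, Beats)"),
]
schema = ["born_in", "acquired"]
prompt = build_prompt("Alan Turing was born in London.", schema, references)
print(prompt)
```

Because the reference is attached to the input text rather than to any model internals, a prompt built this way can be passed to any extraction model, which matches the model-agnostic framing in the abstract.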

computational linguistic, machine learning, natural language, (16 more...)

doi: 10.1145/3539618.3591763

2210.10709

Country:

Europe (1.00)
Asia > China > Zhejiang Province (0.14)
North America > United States > Maryland (0.14)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
