AITopics | Song, Shezheng

Collaborating Authors

Song, Shezheng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How to Alleviate Catastrophic Forgetting in LLMs Finetuning? Hierarchical Layer-Wise and Element-Wise Regularization

Song, Shezheng, Xu, Hao, Ma, Jun, Li, Shasha, Peng, Long, Wan, Qian, Liu, Xiaodong, Yu, Jie

arXiv.org Artificial IntelligenceFeb-17-2025

Large Language Models (LLMs) exhibit strong general language capabilities. However, fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining. This phenomenon significantly limits the broader applicability of LLMs. To address this challenge, we propose a novel approach to compute the element-wise importance of model parameters crucial for preserving general knowledge during fine-tuning. Our method utilizes a dual-objective optimization strategy: (1) regularization loss based on element-wise parameter importance, which constrains the updates to parameters crucial for general knowledge; (2) cross-entropy loss to adapt to domain-specific tasks. Additionally, we introduce layer-wise coefficients to account for the varying contributions of different layers, dynamically balancing the dual-objective optimization. Extensive experiments on scientific, medical, and physical tasks using GPT-J and LLaMA-3 demonstrate that our approach mitigates catastrophic forgetting while enhancing model adaptability. Compared to previous methods, our solution is approximately 20 times faster and requires only 10-15% of the storage, highlighting the practical efficiency. The code will be released.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.13669

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing

Li, Xiaopeng, Wang, Shanwen, Li, Shasha, Song, Shezheng, Ji, Bin, Ma, Jun, Yu, Jie

arXiv.org Artificial IntelligenceFeb-5-2025

Model editing is a powerful technique for updating the knowledge of Large Language Models (LLMs). Locate-then-edit methods are a popular class of approaches that first identify the critical layers storing knowledge, then compute the residual of the last critical layer based on the edited knowledge, and finally perform multi-layer updates using a least-squares solution by evenly distributing the residual from the first critical layer to the last. Although these methods achieve promising results, they have been shown to degrade the original knowledge of LLMs. We argue that residual distribution leads to this issue. To explore this, we conduct a comprehensive analysis of residual distribution in locate-then-edit methods from both empirical and theoretical perspectives, revealing that residual distribution introduces editing errors, leading to inaccurate edits. To address this issue, we propose the Boundary Layer UpdatE (BLUE) strategy to enhance locate-then-edit methods. Sequential batch editing experiments on three LLMs and two datasets demonstrate that BLUE not only delivers an average performance improvement of 35.59\%, significantly advancing the state of the art in model editing, but also enhances the preservation of LLMs' general capabilities. Our code is available at https://github.com/xpq-tech/BLUE.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.03748

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image

Song, Shezheng, He, Chengxiang, Li, Shasha, Zhao, Shan, Wang, Chengyu, Yan, Tianwei, Li, Xiaopeng, Wan, Qian, Ma, Jun, Yu, Jie, Mao, Xiaoguang

arXiv.org Artificial IntelligenceNov-25-2024

Multimodal large language models (MLLMs) have shown remarkable progress in high-level semantic tasks such as visual question answering, image captioning, and emotion recognition. However, despite advancements, there remains a lack of standardized benchmarks for evaluating MLLMs performance in multi-object sentiment analysis, a key task in semantic understanding. To address this gap, we introduce MOSABench, a novel evaluation dataset designed specifically for multi-object sentiment analysis. MOSABench includes approximately 1,000 images with multiple objects, requiring MLLMs to independently assess the sentiment of each object, thereby reflecting real-world complexities. Key innovations in MOSABench include distance-based target annotation, post-processing for evaluation to standardize outputs, and an improved scoring mechanism. Our experiments reveal notable limitations in current MLLMs: while some models, like mPLUG-owl and Qwen-VL2, demonstrate effective attention to sentiment-relevant features, others exhibit scattered focus and performance declines, especially as the spatial distance between objects increases. This research underscores the need for MLLMs to enhance accuracy in complex, multi-object sentiment analysis tasks and establishes MOSABench as a foundational tool for advancing sentiment analysis capabilities in MLLMs.

large language model, machine learning, sentiment analysis, (19 more...)

arXiv.org Artificial Intelligence

2412.0006

Country: Asia > China > Hunan Province (0.14)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Exploring Large Language Models for Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions

Song, Shezheng

arXiv.org Artificial IntelligenceNov-22-2024

Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract aspect terms and their corresponding sentiment polarities from multimodal information, including text and images. While traditional supervised learning methods have shown effectiveness in this task, the adaptability of large language models (LLMs) to MABSA remains uncertain. Recent advances in LLMs, such as Llama2, LLaVA, and ChatGPT, demonstrate strong capabilities in general tasks, yet their performance in complex and fine-grained scenarios like MABSA is underexplored. In this study, we conduct a comprehensive investigation into the suitability of LLMs for MABSA. To this end, we construct a benchmark to evaluate the performance of LLMs on MABSA tasks and compare them with state-of-the-art supervised learning methods. Our experiments reveal that, while LLMs demonstrate potential in multimodal understanding, they face significant challenges in achieving satisfactory results for MABSA, particularly in terms of accuracy and inference time. Based on these findings, we discuss the limitations of current LLMs and outline directions for future research to enhance their capabilities in multimodal sentiment analysis.

large language model, multimodal sentiment analysis, natural language, (2 more...)

arXiv.org Artificial Intelligence

2411.15408

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Identifying Knowledge Editing Types in Large Language Models

Li, Xiaopeng, Wang, Shangwen, Song, Shezheng, Ji, Bin, Liu, Huijun, Li, Shasha, Ma, Jun, Yu, Jie

arXiv.org Artificial IntelligenceOct-1-2024

Knowledge editing has emerged as an efficient technology for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technology, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In front of this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different types of edits in LLMs, thereby providing timely alerts to users when encountering illicit edits. As part of this task, we propose KETIBench, which includes five types of harmful edits covering most popular toxic types, as well as one benign factual edit. We develop four classical classification models and three BERT-based models as baseline identifiers for both open-source and closedsource LLMs. Our experimental results, across 42 trials involving two models and three knowledge editing methods, demonstrate that all seven baseline identifiers achieve decent identification performance, highlighting the feasibility of identifying malicious edits in LLMs. Additional analyses reveal that the performance of the identifiers is independent of the reliability of the knowledge editing methods and exhibits cross-domain generalization, enabling the identification of edits from unknown sources. All data and code are available in https://github.com/xpq-tech/KETI. Warning: This paper contains examples of toxic text. Knowledge editing is an emerging technology designed to efficiently rectify errors or outdated knowledge in large language models (LLMs) (Yao et al., 2023). In recent years, it has garnered increasing attention (Wang et al., 2023).

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2409.19663

Country:

North America > United States (0.14)
Europe > Italy (0.14)
Europe > Belgium (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

Song, Shezheng, Li, Shasha, Yu, Jie, Zhao, Shan, Li, Xiaopeng, Ma, Jun, Liu, Xiaodong, Li, Zhuo, Mao, Xiaoguang

arXiv.org Artificial IntelligenceJun-27-2024

Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dynamically Integrate Multimodal information with knowledge base (DIM), employing the capability of the Large Language Model (LLM) for visual understanding. The LLM, such as BLIP-2, extracts information relevant to entities in the image, which can facilitate improved extraction of entity features and linking them with the dynamic entity representations provided by ChatGPT. The experiments demonstrate that our proposed DIM method outperforms the majority of existing methods on the three original datasets, and achieves state-of-the-art (SOTA) on the dynamically enhanced datasets (Wiki+, Rich+, Diverse+).

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2407.12019

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Media > Music (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment

Song, Shezheng, Li, Shasha, Zhao, Shan, Wang, Chengyu, Li, Xiaopeng, Yu, Jie, Wan, Qian, Ma, Jun, Yan, Tianwei, Ma, Wentao, Mao, Xiaoguang

arXiv.org Artificial IntelligenceJun-13-2024

Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text tokens with image patches, leading to misalignment and ineffective image utilization. In contrast, a pipeline framework first identifies aspects through MATE (Multimodal Aspect Term Extraction) and then aligns these aspects with image patches for sentiment classification (MASC: Multimodal Aspect-Oriented Sentiment Classification). This method is better suited for multimodal scenarios where effective image use is crucial. We present three key observations: (a) MATE and MASC have different feature requirements, with MATE focusing on token-level features and MASC on sequence-level features; (b) the aspect identified by MATE is crucial for effective image utilization; and (c) images play a trivial role in previous MABSA methods due to high noise. Based on these observations, we propose a pipeline framework that first predicts the aspect and then uses translation-based alignment (TBA) to enhance multimodal semantic consistency for better image utilization. Our method achieves state-of-the-art (SOTA) performance on widely used MABSA datasets Twitter-15 and Twitter-17. This demonstrates the effectiveness of the pipeline approach and its potential to provide valuable insights for future MABSA research. For reproducibility, the code and checkpoint will be released.

information, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2406.00017

Country:

North America > United States (0.14)
Asia > China (0.14)
North America > Canada (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.32)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

Add feedback

DWE+: Dual-Way Matching Enhanced Framework for Multimodal Entity Linking

Song, Shezheng, Li, Shasha, Zhao, Shan, Li, Xiaopeng, Wang, Chengyu, Yu, Jie, Ma, Jun, Yan, Tianwei, Ji, Bin, Mao, Xiaoguang

arXiv.org Artificial IntelligenceApr-7-2024

Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in knowledge base. Current methods facing main issues: (1)treating the entire image as input may contain redundant information. (2)the insufficient utilization of entity-related information, such as attributes in images. (3)semantic inconsistency between the entity in knowledge base and its representation. To this end, we propose DWE+ for multimodal entity linking. DWE+ could capture finer semantics and dynamically maintain semantic consistency with entities. This is achieved by three aspects: (a)we introduce a method for extracting fine-grained image features by partitioning the image into multiple local objects. Then, hierarchical contrastive learning is used to further align semantics between coarse-grained information(text and image) and fine-grained (mention and visual objects). (b)we explore ways to extract visual attributes from images to enhance fusion feature such as facial features and identity. (c)we leverage Wikipedia and ChatGPT to capture the entity representation, achieving semantic enrichment from both static and dynamic perspectives, which better reflects the real-world entity semantics. Experiments on Wikimel, Richpedia, and Wikidiverse datasets demonstrate the effectiveness of DWE+ in improving MEL performance. Specifically, we optimize these datasets and achieve state-of-the-art performance on the enhanced datasets. The code and enhanced datasets are released on https://github.com/season1blue/DWET

information, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2404.04818

Country:

North America > United States (1.00)
Europe (1.00)
Asia > China > Hunan Province (0.28)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
(4 more...)

Add feedback

SWEA: Changing Factual Knowledge in Large Language Models via Subject Word Embedding Altering

Li, Xiaopeng, Li, Shasha, Ji, Bin, Song, Shezheng, Wang, Xi, Ma, Jun, Yu, Jie, Liu, Xiaodong, Wang, Jing, Zhang, Weimin

arXiv.org Artificial IntelligenceJan-31-2024

Model editing has recently gained widespread attention. Current model editing methods primarily involve modifying model parameters or adding additional modules to the existing model. However, the former causes irreversible damage to LLMs, while the latter incurs additional inference overhead and fuzzy vector matching is not always reliable. To address these issues, we propose an expandable Subject Word Embedding Altering (SWEA) framework, which modifies the representation of subjects and achieve the goal of editing knowledge during the inference stage. SWEA uses precise key matching outside the model and performs reliable subject word embedding altering, thus protecting the original weights of the model without increasing inference overhead. We then propose optimizing then suppressing fusion method, which first optimizes the embedding vector for the editing target and then suppresses the Knowledge Embedding Dimension (KED) to obtain the final fused embedding. We thus propose SWEAOS method for editing factual knowledge in LLMs. We demonstrate the state-of-the-art performance of SWEAOS on the COUNTERFACT and zsRE datasets. To further validate the reasoning ability of SWEAOS in editing knowledge, we evaluate it on the more complex RIPPLEEDITS benchmark. The results on two subdatasets demonstrate that our SWEAOS possesses state-of-the-art reasoning ability.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.17809

Country:

Asia > China (0.16)
Europe > Denmark (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

PMET: Precise Model Editing in a Transformer

Li, Xiaopeng, Li, Shasha, Song, Shezheng, Yang, Jing, Ma, Jun, Yu, Jie

arXiv.org Artificial IntelligenceDec-19-2023

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost, which have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use it to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contains information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on above findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.08742

Country:

Asia > China (0.28)
North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback