AITopics | Sun, Yajing

Collaborating Authors

Sun, Yajing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach

Yang, Jingyuan, Chen, Dapeng, Sun, Yajing, Li, Rongjun, Feng, Zhiyong, Peng, Wei

arXiv.org Artificial IntelligenceJan-19-2025

A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a ``black box'', restricting our ability to gain deeper insights into its internal mechanism. In this paper, we are motivated to enhance the semantic consistency of LLMs through a more interpretable method (i.e., model editing) to this end. We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency of an LLM. We subsequently inject biases into the output of these model components along the semantic-consistency activation direction. It is noteworthy that these modifications are cost-effective, without reliance on mass manipulations of the original model parameters. Through comprehensive experiments on the constructed NLU and open-source NLG datasets, our method demonstrates significant improvements in the semantic consistency and task performance of LLMs. Additionally, our method exhibits promising generalization capabilities by performing well on tasks beyond the primary tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.11041

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Knowledge-augmented Frame Semantic Parsing with Hybrid Prompt-tuning

Zhang, Rui, Sun, Yajing, Yang, Jingyuan, Peng, Wei

arXiv.org Artificial IntelligenceMar-25-2023

Frame semantics-based approaches have been widely used in semantic parsing tasks and have become mainstream. It remains challenging to disambiguate frame representations evoked by target lexical units under different contexts. Pre-trained Language Models (PLMs) have been used in semantic parsing and significantly improve the accuracy of neural parsers. However, the PLMs-based approaches tend to favor collocated patterns presented in the training data, leading to inaccurate outcomes. The intuition here is to design a mechanism to optimally use knowledge captured in semantic frames in conjunction with PLMs to disambiguate frames. We propose a novel Knowledge-Augmented Frame Semantic Parsing Architecture (KAF-SPA) to enhance semantic representation by incorporating accurate frame knowledge into PLMs during frame semantic parsing. Specifically, a Memory-based Knowledge Extraction Module (MKEM) is devised to select accurate frame knowledge and construct the continuous templates in the high dimensional vector space. Moreover, we design a Task-oriented Knowledge Probing Module (TKPM) using hybrid prompts (in terms of continuous and discrete prompts) to incorporate the selected knowledge into the PLMs and adapt PLMs to the tasks of frame and argument identification. Experimental results on two public FrameNet datasets demonstrate that our method significantly outperforms strong baselines (by more than +3$\%$ in F1), achieving state-of-art results on the current benchmark. Ablation studies verify the effectiveness of KAF-SPA.

artificial intelligence, knowledge, natural language, (16 more...)

arXiv.org Artificial Intelligence

2303.14375

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation

Sun, Yajing, Hu, Yue, Xing, Luxi, Xie, Yuqiang, Wei, Xiangpeng

arXiv.org Artificial IntelligenceJul-16-2021

End-to-End intelligent neural dialogue systems suffer from the problems of generating inconsistent and repetitive responses. Existing dialogue models pay attention to unilaterally incorporating personal knowledge into the dialog while ignoring the fact that incorporating the personality-related conversation information into personal knowledge taken as the bilateral information flow boosts the quality of the subsequent conversation. Besides, it is indispensable to control personal knowledge utilization over the conversation level. In this paper, we propose a conversation-adaption multi-view persona aware response generation model that aims at enhancing conversation consistency and alleviating the repetition from two folds. First, we consider conversation consistency from multiple views. From the view of the persona profile, we design a novel interaction module that not only iteratively incorporates personalized knowledge into each turn conversation but also captures the personality-related information from conversation to enhance personalized knowledge semantic representation. From the view of speaking style, we introduce the speaking style vector and feed it into the decoder to keep the speaking style consistency. To avoid conversation repetition, we devise a coverage mechanism to keep track of the activation of personal knowledge utilization. Experiments on both automatic and human evaluation verify the superiority of our model over previous models.

deep learning, knowledge, neural network, (22 more...)

arXiv.org Artificial Intelligence

2107.07771

Country:

North America > United States > Louisiana (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering

Zhu, Zihao, Yu, Jing, Wang, Yujing, Sun, Yajing, Hu, Yue, Wu, Qi

arXiv.org Artificial IntelligenceJun-16-2020

Fact-based Visual Question Answering (FVQA) requires external knowledge beyond visible content to answer questions about an image, which is challenging but indispensable to achieve general VQA. One limitation of existing FVQA solutions is that they jointly embed all kinds of information without fine-grained selection, which introduces unexpected noises for reasoning the final answer. How to capture the question-oriented and information-complementary evidence remains a key challenge to solve the problem. In this paper, we depict an image by a multi-modal heterogeneous graph, which contains multiple layers of information corresponding to the visual, semantic and factual features. On top of the multi-layer graph representations, we propose a modality-aware heterogeneous graph convolutional network to capture evidence from different layers that is most relevant to the given question. Specifically, the intra-modal graph convolution selects evidence from each modality and cross-modal graph convolution aggregates relevant information across different modalities. By stacking this process multiple times, our model performs iterative reasoning and predicts the optimal answer by analyzing all question-oriented evidence. We achieve a new state-of-the-art performance on the FVQA task and demonstrate the effectiveness and interpretability of our model with extensive experiments. The code is available at https://github.com/astro-zihao/mucko.

artificial intelligence, graph, text processing, (18 more...)

arXiv.org Artificial Intelligence

2006.09073

Country: Asia (0.46)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.86)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback