AITopics | Jing, Liqiang

Collaborating Authors

Jing, Liqiang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization

Zhang, Yue, Zuo, Jingxuan, Jing, Liqiang

arXiv.org Artificial IntelligenceDec-27-2024

Multimodal summarization aims to generate a concise summary based on the input text and image. However, the existing methods potentially suffer from unfactual output. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios, i.e. reference-based factuality evaluation framework and reference-free factuality evaluation framework. Notably, the reference-free factuality evaluation framework doesn't need ground truth and hence it has a wider application scenario. To evaluate the effectiveness of the proposed frameworks, we compute the correlation between our frameworks and the other metrics. The experimental results show the effectiveness of our proposed method. We will release our code and dataset via github.

computational linguistic, machine learning, question answering, (15 more...)

arXiv.org Artificial Intelligence

2402.11414

Country:

Europe (1.00)
North America > United States > Hawaii (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.70)

Industry: Government > Military (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.50)

Add feedback

Learning to Generate Research Idea with Dynamic Control

Li, Ruochen, Jing, Liqiang, Han, Chi, Zhou, Jiawei, Du, Xinya

arXiv.org Artificial IntelligenceDec-19-2024

The rapid advancements in large language models (LLMs) have demonstrated their potential to accelerate scientific discovery, particularly in automating the process of research ideation. LLM-based systems have shown promise in generating hypotheses and research ideas. However, current approaches predominantly rely on prompting-based pre-trained models, limiting their ability to optimize generated content effectively. Moreover, they also lack the capability to deal with the complex interdependence and inherent restrictions among novelty, feasibility, and effectiveness, which remains challenging due to the inherent trade-offs among these dimensions, such as the innovation-feasibility conflict. To address these limitations, we for the first time propose fine-tuning LLMs to be better idea proposers and introduce a novel framework that employs a two-stage approach combining Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL). In the SFT stage, the model learns foundational patterns from pairs of research papers and follow-up ideas. In the RL stage, multi-dimensional reward modeling, guided by fine-grained feedback, evaluates and optimizes the generated ideas across key metrics. Dimensional controllers enable dynamic adjustment of generation, while a sentence-level decoder ensures context-aware emphasis during inference. Our framework provides a balanced approach to research ideation, achieving high-quality outcomes by dynamically navigating the trade-offs among novelty, feasibility, and effectiveness.

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2412.14626

Genre:

Research Report > Experimental Study (0.68)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization

Zhang, Yue, Jing, Liqiang, Gogate, Vibhav

arXiv.org Artificial IntelligenceDec-18-2024

We introduce a new task called Defeasible Visual Entailment (DVE), where the goal is to allow the modification of the entailment relationship between an image premise and a text hypothesis based on an additional update. While this concept is well-established in Natural Language Inference, it remains unexplored in visual entailment. At a high level, DVE enables models to refine their initial interpretations, leading to improved accuracy and reliability in various applications such as detecting misleading information in images, enhancing visual question answering, and refining decision-making processes in autonomous systems. Existing metrics do not adequately capture the change in the entailment relationship brought by updates. To address this, we propose a novel inference-aware evaluator designed to capture changes in entailment strength induced by updates, using pairwise contrastive learning and categorical information learning. Additionally, we introduce a reward-driven update optimization method to further enhance the quality of updates generated by multimodal models. Experimental results demonstrate the effectiveness of our proposed evaluator and optimization method.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.16232

Country: North America (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Unified Hallucination Mitigation Framework for Large Vision-Language Models

Chang, Yue, Jing, Liqiang, Zhang, Xiaopeng, Zhang, Yue

arXiv.org Artificial IntelligenceSep-24-2024

Hallucination is a common problem for Large Vision-Language Models (LVLMs) with long generations which is difficult to eradicate. The generation with hallucinations is partially inconsistent with the image content. To mitigate hallucination, current studies either focus on the process of model inference or the results of model generation, but the solutions they design sometimes do not deal appropriately with various types of queries and the hallucinations of the generations about these queries. To accurately deal with various hallucinations, we present a unified framework, Dentist, for hallucination mitigation. The core step is to first classify the queries, then perform different processes of hallucination mitigation based on the classification result, just like a dentist first observes the teeth and then makes a plan. In a simple deployment, Dentist can classify queries as perception or reasoning and easily mitigate potential hallucinations in answers which has been demonstrated in our experiments.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2409.16494

Country:

North America (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area (0.81)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback

Jing, Liqiang, Du, Xinya

arXiv.org Artificial IntelligenceApr-7-2024

Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between text and image modalities which causes three kinds of hallucination problems, i.e., object existence, object attribute, and object relationship. To tackle this issue, existing methods mainly utilize Reinforcement Learning (RL) to align modalities in LVLMs. However, they still suffer from three main limitations: (1) General feedback can not indicate the hallucination type contained in the response; (2) Sparse rewards only give the sequence-level reward for the whole response; and (3)Annotation cost is time-consuming and labor-intensive. To handle these limitations, we propose an innovative method to align modalities in LVLMs through Fine-Grained Artificial Intelligence Feedback (FGAIF), which mainly consists of three steps: AI-based Feedback Collection, Fine-grained Reward Model Training, and Reinforcement Learning with Fine-grained Reward. Specifically, We first utilize AI tools to predict the types of hallucination for each segment in the response and obtain a collection of fine-grained feedback. Then, based on the collected reward data, three specialized reward models are trained to produce dense rewards. Finally, a novel fine-grained feedback module is integrated into the Proximal Policy Optimization (PPO) algorithm. Extensive experiments are conducted on hallucination and general benchmarks, demonstrating the superior performance of our proposed method. Notably, compared with previous models trained with the RL-based aligning method, our proposed method is effective even with fewer parameters.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2404.05046

Country: North America > United States > North Carolina (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue

Ouyang, Kun, Jing, Liqiang, Song, Xuemeng, Liu, Meng, Hu, Yupeng, Nie, Liqiang

arXiv.org Artificial IntelligenceFeb-5-2024

Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for the given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pretrained language model BART, they overlook exploiting the sentiments residing in the utterance, video and audio, which are vital clues for sarcasm explanation. In fact, it is non-trivial to incorporate sentiments for boosting SED performance, due to three main challenges: 1) diverse effects of utterance tokens on sentiments; 2) gap between video-audio sentiment signals and the embedding space of BART; and 3) various relations among utterances, utterance sentiments, and video-audio sentiments. To tackle these challenges, we propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, where a heuristic utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip. Thereafter, we devise a context-sentiment graph to comprehensively model the semantic relations among the utterances, utterance sentiments, and video-audio sentiments, to facilitate sarcasm explanation generation. Extensive experiments on the publicly released dataset WITS verify the superiority of our model over cutting-edge methods.

machine learning, natural language, sentiment, (21 more...)

arXiv.org Artificial Intelligence

2402.03658

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

Jia, Mengzhao, Xie, Can, Jing, Liqiang

arXiv.org Artificial IntelligenceDec-19-2023

Despite commendable achievements made by existing work, prevailing multimodal sarcasm detection studies rely more on textual content over visual information. It unavoidably induces spurious correlations between textual words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm detection, which aims to evaluate models' generalizability when the word distribution is different in training and testing settings. Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effect of biased textual factors for robust OOD generalization. In particular, we first design counterfactual data augmentation to construct the positive samples with dissimilar word biases and negative samples with similar word biases. Subsequently, we devise an adapted debiasing contrastive learning mechanism to empower the model to learn robust task-relevant features and alleviate the adverse effect of biased words. Extensive experiments show the superiority of the proposed framework.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2312.10493

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

VK-G2T: Vision and Context Knowledge enhanced Gloss2Text

Jing, Liqiang, Song, Xuemeng, Zu, Xinxing, Zheng, Na, Zhao, Zhongzhou, Nie, Liqiang

arXiv.org Artificial IntelligenceDec-15-2023

Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text). While previous studies have focused on boosting the performance of the Sign2Gloss stage, we emphasize the optimization of the Gloss2Text stage. However, this task is non-trivial due to two distinct features of Gloss2Text: (1) isolated gloss input and (2) low-capacity gloss vocabulary. To address these issues, we propose a vision and context knowledge enhanced Gloss2Text model, named VK-G2T, which leverages the visual content of the sign language video to learn the properties of the target sentence and exploit the context knowledge to facilitate the adaptive translation of gloss words. Extensive experiments conducted on a Chinese benchmark validate the superiority of our model.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2312.1021

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Industry: Education (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Knowledge-enhanced Memory Model for Emotional Support Conversation

Jia, Mengzhao, Chen, Qianglong, Jing, Liqiang, Fu, Dawei, Li, Renyu

arXiv.org Artificial IntelligenceOct-11-2023

The prevalence of mental disorders has become a significant issue, leading to the increased focus on Emotional Support Conversation as an effective supplement for mental health support. Existing methods have achieved compelling results, however, they still face three challenges: 1) variability of emotions, 2) practicality of the response, and 3) intricate strategy modeling. To address these challenges, we propose a novel knowledge-enhanced Memory mODEl for emotional suppoRt coNversation (MODERN). Specifically, we first devise a knowledge-enriched dialogue context encoding to perceive the dynamic emotion change of different periods of the conversation for coherent user state modeling and select context-related concepts from ConceptNet for practical response generation. Thereafter, we implement a novel memory-enhanced strategy modeling module to model the semantic patterns behind the strategy categories. Extensive experiments on a widely used large-scale dataset verify the superiority of our model over cutting-edge baselines.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2310.077

Country:

Europe > Czechia (0.14)
Europe > Belgium (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)

Add feedback

General Debiasing for Multimodal Sentiment Analysis

Sun, Teng, Ni, Juntong, Wang, Wenjie, Jing, Liqiang, Wei, Yinwei, Nie, Liqiang

arXiv.org Artificial IntelligenceAug-7-2023

Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction yet unavoidably suffers from fitting the spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background have positive labels in a dataset, the model will rely on such correlations for prediction, while "blue background" is not a sentiment-related feature. To address this problem, we define a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models by reducing their reliance on spurious correlations. To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to the samples with larger bias (i.e., the severer spurious correlations). The key to this debiasing framework is to estimate the bias of each sample, which is achieved by two steps: 1) disentangling the robust features and biased features in each modality, and 2) utilizing the biased features to estimate the bias. Finally, we employ IPW to reduce the effects of large-biased samples, facilitating robust feature learning for sentiment prediction. To examine the model's generalization ability, we keep the original testing sets on two benchmarks and additionally construct multiple unimodal and multimodal OOD testing sets. The empirical results demonstrate the superior generalization ability of our proposed framework. We have released the code and data to facilitate the reproduction https://github.com/Teng-Sun/GEAR.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.10511

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.63)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.63)

Add feedback