AITopics | Wu, Xingjiao

Wu, Xingjiao

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering

Huai, Tianyu, Zhou, Jie, Wu, Xingjiao, Chen, Qin, Bai, Qingchun, Zhou, Ze, He, Liang

arXiv.org Artificial Intelligence, Mar-1-2025

Multimodal large language models (MLLMs) have garnered widespread attention from researchers due to their remarkable understanding and generation capabilities in visual language tasks (e.g., visual question answering). However, the rapid pace of knowledge updates in the real world makes offline training of MLLMs costly, and when faced with non-stationary data streams, MLLMs suffer from catastrophic forgetting during learning. In this paper, we propose an MLLM-based dual momentum Mixture-of-Experts (CL-MoE) framework for continual visual question answering (VQA). We integrate MLLMs with continual learning to utilize the rich commonsense knowledge in LLMs. We introduce a Dual-Router MoE (RMoE) strategy that selects global and local experts using task-level and instance-level routers, robustly assigning weights to the experts most appropriate for the task. We then design a dynamic Momentum MoE (MMoE) that updates expert parameters based on the relationships between the experts and the tasks or instances, so that the model can absorb new knowledge while retaining existing knowledge. Extensive experimental results indicate that our method achieves state-of-the-art performance on 10 VQA tasks, demonstrating the effectiveness of our approach.
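
The abstract does not give the update rule, so the following is only a minimal sketch of the general idea, under stated assumptions: each expert is a flat list of floats, the two routers are represented by raw score lists, and the blending coefficient `gamma` and the top-k selection are hypothetical choices made for illustration, not the paper's formulation. Experts chosen by either router lean toward the newly trained parameters, while the remaining experts lean toward their previous values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(weights, k):
    """Indices of the k largest weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return set(order[:k])


def momentum_update(old_experts, new_experts, task_scores, instance_scores,
                    k=2, gamma=0.7):
    """Blend old and newly trained expert parameters (illustrative only).

    Assumption: experts selected by the task-level or instance-level router
    take weight `gamma` on the new parameters; all other experts take
    weight `1 - gamma`, which keeps them closer to their old values.
    """
    task_w = softmax(task_scores)          # stand-in for the task-level router
    inst_w = softmax(instance_scores)      # stand-in for the instance-level router
    selected = top_k(task_w, k) | top_k(inst_w, k)

    merged = []
    for i, (old, new) in enumerate(zip(old_experts, new_experts)):
        lam = gamma if i in selected else 1.0 - gamma
        merged.append([lam * n + (1.0 - lam) * o for o, n in zip(old, new)])
    return merged, selected


if __name__ == "__main__":
    old = [[0.0, 0.0] for _ in range(4)]
    new = [[1.0, 1.0] for _ in range(4)]
    merged, selected = momentum_update(
        old, new,
        task_scores=[2.0, 0.1, 0.1, 0.1],
        instance_scores=[0.1, 0.1, 3.0, 0.1],
        k=1,
    )
    print(sorted(selected), merged)
```

In this toy setup, experts that neither router selects move only a small step toward the new parameters, which is one simple way to trade off plasticity against forgetting.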

large language model, machine learning, natural language, (17 more...)

arXiv:2503.00413

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation

Liu, Kaiyuan, Mei, Jiahao, Zhang, Hengyu, Zhang, Yihuai, Wu, Xingjiao, Dong, Daoguo, He, Liang

arXiv.org Artificial Intelligence, Oct-10-2024

Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose a new Chinese calligraphy generation model, 'Moyun', which replaces the U-Net in the diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on our large-scale dataset 'Mobao' of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for calligraphy the calligrapher has not written, 'Moyun' can generate calligraphy that matches the style of the calligrapher.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv:2410.07618

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems

Xie, Zhentao, Zhao, Jiabao, Wang, Yilei, Shi, Jinxin, Bai, Yanhong, Wu, Xingjiao, He, Liang

arXiv.org Artificial Intelligence, Oct-6-2024

Detecting cognitive biases in large language models (LLMs) is a fascinating task that aims to probe the existing cognitive biases within these models. Current methods for detecting cognitive biases in language models generally suffer from incomplete detection capabilities and a restricted range of detectable bias types. To address this issue, we introduce the 'MindScope' dataset, which distinctively integrates static and dynamic elements. The static component comprises 5,170 open-ended questions spanning 72 cognitive bias categories. The dynamic component leverages a rule-based, multi-agent communication framework to facilitate the generation of multi-round dialogues. This framework is flexible and readily adaptable for various psychological experiments involving LLMs. In addition, we introduce a multi-agent detection method applicable to a wide range of detection tasks, which integrates Retrieval-Augmented Generation (RAG), competitive debate, and a reinforcement learning-based decision module. This method improves detection accuracy by as much as 35.10% compared to GPT-4. Code and appendix are available at https://github.com/2279072142/MindScope.

large language model, machine learning, natural language, (21 more...)

arXiv:2410.04452

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models

Bai, Yanhong, Zhao, Jiabao, Shi, Jinxin, Xie, Zhentao, Wu, Xingjiao, He, Liang

arXiv.org Artificial Intelligence, May-5-2024

Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are applied. Traditional methods, which rely on embedding spaces or are based on probability metrics, fall short in revealing the nuanced and implicit biases present in various contexts. To address this challenge, we propose the FairMonitor framework and adopt a static-dynamic detection method for a comprehensive evaluation of stereotypes and biases in LLMs. The static component consists of a direct inquiry test, an implicit association test, and an unknown situation test, including 10,262 open-ended questions with 9 sensitive factors and 26 educational scenarios, and it is effective for evaluating both explicit and implicit biases. Moreover, we utilize a multi-agent system to construct dynamic scenarios for detecting subtle biases in more complex and realistic settings. This component detects biases based on the interaction behaviors of LLMs across 600 varied educational scenarios. The experimental results show that combining the static and dynamic methods can detect more stereotypes and biases in LLMs.

artificial intelligence, large language model, natural language, (14 more...)

arXiv:2405.03098

Country:

North America > Canada (0.14)
Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding

Wu, Anran, Xiao, Luwei, Wu, Xingjiao, Yang, Shuwen, Xu, Junjie, Zhuang, Zisong, Xie, Nian, Jin, Cheng, He, Liang

arXiv.org Artificial Intelligence, Oct-29-2023

Visually-situated languages such as charts and plots are omnipresent in real-world documents. These graphical depictions are human-readable and are often analyzed in visually-rich documents to address a variety of questions that necessitate complex reasoning and common-sense responses. Despite the growing number of datasets that aim to answer questions over charts, most only address this task in isolation, without considering the broader context of document-level question answering. Moreover, such datasets lack adequate common-sense reasoning information in their questions. In this work, we introduce a novel task named document-level chart question answering (DCQA). The goal of this task is to conduct document-level question answering, first extracting charts or plots from the document via document layout analysis (DLA) and subsequently performing chart question answering (CQA). The newly developed benchmark dataset comprises 50,010 synthetic documents integrating charts in a wide range of styles (6 styles in contrast to 3 for PlotQA and ChartQA) and includes 699,051 questions that demand a high degree of reasoning ability and common-sense understanding. We also present a question-answer generation engine that employs table data, a rich color set, and basic question templates to produce a vast array of reasoning question-answer pairs automatically. Based on DCQA, we devise an OCR-free transformer for document-level chart-oriented understanding, capable of DLA and of answering complex reasoning and common-sense questions over charts. Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning over charts in visually-rich documents. We implement and evaluate a set of baselines, and our proposed method achieves comparable results.
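
To make the idea of generating question-answer pairs from table data, colors, and templates concrete, here is a small sketch. It is not the DCQA engine: the function name `generate_qa`, the template wording, and the data layout (one dictionary for values, one for colors) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def generate_qa(table: Dict[str, float],
                colors: Dict[str, str],
                value_name: str) -> List[Tuple[str, str]]:
    """Fill a few hypothetical question templates from chart table data.

    table      maps a category label to its plotted value
    colors     maps a category label to the color used to draw it
    value_name is the quantity on the value axis (for example "sales")
    """
    labels = list(table)
    highest = max(labels, key=lambda label: table[label])
    lowest = min(labels, key=lambda label: table[label])

    qa = [
        # Extremum templates
        (f"Which category has the highest {value_name}?", highest),
        (f"Which category has the lowest {value_name}?", lowest),
        # Arithmetic reasoning template
        (f"What is the difference in {value_name} between {highest} and {lowest}?",
         str(table[highest] - table[lowest])),
        # Color-grounded template
        (f"Which color represents the category with the highest {value_name}?",
         colors[highest]),
    ]

    # Pairwise comparison template over neighbouring categories
    for a, b in zip(labels, labels[1:]):
        answer = "yes" if table[a] > table[b] else "no"
        qa.append((f"Is the {value_name} of {a} greater than that of {b}?", answer))
    return qa


if __name__ == "__main__":
    pairs = generate_qa({"A": 3, "B": 7, "C": 5},
                        {"A": "red", "B": "blue", "C": "green"},
                        "sales")
    for question, answer in pairs:
        print(question, "->", answer)
```

Because every answer is computed from the underlying table rather than read off the rendered image, a generator of this kind can produce labeled pairs at scale without manual annotation.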

information, machine learning, question answering, (22 more...)

arXiv:2310.18983

Country:

Asia > China (0.14)
Asia > Middle East > Israel (0.14)
Europe > Switzerland (0.14)

Genre:

Workflow (0.68)
Research Report (0.64)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

FairMonitor: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models

Bai, Yanhong, Zhao, Jiabao, Shi, Jinxin, Wei, Tingjiang, Wu, Xingjiao, He, Liang

arXiv.org Artificial Intelligence, Oct-26-2023

Detecting stereotypes and biases in Large Language Models (LLMs) can enhance fairness and reduce adverse impacts on individuals or groups when these LLMs are applied. However, the majority of existing methods focus on measuring the model's preference towards sentences containing biases and stereotypes within datasets, which lacks interpretability and cannot detect implicit biases and stereotypes in the real world. To address this gap, this paper introduces a four-stage framework to directly evaluate stereotypes and biases in the generated content of LLMs, including direct inquiry testing, serial or adapted story testing, implicit association testing, and unknown situation testing. Additionally, the paper proposes multi-dimensional evaluation metrics and explainable zero-shot prompts for automated evaluation. Using the education sector as a case study, we constructed the Edu-FairMonitor based on the four-stage framework, which encompasses 12,632 open-ended questions covering nine sensitive factors and 26 educational scenarios. Experimental results reveal varying degrees of stereotypes and biases in five LLMs evaluated on Edu-FairMonitor. Moreover, the results of our proposed automated evaluation method have shown a high correlation with human annotations.

large language model, machine learning, natural language, (17 more...)

arXiv:2308.10397

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

Yang, Shuwen, Wu, Anran, Wu, Xingjiao, Xiao, Luwei, Ma, Tianlong, Jin, Cheng, He, Liang

arXiv.org Artificial Intelligence, Oct-14-2023

Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence. Secondly, a gap exists between the feature extraction of evidence and the question, which hinders the model from effectively extracting critical features from the evidence based on the given question. We propose a two-stage framework for evidence retrieval and question-answering to alleviate these issues. First, we propose a progressive evidence refinement strategy for selecting crucial evidence. This strategy employs an iterative evidence retrieval approach to uncover the logical sequence among the evidence pieces. It incorporates two rounds of filtering to optimize the solution space, thus further ensuring temporal efficiency. Subsequently, we introduce a semi-supervised contrastive learning training strategy based on negative samples to expand the scope of the question domain, allowing for a more thorough exploration of latent knowledge within known samples. Finally, in order to mitigate the loss of fine-grained information, we devise a multi-turn retrieval and question-answering strategy to handle multimodal inputs. This strategy involves incorporating multimodal evidence directly into the model as part of the historical dialogue and question. Meanwhile, we leverage a cross-modal attention mechanism to capture the underlying connections between the evidence and the question, and the answer is generated through a decoding generation approach. We validate the model's effectiveness through extensive experiments, achieving outstanding performance on the WebQA and MultimodalQA benchmarks.

machine learning, natural language, question answering, (19 more...)

arXiv:2310.09696

Country: Asia > China (0.14)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Media > Music (0.93)
Leisure & Entertainment (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

DDT: Dual-branch Deformable Transformer for Image Denoising

Liu, Kangliang, Du, Xiangcheng, Liu, Sijie, Zheng, Yingbin, Wu, Xingjiao, Jin, Cheng

arXiv.org Artificial Intelligence, Apr-13-2023

Transformers are beneficial for image denoising tasks since they can model long-range dependencies to overcome the limitations of convolutional inductive biases. However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size in the local branch and a fixed number of patches in the global branch. In addition, we apply a deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly lower computational cost.
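
The difference between "fixed patch size" and "fixed number of patches" can be illustrated with simple counting. The sketch below is an assumption-laden illustration, not the DDT implementation: it assumes dense attention inside each patch, a square `grid` of patches for the global branch, and it does not model deformable attention at all. The function names are hypothetical.

```python
def local_partition(height, width, patch_size):
    """Fixed patch size: the number of patches grows with the resolution."""
    if height % patch_size or width % patch_size:
        raise ValueError("feature map must be divisible by patch_size")
    num_patches = (height // patch_size) * (width // patch_size)
    tokens_per_patch = patch_size * patch_size
    return num_patches, tokens_per_patch


def global_partition(height, width, grid):
    """Fixed number of patches (grid x grid): patch size grows with resolution."""
    if height % grid or width % grid:
        raise ValueError("feature map must be divisible by grid")
    num_patches = grid * grid
    tokens_per_patch = (height // grid) * (width // grid)
    return num_patches, tokens_per_patch


def attention_pairs(num_patches, tokens_per_patch):
    """Query-key pairs when attention is dense within each patch."""
    return num_patches * tokens_per_patch ** 2


if __name__ == "__main__":
    h, w = 64, 64
    full = attention_pairs(1, h * w)                       # one patch = whole map
    local = attention_pairs(*local_partition(h, w, 8))     # 8 x 8 patches
    global_ = attention_pairs(*global_partition(h, w, 4))  # 4 x 4 grid of patches
    print(full, local, global_)
```

Under these assumptions, restricting attention to patches already cuts the number of query-key pairs well below that of full attention over the whole feature map; any further savings from deformable attention are outside what this sketch models.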

artificial intelligence, computational cost, machine learning, (16 more...)

arXiv:2304.06346

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
