AITopics | Cai, Xiaoyan

Collaborating Authors

Cai, Xiaoyan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models

Li, Bin, Gao, Dehong, Wang, Yeyuan, Jin, Linbo, Yu, Shanqing, Cai, Xiaoyan, Yang, Libin

arXiv.org Artificial IntelligenceMar-24-2025

Despite the significant success of Large Vision-Language models(LVLMs), these models still suffer hallucinations when describing images, generating answers that include non-existent objects. It is reported that these models tend to over-focus on certain irrelevant image tokens that do not contain critical information for answering the question and distort the output. To address this, we propose an Instruction-Aligned Visual Attention(IAVA) approach, which identifies irrelevant tokens by comparing changes in attention weights under two different instructions. By applying contrastive decoding, we dynamically adjust the logits generated from original image tokens and irrelevant image tokens, reducing the model's over-attention to irrelevant information. The experimental results demonstrate that IAVA consistently outperforms existing decoding techniques on benchmarks such as MME, POPE, and TextVQA in mitigating object hallucinations. Our IAVA approach is available online at https://github.com/Lee-lab558/IAVA.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.18556

Country: Asia > China > Zhejiang Province (0.29)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Ma, Yufei, Liang, Zihan, Dai, Huangyu, Chen, Ben, Gao, Dehong, Ran, Zhuoran, Zihan, Wang, Jin, Linbo, Jiang, Wen, Zhang, Guannan, Cai, Xiaoyan, Yang, Libin

arXiv.org Artificial IntelligenceDec-10-2024

The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter \textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xpert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

large language model, machine learning, preprint arxiv, (16 more...)

arXiv.org Artificial Intelligence

2412.07405

Country:

Asia > China (0.47)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

General2Specialized LLMs Translation for E-commerce

Chen, Kaidi, Chen, Ben, Gao, Dehong, Dai, Huangyu, Jiang, Wen, Ning, Wei, Yu, Shanqing, Yang, Libin, Cai, Xiaoyan

arXiv.org Artificial IntelligenceApr-6-2024

Existing Neural Machine Translation (NMT) models mainly handle translation in the general domain, while overlooking domains with special writing formulas, such as e-commerce and legal documents. Taking e-commerce as an example, the texts usually include amounts of domain-related words and have more grammar problems, which leads to inferior performances of current NMT methods. To address these problems, we collect two domain-related resources, including a set of term pairs (aligned Chinese-English bilingual terms) and a parallel corpus annotated for the e-commerce domain. Furthermore, we propose a two-step fine-tuning paradigm (named G2ST) with self-contrastive semantic enhancement to transfer one general NMT model to the specialized NMT model for e-commerce. The paradigm can be used for the NMT models based on Large language models (LLMs). Extensive evaluations on real e-commerce titles demonstrate the superior translation quality and robustness of our G2ST approach, as compared with state-of-the-art NMT models such as LLaMA, Qwen, GPT-3.5, and even GPT-4.

large language model, machine learning, translation, (15 more...)

arXiv.org Artificial Intelligence

2403.03689

Country: Asia > China > Zhejiang Province (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Holistic Evaluation of GPT-4V for Biomedical Imaging

Liu, Zhengliang, Jiang, Hanqi, Zhong, Tianyang, Wu, Zihao, Ma, Chong, Li, Yiwei, Yu, Xiaowei, Zhang, Yutong, Pan, Yi, Shu, Peng, Lyu, Yanjun, Zhang, Lu, Yao, Junjie, Dong, Peixin, Cao, Chao, Xiao, Zhenxiang, Wang, Jiaqi, Zhao, Huan, Xu, Shaochen, Wei, Yaonai, Chen, Jingyuan, Dai, Haixing, Wang, Peilong, He, Hao, Wang, Zewei, Wang, Xinyu, Zhang, Xu, Zhao, Lin, Liu, Yiheng, Zhang, Kai, Yan, Liheng, Sun, Lichao, Liu, Jun, Qiang, Ning, Ge, Bao, Cai, Xiaoyan, Zhao, Shijie, Hu, Xintao, Yuan, Yixuan, Li, Gang, Zhang, Shu, Zhang, Xin, Jiang, Xi, Zhang, Tuo, Shen, Dinggang, Li, Quanzheng, Liu, Wei, Li, Xiang, Zhu, Dajiang, Liu, Tianming

arXiv.org Artificial IntelligenceNov-10-2023

In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2312.05256

Country:

Asia > China (1.00)
North America > United States > Texas (0.13)
North America > United States > North Carolina (0.13)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

Zhong, Tianyang, Zhao, Wei, Zhang, Yutong, Pan, Yi, Dong, Peixin, Jiang, Zuowei, Kui, Xiaoyan, Shang, Youlan, Yang, Li, Wei, Yaonai, Yang, Longtao, Chen, Hao, Zhao, Huan, Liu, Yuxiao, Zhu, Ning, Li, Yiwei, Wang, Yisong, Yao, Jiaqi, Wang, Jiaqi, Zeng, Ying, He, Lei, Zheng, Chao, Zhang, Zhixue, Li, Ming, Liu, Zhengliang, Dai, Haixing, Wu, Zihao, Zhang, Lu, Zhang, Shu, Cai, Xiaoyan, Hu, Xintao, Zhao, Shijie, Jiang, Xi, Zhang, Xin, Li, Xiang, Zhu, Dajiang, Guo, Lei, Shen, Dinggang, Han, Junwei, Liu, Tianming, Liu, Jun, Zhang, Tuo

arXiv.org Artificial IntelligenceOct-9-2023

Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels. However, complex and diverse radiology reports with cross-source heterogeneity pose a huge generalizability challenge to the current methods under massive data volume, mainly because the style and normativity of radiology reports are obviously distinctive among institutions, body regions inspected and radiologists. Recently, the advent of large language models (LLM) offers great potential for recognizing signs of health conditions. To resolve the above problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer based on the LLM, a tailored model for automatic radiology report generation that learns generalizable representations and provides a basis pattern for model adaptation in sophisticated analysts' cases. Specifically, ChatRadio-Valuer is trained based on the radiology reports from a single institution by means of supervised fine-tuning, and then adapted to disease diagnosis tasks for human multi-system evaluation (i.e., chest, abdomen, muscle-skeleton, head, and maxillofacial $\&$ neck) from six different institutions in clinical-level events. The clinical dataset utilized in this study encompasses a remarkable total of \textbf{332,673} observations. From the comprehensive results on engineering indicators, clinical efficacy and deployment cost metrics, it can be shown that ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al., in terms of the diseases diagnosis from radiology reports. ChatRadio-Valuer provides an effective avenue to boost model generalization performance and alleviate the annotation workload of experts to enable the promotion of clinical AI applications in radiology reports.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2310.05242

Country:

Asia > China > Hunan Province (0.14)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating Large Language Models for Radiology Natural Language Processing

Liu, Zhengliang, Zhong, Tianyang, Li, Yiwei, Zhang, Yutong, Pan, Yi, Zhao, Zihao, Dong, Peixin, Cao, Chao, Liu, Yuxiao, Shu, Peng, Wei, Yaonai, Wu, Zihao, Ma, Chong, Wang, Jiaqi, Wang, Sheng, Zhou, Mengyue, Jiang, Zuowei, Li, Chunlin, Holmes, Jason, Xu, Shaochen, Zhang, Lu, Dai, Haixing, Zhang, Kai, Zhao, Lin, Chen, Yuanhao, Liu, Xu, Wang, Peilong, Yan, Pingkun, Liu, Jun, Ge, Bao, Sun, Lichao, Zhu, Dajiang, Li, Xiang, Liu, Wei, Cai, Xiaoyan, Hu, Xintao, Jiang, Xi, Zhang, Shu, Zhang, Xin, Zhang, Tuo, Zhao, Shijie, Li, Quanzheng, Zhu, Hongtu, Shen, Dinggang, Liu, Tianming

arXiv.org Artificial IntelligenceJul-27-2023

The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of these models exhibit bilingual capabilities, proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty two LLMs in interpreting radiology reports, a crucial component of radiology NLP. Specifically, the ability to derive impressions from radiologic findings is assessed. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2307.13693

Country:

Asia > China (1.00)
North America > United States > Texas (0.14)
North America > United States > North Carolina (0.14)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT

Ma, Chong, Wu, Zihao, Wang, Jiaqi, Xu, Shaochen, Wei, Yaonai, Liu, Zhengliang, Jiang, Xi, Guo, Lei, Cai, Xiaoyan, Zhang, Shu, Zhang, Tuo, Zhu, Dajiang, Shen, Dinggang, Liu, Tianming, Li, Xiang

arXiv.org Artificial IntelligenceMay-3-2023

The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and it is typically written by radiologists based on the 'Findings' section. However, writing numerous impressions can be laborious and error-prone for radiologists. Although recent studies have achieved promising results in automatic impression generation using large-scale medical text data for pre-training and fine-tuning pre-trained language models, such models often require substantial amounts of medical text data and have poor generalization performance. While large language models (LLMs) like ChatGPT have shown strong generalization capabilities and performance, their performance in specific domains, such as radiology, remains under-investigated and potentially limited. To address this limitation, we propose ImpressionGPT, which leverages the in-context learning capability of LLMs by constructing dynamic contexts using domain-specific, individualized data. This dynamic prompt approach enables the model to learn contextual knowledge from semantically similar examples from existing data. Additionally, we design an iterative optimization algorithm that performs automatic evaluation on the generated impression results and composes the corresponding instruction prompts to further optimize the model. The proposed ImpressionGPT model achieves state-of-the-art performance on both MIMIC-CXR and OpenI datasets without requiring additional training data or fine-tuning the LLMs. This work presents a paradigm for localizing LLMs that can be applied in a wide range of similar application scenarios, bridging the gap between general-purpose LLMs and the specific language processing needs of various domains.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.08448

Country:

North America > United States (0.68)
Asia > China (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generative Adversarial Network Based Heterogeneous Bibliographic Network Representation for Personalized Citation Recommendation

Cai, Xiaoyan (School of Automation, Northwestern Polytechnical University) | Han, Junwei (School of Automation, Northwestern Polytechnical University) | Yang, Libin (School of Automation, Northwestern Polytechnical University)

AAAI ConferencesFeb-8-2018

Network representation has been recently exploited for many applications, such as citation recommendation, multi-label classification and link prediction. It learns low-dimensional vector representation for each vertex in networks. Existing network representation methods only focus on incomplete aspects of vertex information (i.e., vertex content, network structure or partial integration), moreover they are commonly designed for homogeneous information networks where all the vertices of a network are of the same type. In this paper, we propose a deep network representation model that integrates network structure and the vertex content information into a unified framework by exploiting generative adversarial network, and represents different types of vertices in the heterogeneous network in a continuous and common vector space. Based on the proposed model, we can obtain heterogeneous bibliographic network representation for efficient citation recommendation. The proposed model also makes personalized citation recommendation possible, which is a new issue that a few papers addressed in the past. When evaluated on the AAN and DBLP datasets, the performance of the proposed heterogeneous bibliographic network based citation recommendation approach is comparable with that of the other network representation based citation recommendation approaches. The results also demonstrate that the personalized citation recommendation approach is more effective than the non-personalized citation recommendation approach.

artificial intelligence, neural network, representation, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Industry:

Health & Medicine (0.48)
Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback