Collaborating Authors: Xu, Yida


Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable performance in vision-language tasks, but their application in the medical field remains underexplored, particularly for integrating structured time series data with unstructured clinical notes. In clinical practice, dynamic time series data such as lab test results capture critical temporal patterns, while clinical notes provide rich semantic context. Merging these modalities is challenging due to the inherent differences between continuous signals and discrete text. To bridge this gap, we introduce ProMedTS, a novel self-supervised multimodal framework that employs prompt-guided learning to unify these heterogeneous data types. Our approach leverages lightweight anomaly detection to generate anomaly captions that serve as prompts, guiding the encoding of raw time series data into informative embeddings. These embeddings are aligned with textual representations in a shared latent space, preserving fine-grained temporal nuances alongside semantic insights. Furthermore, our framework incorporates tailored self-supervised objectives to enhance both intra- and inter-modal alignment. We evaluate ProMedTS on disease diagnosis tasks using real-world datasets, and the results demonstrate that our method consistently outperforms state-of-the-art approaches.
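
As an illustration of the prompt-guided alignment described above, here is a minimal sketch assuming PyTorch: a lab-test time series is encoded conditioned on an anomaly-caption prompt embedding, and the result is aligned with clinical-note embeddings in a shared latent space via a contrastive objective. Module names, dimensions, and the InfoNCE-style loss are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch, assuming PyTorch: prompt-guided time series encoding plus a
# contrastive alignment loss in a shared latent space. All names, dimensions,
# and the InfoNCE-style objective are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptGuidedTSEncoder(nn.Module):
    """Encodes a lab-test time series, conditioned on an anomaly-caption prompt embedding."""
    def __init__(self, ts_dim=16, prompt_dim=128, latent_dim=128):
        super().__init__()
        self.ts_encoder = nn.GRU(ts_dim, latent_dim, batch_first=True)
        self.fuse = nn.Linear(latent_dim + prompt_dim, latent_dim)

    def forward(self, ts, prompt_emb):
        # ts: (batch, steps, ts_dim); prompt_emb: (batch, prompt_dim)
        _, h = self.ts_encoder(ts)                    # h: (1, batch, latent_dim)
        fused = torch.cat([h.squeeze(0), prompt_emb], dim=-1)
        return F.normalize(self.fuse(fused), dim=-1)  # embedding in the shared space

def contrastive_alignment_loss(ts_emb, text_emb, temperature=0.07):
    """Pulls matched time series / clinical-note pairs together, pushes mismatched pairs apart."""
    logits = ts_emb @ text_emb.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(ts_emb.size(0), device=ts_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```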


Multimodal Clinical Reasoning through Knowledge-augmented Rationale Generation

arXiv.org Artificial Intelligence

Clinical rationales play a pivotal role in accurate disease diagnosis; however, many models predominantly use discriminative methods and overlook the importance of generating supportive rationales. Rationale distillation is a process that transfers knowledge from large language models (LLMs) to smaller language models (SLMs), thereby enhancing the latter's ability to break down complex tasks. Despite its benefits, rationale distillation alone is inadequate for addressing domain knowledge limitations in tasks requiring specialized expertise, such as disease diagnosis. Effectively embedding domain knowledge in SLMs poses a significant challenge. While current LLMs are primarily geared toward processing textual data, multimodal LLMs that incorporate time series data, especially electronic health records (EHRs), are still evolving. To tackle these limitations, we introduce ClinRaGen, an SLM optimized for multimodal rationale generation in disease diagnosis. ClinRaGen incorporates a unique knowledge-augmented attention mechanism to merge domain knowledge with time series EHR data, utilizing a stepwise rationale distillation strategy to produce both textual and time series-based clinical rationales. Our evaluations show that ClinRaGen markedly improves the SLM's capability to interpret multimodal EHR data and generate accurate clinical rationales, supporting more reliable disease diagnosis, advancing LLM applications in healthcare, and narrowing the performance divide between LLMs and SLMs.
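
The knowledge-augmented attention idea can be sketched as a cross-attention layer in which encoded time series tokens attend over embedded domain-knowledge entries before being passed to the rationale-generating SLM. The snippet below is a hypothetical PyTorch illustration with assumed shapes and names, not the ClinRaGen implementation.

```python
# Hypothetical sketch of knowledge-augmented attention: time series EHR tokens
# attend over domain-knowledge embeddings. Shapes and names are assumptions.
import torch
import torch.nn as nn

class KnowledgeAugmentedAttention(nn.Module):
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ts_tokens, knowledge_tokens):
        # ts_tokens: (batch, T, dim) encoded lab-test series
        # knowledge_tokens: (batch, K, dim) embedded domain-knowledge entries
        attended, _ = self.cross_attn(query=ts_tokens,
                                      key=knowledge_tokens,
                                      value=knowledge_tokens)
        return self.norm(ts_tokens + attended)  # residual fusion

# Usage sketch: the fused tokens would then condition the SLM decoder that
# generates textual and time-series-grounded rationales.
ts = torch.randn(2, 48, 256)
kg = torch.randn(2, 10, 256)
fused = KnowledgeAugmentedAttention()(ts, kg)
```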


Efficient Stein Variational Inference for Reliable Distribution-lossless Network Pruning

arXiv.org Artificial Intelligence

Network pruning is a promising way to generate light but accurate models and enable their deployment on resource-limited edge devices. However, the current state of the art assumes that the effective sub-network and the other superfluous parameters in the given network share the same distribution, so pruning inevitably involves a distribution-truncation operation. Such methods usually eliminate values near zero. While simple, this may not be the most appropriate criterion, as effective models may naturally contain many small-valued parameters, and removing near-zero values already embedded in the model space may significantly reduce accuracy. Another line of work assigns discrete priors over all possible sub-structures, but these still rely on hand-crafted prior hypotheses. Worse still, existing methods use regularized point estimates (hard pruning), which provide no error estimates and therefore cannot justify the reliability of the pruned networks. In this paper, we propose a novel distribution-lossless pruning method, named DLLP, to theoretically find the pruned lottery ticket within a Bayesian treatment. Specifically, DLLP remodels the vanilla network as discrete priors for the latent pruned model and the remaining redundancy. More importantly, DLLP uses Stein variational inference to approximate the latent prior, effectively bypassing the computation of a KL divergence with an unknown distribution. Extensive experiments on the small CIFAR-10 and large-scale ImageNet datasets demonstrate that our method obtains sparser networks with strong generalization performance while providing quantified reliability for the pruned model.
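
The Stein variational machinery referenced above can be illustrated with the generic Stein Variational Gradient Descent (SVGD) update with an RBF kernel. The sketch below assumes PyTorch and a user-supplied score function; it shows the standard update rule, not the paper's pruning-specific objective.

```python
# Generic SVGD update with an RBF kernel (a sketch of the inference step DLLP
# builds on, not the paper's pruning objective). score_fn is an assumed
# user-supplied function returning grad log p(x) for each particle.
import torch

def rbf_kernel_terms(x, bandwidth):
    # x: (n_particles, dim)
    sq_dist = torch.cdist(x, x) ** 2
    k = torch.exp(-sq_dist / (2 * bandwidth ** 2))          # k[i, j] = k(x_i, x_j)
    diff = x.unsqueeze(1) - x.unsqueeze(0)                   # diff[i, j] = x_i - x_j
    # repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j (x_i - x_j)/h^2 * k_ij
    repulsion = (diff / bandwidth ** 2 * k.unsqueeze(-1)).sum(dim=1)
    return k, repulsion

def svgd_step(particles, score_fn, step_size=1e-2, bandwidth=1.0):
    """One SVGD update: phi(x_i) = (1/n) sum_j [k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i)]."""
    k, repulsion = rbf_kernel_terms(particles, bandwidth)
    score = score_fn(particles)                              # (n, dim) = grad log p
    phi = (k @ score + repulsion) / particles.size(0)
    return particles + step_size * phi
```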


Realistic Image Generation using Region-phrase Attention

arXiv.org Machine Learning

The Generative Adversarial Network (GAN) has recently been applied to generate synthetic images from text. Despite significant advances, most current state-of-the-art algorithms are regular-grid region based; when attention is used, it is mainly applied between individual regular-grid regions and a word. These approaches are sufficient to generate images that contain a single object in the foreground, such as a "bird" or "flower". However, natural language often involves complex foreground objects, and the background may constitute a variable portion of the generated image. Therefore, the regular-grid-based image attention weights may not concentrate on the intended foreground region(s), which, in turn, results in an unnatural-looking image. Additionally, individual words such as "a", "blue" and "shirt" do not necessarily provide a full visual context unless they are applied together. For this reason, we propose a novel method that introduces an additional set of attentions between true-grid regions and word phrases. The true-grid regions are derived using a set of auxiliary bounding boxes, which serve as superior indicators of the locations where alignment and attention with the word phrases should be drawn. Word phrases are derived by analyzing Part-of-Speech (POS) results. We evaluate this novel network architecture on the Microsoft Common Objects in Context (MSCOCO) dataset, where the model generates $256 \times 256$ images conditioned on a short sentence description. Our proposed approach generates more realistic images than current state-of-the-art algorithms.
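
A hypothetical sketch of the extra region-phrase attention: features pooled from the auxiliary bounding boxes attend to phrase embeddings extracted from POS analysis. The pooling step, phrase embeddings, and dimensions are assumptions for illustration and are not taken from the paper.

```python
# Illustrative region-phrase attention: box-pooled region features attend over
# phrase embeddings (e.g. averaged word vectors of phrases such as "a blue
# shirt" grouped from POS tags). Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def region_phrase_attention(region_feats, phrase_embs):
    # region_feats: (batch, n_boxes, dim) pooled from auxiliary bounding boxes
    # phrase_embs:  (batch, n_phrases, dim) embeddings of POS-derived phrases
    scores = torch.bmm(region_feats, phrase_embs.transpose(1, 2))  # (b, boxes, phrases)
    attn = F.softmax(scores, dim=-1)
    # each region receives a phrase-aware context vector for the generator
    context = torch.bmm(attn, phrase_embs)                          # (b, boxes, dim)
    return context, attn
```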


Introducing AI to Undergraduate Students via Computer Vision Projects

AAAI Conferences

Computer vision, a subfield of artificial intelligence (AI), is a technology that can be visualized and easily found in a large number of state-of-the-art applications. In this project, undergraduate students performed research on a landmark recognition task using computer vision techniques. The project focused on analyzing, designing, configuring, and testing the two core components of landmark recognition: feature detection and description. The project modeled the landmark recognition system as a tour guide for campus visitors and evaluated its performance under real-world circumstances. By analyzing real-world data and solving problems, students' cognitive and critical-thinking skills were sharpened. Their knowledge and understanding of mathematical modeling and data processing were also enhanced.
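
As a concrete illustration of the two core components named above, feature detection and description, the sketch below uses OpenCV's ORB detector/descriptor with brute-force matching. This is a generic pipeline under assumed parameters, not necessarily the students' exact setup.

```python
# Generic landmark-matching sketch using OpenCV ORB features; file paths,
# parameters, and the match-count decision rule are illustrative assumptions.
import cv2

def match_landmark(query_path, reference_path, min_matches=25):
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)              # keypoint detector + binary descriptor
    kp_q, des_q = orb.detectAndCompute(query, None)
    kp_r, des_r = orb.detectAndCompute(ref, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_q, des_r), key=lambda m: m.distance)

    # crude decision rule: enough good keypoint matches => same landmark
    return len(matches) >= min_matches, matches
```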