AITopics | Ye, Wenqian

Collaborating Authors

Ye, Wenqian

Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 Symposium

Adibi, Amin, Cao, Xu, Ji, Zongliang, Kaur, Jivat Neet, Chen, Winston, Healey, Elizabeth, Nuwagira, Brighton, Ye, Wenqian, Woollard, Geoffrey, Xu, Maxwell A, Cui, Hejie, Xi, Johnny, Chang, Trenton, Bikia, Vasiliki, Zhang, Nicole, Noori, Ayush, Xia, Yuan, Hossain, Md. Belal, Frank, Hanna A., Peluso, Alina, Pu, Yuan, Shen, Shannon Zejiang, Wu, John, Fallahpour, Adibvafa, Mahbub, Sazan, Duncan, Ross, Zhang, Yuwei, Cao, Yurui, Xu, Zuheng, Craig, Michael, Krishnan, Rahul G., Beheshti, Rahmatollah, Rehg, James M., Karim, Mohammad Ehsanul, Coffee, Megan, Celi, Leo Anthony, Fries, Jason Alan, Sadatsafavi, Mohsen, Shung, Dennis, McWeeney, Shannon, Dafflon, Jessica, Jabbour, Sarah

arXiv.org Artificial Intelligence, Feb-10-2025

The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. The organization of the research roundtables at the conference involved 13 senior and 27 junior chairs across 13 tables. Each roundtable session included an invited senior chair (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with an interest in the session's topic.

data mining, machine learning, natural language, (17 more...)

2502.06693

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.24)
North America > United States > New York > New York County (0.14)
Europe > United Kingdom > England > Oxfordshire (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
(15 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Medical Video Generation for Disease Progression Simulation

Cao, Xu, Liang, Kaizhao, Liao, Kuei-Da, Gao, Tianren, Ye, Wenqian, Chen, Jintai, Ding, Zhiguang, Cao, Jianguo, Rehg, James M., Sun, Jimeng

arXiv.org Artificial Intelligence, Nov-18-2024

Modeling disease progression is crucial for improving the quality and efficacy of clinical diagnosis and prognosis, but it is often hindered by a lack of longitudinal medical image monitoring for individual patients. To address this challenge, we propose the first Medical Video Generation (MVG) framework that enables controlled manipulation of disease-related image and video features, allowing precise, realistic, and personalized simulations of disease progression. Our approach begins by leveraging large language models (LLMs) to recaption prompts for disease trajectories. Next, a controllable multi-round diffusion model simulates the disease progression state for each patient, creating a realistic sequence of intermediate disease states. Finally, a diffusion-based video transition generation model interpolates disease progression between these states. We validate our framework across three medical imaging domains: chest X-ray, fundus photography, and skin images. Our results demonstrate that MVG significantly outperforms baseline models in generating coherent and clinically plausible disease trajectories. Two user studies by veteran physicians provide further validation and insights into the clinical utility of the generated sequences. MVG has the potential to assist healthcare providers in modeling disease trajectories, interpolating missing medical image data, and enhancing medical education through realistic, dynamic visualizations of disease progression.

large language model, machine learning, natural language, (19 more...)

2411.11943

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On-Board Vision-Language Models for Personalized Autonomous Vehicle Motion Control: System Design and Real-World Validation

Cui, Can, Yang, Zichong, Zhou, Yupeng, Peng, Juntong, Park, Sung-Yeon, Zhang, Cong, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Feng, Yiheng, Panchal, Jitesh, Li, Lingxi, Chen, Yaobin, Wang, Ziran

arXiv.org Artificial Intelligence, Nov-17-2024

Personalized driving refers to an autonomous vehicle's ability to adapt its driving behavior or control strategies to match individual users' preferences and driving styles while maintaining safety and comfort standards. However, existing works either fail to capture every individual preference precisely or become computationally inefficient as the user base expands. Vision-Language Models (VLMs) offer a promising solution to this problem through their natural language understanding and scene reasoning capabilities. In this work, we propose a lightweight yet effective on-board VLM framework that provides low-latency personalized driving performance while maintaining strong reasoning capabilities. Our solution incorporates a Retrieval-Augmented Generation (RAG)-based memory module that enables continuous learning of individual driving preferences through human feedback. Through comprehensive real-world vehicle deployment and experiments, our system has demonstrated the ability to provide safe, comfortable, and personalized driving experiences across various scenarios and to reduce takeover rates significantly, by up to 76.9%. To the best of our knowledge, this work represents the first end-to-end VLM-based motion control system in real-world autonomous vehicles.

artificial intelligence, machine learning, natural language, (19 more...)

2411.11913

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection

Cao, Xu, Ye, Wenqian, Moise, Kenny, Coffee, Megan

arXiv.org Artificial Intelligence, Nov-16-2024

In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic spillover, remain a global threat. Mpox (caused by the monkeypox virus) is a notable example of a zoonotic infection that often goes undiagnosed, especially as its rash progresses through stages, complicating detection across diverse populations with different presentations. In August 2024, the WHO Director-General declared the mpox outbreak a public health emergency of international concern for a second time. Despite the deployment of deep learning techniques for detecting diseases from skin lesion images, a robust and publicly accessible foundation model for mpox diagnosis is still lacking due to the unavailability of open-source mpox skin lesion images, multimodal clinical data, and specialized training pipelines. To address this gap, we propose MpoxVLM, a vision-language model (VLM) designed to detect mpox by analyzing both skin lesion images and patient clinical information. MpoxVLM integrates the CLIP visual encoder, an enhanced Vision Transformer (ViT) classifier for skin lesions, and LLaMA-2-7B models, pre-trained and fine-tuned on visual instruction-following question-answer pairs from our newly released mpox skin lesion dataset. Our work achieves 90.38% accuracy for mpox detection, offering a promising pathway to improve early diagnostic accuracy in combating mpox.

large language model, machine learning, natural language, (16 more...)

2411.10888

Country:

North America > United States (1.00)
Africa (1.00)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Ye, Wenqian, Zheng, Guangtao, Ma, Yunsheng, Cao, Xu, Lai, Bolin, Rehg, James M., Zhang, Aidong

arXiv.org Artificial Intelligence, Jun-24-2024

Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe robustness pitfall in deep learning models trained on single-modality data. Multimodal Large Language Models (MLLMs), which integrate both vision and language models, have demonstrated strong capability in joint vision-language understanding. However, whether spurious biases are prevalent in MLLMs remains under-explored. We address this gap by analyzing spurious biases in a multimodal setting, uncovering the specific test data patterns that can manifest this problem when biases in the vision model cascade into the alignment between visual and text tokens in MLLMs. To better understand this problem, we introduce MM-SpuBench, a comprehensive visual question-answering (VQA) benchmark designed to evaluate MLLMs' reliance on nine distinct categories of spurious correlations from five open-source image datasets. The VQA dataset is built from human-understandable concept information (attributes). Leveraging this benchmark, we conduct a thorough evaluation of current state-of-the-art MLLMs. Our findings illuminate these models' persistent reliance on spurious correlations and underscore the urgent need for new methodologies to mitigate spurious biases. To support MLLM robustness research, we release our VQA benchmark at https://huggingface.co/datasets/mmbench/MM-SpuBench.

large language model, machine learning, natural language, (17 more...)

2406.17126

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Spurious Correlations in Machine Learning: A Survey

Ye, Wenqian, Zheng, Guangtao, Cao, Xu, Ma, Yunsheng, Zhang, Aidong

arXiv.org Artificial Intelligence, May-16-2024

Machine learning systems are known to be sensitive to spurious correlations between nonessential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness.

In recent years, spurious correlations have been studied under various names, such as shortcuts, dataset biases, group robustness, simplicity bias, and so on. We have seen significant progress in analyzing and mitigating spurious correlations in various areas such as computer vision (Wang et al., 2021), natural language processing (Du et al., 2022b), and healthcare (Huang et al., 2022). Despite the progress, there lacks a survey in this area that formally defines spurious correlations ...

artificial intelligence, deep learning, machine learning, (14 more...)

2402.12715

Country: North America > United States (0.28)

Genre:

Research Report (1.00)
Overview (0.93)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation

Zheng, Guangtao, Ye, Wenqian, Zhang, Aidong

arXiv.org Artificial Intelligence, May-6-2024

Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and propose a self-guided spurious correlation mitigation framework. Our framework automatically constructs fine-grained training labels tailored for a classifier obtained with empirical risk minimization to improve its robustness against spurious correlations. The fine-grained training labels are formulated with different prediction behaviors of the classifier identified in a novel spuriousness embedding space. We construct the space with automatically detected conceptual attributes and a novel spuriousness metric which measures how likely a class-attribute correlation is exploited for predictions. We demonstrate that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori and outperforms prior methods on five real-world datasets.

artificial intelligence, correlation, machine learning, (18 more...)

2405.03649

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

Ma, Yunsheng, Cui, Can, Cao, Xu, Ye, Wenqian, Liu, Peiran, Lu, Juanwu, Abdelraouf, Amr, Gupta, Rohit, Han, Kyungtae, Bera, Aniket, Rehg, James M., Wang, Ziran

arXiv.org Artificial Intelligence, Dec-7-2023

We present LaMPilot, a novel framework for planning in the field of autonomous driving, rethinking the task as a code-generation process that leverages established behavioral primitives. This approach aims to address the challenge of interpreting and executing spontaneous user instructions such as "overtake the car ahead," which have typically posed difficulties for existing frameworks. We introduce the LaMPilot benchmark specifically designed to quantitatively evaluate the efficacy of Large Language Models (LLMs) in translating human directives into actionable driving policies. We then evaluate a wide range of state-of-the-art code generation language models on tasks from the LaMPilot Benchmark. The results of the experiments showed that GPT-4, with human feedback, achieved an impressive task completion rate of 92.7% and a minimal collision rate of 0.9%. To encourage further investigation in this area, our code and dataset will be made available.

large language model, machine learning, natural language, (21 more...)

2312.04372

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

A Survey on Multimodal Large Language Models for Autonomous Driving

Cui, Can, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Zhou, Yang, Liang, Kaizhao, Chen, Jintai, Lu, Juanwu, Yang, Zichong, Liao, Kuei-Da, Gao, Tianren, Li, Erlong, Tang, Kun, Cao, Zhipeng, Zhou, Tong, Liu, Ao, Yan, Xinrui, Mei, Shuqi, Cao, Jianguo, Wang, Ziran, Zheng, Chao

arXiv.org Artificial Intelligence, Nov-20-2023

With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems benefiting from large models have the potential to perceive the real world, make decisions, and control tools as humans do. In recent months, LLMs have attracted widespread attention in autonomous driving and map systems. Despite their immense potential, there is still a lack of a comprehensive understanding of the key challenges, opportunities, and future endeavors in applying them to driving systems. In this paper, we present a systematic investigation in this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the development of multimodal models using LLMs, and the history of autonomous driving. Then, we overview existing MLLM tools for driving, transportation, and map systems together with existing datasets and benchmarks. Moreover, we summarize the works in the 1st WACV Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), which is the first workshop of its kind regarding LLMs in autonomous driving. To further promote the development of this field, we also discuss several important problems regarding the use of MLLMs in autonomous driving systems that need to be solved by both academia and industry.

large language model, machine learning, natural language, (19 more...)

2311.12320

Country:

Asia > China (0.67)
North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > Indiana > Tippecanoe County (0.14)
(2 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MACP: Efficient Model Adaptation for Cooperative Perception

Ma, Yunsheng, Lu, Juanwu, Cui, Can, Zhao, Sicheng, Cao, Xu, Ye, Wenqian, Wang, Ziran

arXiv.org Artificial Intelligence, Nov-7-2023

Vehicle-to-vehicle (V2V) communications have greatly enhanced the perception capabilities of connected and automated vehicles (CAVs) by enabling information sharing to "see through the occlusions", resulting in significant performance improvements. However, developing and training complex multi-agent perception models from scratch can be expensive and unnecessary when existing single-agent models show remarkable generalization capabilities. In this paper, we propose a new framework termed MACP, which equips a single-agent pre-trained model with cooperation capabilities. We approach this objective by identifying the key challenges of shifting from single-agent to cooperative settings, adapting the model by freezing most of its parameters and adding a few lightweight modules. We demonstrate in our experiments that the proposed framework can effectively utilize cooperative observations and outperform other state-of-the-art approaches in both simulated and real-world cooperative perception benchmarks while requiring substantially fewer tunable parameters with reduced communication costs. Our source code is available at https://github.com/PurdueDigitalTwin/MACP.

cooperative perception, efficient model adaptation, macp

2310.16870

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence (0.53)
