AITopics | Pan, Weishen

Collaborating Authors

Pan, Weishen


Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Wang, Jiankun, Ahn, Sumyeong, Dalal, Taykhoom, Zhang, Xiaodan, Pan, Weishen, Zhang, Qiannan, Chen, Bin, Dodge, Hiroko H., Wang, Fei, Zhou, Jiayu

arXiv.org Artificial Intelligence, May-25-2024

Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which offers them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health & Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better strategies of patient management and thus improving healthcare.
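The confidence-driven combination of SLs and LLMs described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `route_prediction`, the 0.8 threshold, the confidence definition, and the use of a plain callable as a stand-in for an LLM query are all assumptions made for illustration.

```python
from typing import Callable, Tuple


def route_prediction(
    sl_probability: float,
    llm_predict: Callable[[], int],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> Tuple[int, str]:
    """Return (label, source) for one patient.

    sl_probability is the supervised model's predicted probability of the
    positive class. llm_predict is a hypothetical stand-in for a few-shot
    LLM query; it is only called when the supervised model is unsure.
    """
    # Confidence is taken here as the larger of the two class probabilities.
    confidence = max(sl_probability, 1.0 - sl_probability)
    if confidence >= threshold:
        # Clear-cut case: keep the supervised prediction.
        return int(sl_probability >= 0.5), "SL"
    # Ambiguous case: defer to the LLM stand-in.
    return llm_predict(), "LLM"


if __name__ == "__main__":
    print(route_prediction(0.95, lambda: 0))  # handled by the supervised model
    print(route_prediction(0.55, lambda: 0))  # deferred to the LLM stand-in
```

Passing the LLM as a callable keeps the sketch self-contained and means the more expensive query is only issued for low-confidence cases.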

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.16413

Country: North America > United States > Oregon (0.24)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities

Arriola, Marianne, Pan, Weishen, Zhou, Manqi, Zhang, Qiannan, Su, Chang, Wang, Fei

arXiv.org Artificial Intelligence, May-18-2024

Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose Single-Cell Cross-Cohort Cross-Category integration, a novel framework that learns unified cell representations under domain shift without requiring full-modality reference samples. Our generative approach learns rich cross-modal and cross-domain relationships that enable imputation of these missing modalities. Through experiments on real-world multi-omic datasets, we demonstrate that the framework offers a robust solution to single-cell tasks such as cell type clustering, cell type classification, and feature imputation.

artificial intelligence, machine learning, modality, (16 more...)

arXiv.org Artificial Intelligence

2405.1128

Country: North America > United States > New York (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.70)
Information Technology > Networks (0.62)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

Maasch, Jacqueline, Pan, Weishen, Gupta, Shantanu, Kuleshov, Volodymyr, Gan, Kyra, Wang, Fei

arXiv.org Machine Learning, Oct-25-2023

This work addresses the problem of automated covariate selection under limited prior knowledge. Given an exposure-outcome pair {X,Y} and a variable set Z of unknown causal structure, the Local Discovery by Partitioning (LDP) algorithm partitions Z into subsets defined by their relation to {X,Y}. We enumerate eight exhaustive and mutually exclusive partitions of any arbitrary Z and leverage this taxonomy to differentiate confounders from other variable types. LDP is motivated by valid adjustment set identification, but avoids the pretreatment assumption commonly made by automated covariate selection methods. We provide theoretical guarantees that LDP returns a valid adjustment set for any Z that meets sufficient graphical conditions. Under stronger conditions, we prove that partition labels are asymptotically correct. The total number of independence tests is worst-case quadratic in |Z|, with sub-quadratic runtimes observed empirically. We numerically validate our theoretical guarantees on synthetic and semi-synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baselines, with LDP outperforming on confounder recall, test count, and runtime for valid adjustment set discovery.

adjustment, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2310.17816

Country: North America > United States > New York > New York County > New York City (0.14)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Epidemiology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment

Cui, Sen, Pan, Weishen, Zhang, Changshui, Wang, Fei

arXiv.org Artificial Intelligence, Jul-27-2023

Algorithmic fairness has been a serious concern and has received a lot of interest in the machine learning community. In this paper, we focus on the bipartite ranking scenario, where the instances come from either the positive or negative class and the goal is to learn a ranking function that ranks positive instances higher than negative ones. While there could be a trade-off between fairness and performance, we propose xOrder, a model-agnostic post-processing framework for achieving fairness in bipartite ranking while maintaining the algorithm's classification performance. In particular, we optimize a weighted sum of the utility and fairness, cast as identifying an optimal warping path across different protected groups, and solve it through a dynamic programming process. xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics. In addition to binary groups, xOrder can be applied to multiple protected groups. We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories. xOrder consistently achieves a better balance between the algorithm utility and ranking fairness on a variety of datasets with different metrics. From the visualization of the calibrated ranking scores, xOrder mitigates the score distribution shifts of different groups compared with baselines. Moreover, additional analytical results verify that xOrder achieves robust performance when faced with fewer samples and a bigger difference between training and testing ranking score distributions.

artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2307.14668

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (0.45)
Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models

Wang, Yingheng, Schiff, Yair, Gokaslan, Aaron, Pan, Weishen, Wang, Fei, De Sa, Christopher, Kuleshov, Volodymyr

arXiv.org Artificial Intelligence, Jun-14-2023

While diffusion models excel at generating high-quality samples, their latent variables typically lack semantic meaning and are not suitable for representation learning. Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data. InfoDiffusion relies on a learning objective regularized with the mutual information between observed and hidden variables, which improves latent space quality and prevents the latents from being ignored by expressive diffusion-based decoders. Empirically, we find that InfoDiffusion learns disentangled and human-interpretable latent representations that are competitive with state-of-the-art generative and contrastive methods, while retaining the high sample quality of diffusion models. Our method enables manipulating the attributes of generated images and has the potential to assist tasks that require exploring a learned latent space to generate quality samples, e.g., generative design.

artificial intelligence, infodiffusion, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2306.08757

Country:

North America > United States > New York (0.14)
North America > United States > Hawaii (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Patchwork Learning: A Paradigm Towards Integrative Analysis across Diverse Biomedical Data Sources

Rajendran, Suraj, Pan, Weishen, Sabuncu, Mert R., Chen, Yong, Zhou, Jiayu, Wang, Fei

arXiv.org Artificial Intelligence, May-13-2023

Machine learning (ML) in healthcare presents numerous opportunities for enhancing patient care, population health, and healthcare providers' workflows. However, the real-world clinical and cost benefits remain limited due to challenges in data privacy, heterogeneous data sources, and the inability to fully leverage multiple data modalities. In this perspective paper, we introduce "patchwork learning" (PL), a novel paradigm that addresses these limitations by integrating information from disparate datasets composed of different data modalities (e.g., clinical free-text, medical images, omics) and distributed across separate and secure sites. PL allows the simultaneous utilization of complementary data sources while preserving data privacy, enabling the development of more holistic and generalizable ML models. We present the concept of patchwork learning and its current implementations in healthcare, exploring the potential opportunities and applicable data sources for addressing various healthcare challenges. PL leverages bridging modalities or overlapping feature spaces across sites to facilitate information sharing and impute missing data, thereby addressing related prediction tasks. We discuss the challenges associated with PL, many of which are shared by federated and multimodal learning, and provide recommendations for future research in this field. By offering a more comprehensive approach to healthcare data integration, patchwork learning has the potential to revolutionize the clinical applicability of ML models. This paradigm promises to strike a balance between personalization and generalizability, ultimately enhancing patient experiences, improving population health, and optimizing healthcare providers' workflows.

Introduction: Machine learning (ML) in healthcare is a rapidly evolving field, presenting numerous opportunities for progress. Active and passive patient data collection, both during and outside medical care, can be utilized to address health challenges. As a result, ML has become an essential tool for processing and analyzing these data in various domains, including natural language processing, computer vision, and more. ML systems have demonstrated their potential to enhance patient experiences, improve population health, reduce per capita healthcare costs, and optimize healthcare providers' workflows. Data privacy is a major challenge facing the use of ML in healthcare, as it restricts the potential for pooling electronic health record (EHR) data from multiple sites. While single-modality models exist (e.g., clinical notes, lab tests, omics, or medical images), systems that simultaneously leverage multiple modalities are relatively scarce. Multimodal machine learning (MML) combines disparate data sources to capitalize on complementary information, thereby improving performance.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.06217

Country:

North America > United States > New York (0.28)
North America > United States > Michigan (0.28)
North America > United States > Pennsylvania (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.92)
Research Report > Strength High (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
(3 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

xOrder: A Model Agnostic Post-Processing Framework for Achieving Ranking Fairness While Maintaining Algorithm Utility

Cui, Sen, Pan, Weishen, Zhang, Changshui, Wang, Fei

arXiv.org Machine Learning, Oct-9-2020

Algorithmic fairness has received a lot of interest in machine learning recently. In this paper, we focus on the bipartite ranking scenario, where the instances come from either the positive or negative class and the goal is to learn a ranking function that ranks positive instances higher than negative ones. In an unfair setting, the probabilities of ranking the positives higher than negatives are different across different protected groups. We propose a general post-processing framework, xOrder, for achieving fairness in bipartite ranking while maintaining the algorithm's classification performance. In particular, we optimize a weighted sum of the utility and fairness by directly adjusting the relative ordering across groups. We formulate this problem as identifying an optimal warping path across different protected groups and solve it through a dynamic programming process. xOrder is compatible with various classification models and applicable to a variety of ranking fairness metrics. We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories. The experimental results show that our approach can achieve a good balance between the algorithm utility and ranking fairness. Our algorithm can also achieve robust performance when training and testing ranking score distributions are significantly different.
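The notion of unfairness in this abstract (the probability of ranking positives above negatives differing across protected groups) can be made concrete with a small sketch. The code below is not xOrder itself; it only measures one possible cross-group ranking disparity. The function names, the tie-handling rule, and the choice of this particular metric are assumptions made for illustration.

```python
from typing import Hashable, List, Sequence


def pairwise_ranking_prob(pos_scores: Sequence[float],
                          neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win (an assumed convention).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def cross_group_gap(scores: Sequence[float], labels: Sequence[int],
                    groups: Sequence[Hashable],
                    group_a: Hashable, group_b: Hashable) -> float:
    """P(positive of a outranks negative of b) minus the reverse for b over a."""
    def select(label: int, group: Hashable) -> List[float]:
        return [s for s, y, g in zip(scores, labels, groups)
                if y == label and g == group]

    a_over_b = pairwise_ranking_prob(select(1, group_a), select(0, group_b))
    b_over_a = pairwise_ranking_prob(select(1, group_b), select(0, group_a))
    return a_over_b - b_over_a


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.25, 0.6, 0.1]
    labels = [1, 1, 0, 0, 1, 1, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(cross_group_gap(scores, labels, groups, "a", "b"))
```

A gap near zero means both groups' positives are ranked above the other group's negatives at similar rates; a post-processing method in the spirit of the abstract would adjust the relative ordering across groups to shrink this gap while preserving utility.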

fairness, health & medicine, optimization problem, (19 more...)

arXiv.org Machine Learning

2006.08267

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
