Goto

Collaborating Authors

 Thompson, Paul


Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

arXiv.org Artificial Intelligence

Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or facilities, thereby violating the i.i.d. rule. A common strategy is to harmonize the site bias while retaining important biological information. The ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when faced with situations involving newly joined sites in training or evaluating data from unknown/unseen sites, ComBat lacks compatibility and requires retraining with data from all the sites. The retraining leads to significant computational and logistic overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data in different sites and greatly advances the usability of ComBat harmonization. We use extensive simulation and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our codes are provided in https://github.com/illidanlab/distributed-cluster-harmonization.


Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation

arXiv.org Artificial Intelligence

The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal functional dynamics. In this study, we first construct the brain-effective network via the dynamic causal model. Subsequently, we introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks via an ordinary differential equation (ODE) model, which characterizes spatial-temporal brain dynamics. Our framework is validated on several clinical phenotype prediction tasks using two independent publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.


ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

arXiv.org Artificial Intelligence

The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detection focus only on binary classification and fail to account for the important relationship between misinformation and affective features (i.e., sentiment and emotions). Driven by a comprehensive analysis of conspiracy text that reveals its distinctive affective features, we propose ConspEmoLLM, the first open-source LLM that integrates affective information and is able to perform diverse tasks relating to conspiracy theories. These tasks include not only conspiracy theory detection, but also classification of theory type and detection of related discussion (e.g., opinions towards theories). ConspEmoLLM is fine-tuned based on an emotion-oriented LLM using our novel ConDID dataset, which includes five tasks to support LLM instruction tuning and evaluation. We demonstrate that when applied to these tasks, ConspEmoLLM largely outperforms several open-source general domain LLMs and ChatGPT, as well as an LLM that has been fine-tuned using ConDID, but which does not use affective features. This project will be released on https://github.com/lzw108/ConspEmoLLM/.


Emotion Detection for Misinformation: A Review

arXiv.org Artificial Intelligence

With the advent of social media, an increasing number of netizens are sharing and reading posts and news online. However, the huge volumes of misinformation (e.g., fake news and rumors) that flood the internet can adversely affect people's lives, and have resulted in the emergence of rumor and fake news detection as a hot research topic. The emotions and sentiments of netizens, as expressed in social media posts and news, constitute important factors that can help to distinguish fake news from genuine news and to understand the spread of rumors. This article comprehensively reviews emotion-based methods for misinformation detection. We begin by explaining the strong links between emotions and misinformation. We subsequently provide a detailed analysis of a range of misinformation detection methods that employ a variety of emotion, sentiment and stance-based features, and describe their strengths and weaknesses. Finally, we discuss a number of ongoing challenges in emotion-based misinformation detection based on large language models and suggest future research directions, including data collection (multi-platform, multilingual), annotation, benchmark, multimodality, and interpretability.


Large-Scale Unsupervised Deep Representation Learning for Brain Structure

arXiv.org Machine Learning

Machine Learning (ML) is increasingly being used for computer aided diagnosis of brain related disorders based on structural magnetic resonance imaging (MRI) data. Most of such work employs biologically and medically meaningful hand-crafted features calculated from different regions of the brain. The construction of such highly specialized features requires a considerable amount of time, manual oversight and careful quality control to ensure the absence of errors in the computational process. Recent advances in Deep Representation Learning have shown great promise in extracting highly non-linear and information-rich features from data. In this paper, we present a novel large-scale deep unsupervised approach to learn generic feature representations of structural brain MRI scans, which requires no specialized domain knowledge or manual intervention. Our method produces low-dimensional representations of brain structure, which can be used to reconstruct brain images with very low error and exhibit performance comparable to FreeSurfer features on various classification tasks.


Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure

arXiv.org Machine Learning

Genome-wide association studies (GWA studies or GWAS) investigate the relationships between genetic variants such as single-nucleotide polymorphisms (SNPs) and individual traits. Recently, incorporating biological priors together with machine learning methods in GWA studies has attracted increasing attention. However, in real-world, nucleotide-level bio-priors have not been well-studied to date. Alternatively, studies at gene-level, for example, protein--protein interactions and pathways, are more rigorous and legitimate, and it is potentially beneficial to utilize such gene-level priors in GWAS. In this paper, we proposed a novel two-level structured sparse model, called Sparse Group Lasso with Group-level Graph structure (SGLGG), for GWAS. It can be considered as a sparse group Lasso along with a group-level graph Lasso. Essentially, SGLGG penalizes the nucleotide-level sparsity as well as takes advantages of gene-level priors (both gene groups and networks), to identifying phenotype-associated risk SNPs. We employ the alternating direction method of multipliers algorithm to optimize the proposed model. Our experiments on the Alzheimer's Disease Neuroimaging Initiative whole genome sequence data and neuroimage data demonstrate the effectiveness of SGLGG. As a regression model, it is competitive to the state-of-the-arts sparse models; as a variable selection method, SGLGG is promising for identifying Alzheimer's disease-related risk SNPs.


Collaborative Biomedical Information Retrieval

AAAI Conferences

In the context of two related NIH projects supporting scientific collaboration we seek to implement an environment for collaborative information retrieval and analysis based on utility theory.