Chen, Yanxi
Enhancing Alzheimer's Diagnosis: Leveraging Anatomical Landmarks in Graph Convolutional Neural Networks on Tetrahedral Meshes
Chen, Yanxi, Farazi, Mohammad, Yang, Zhangsihao, Fan, Yonghui, Ashton, Nicholas, Reiman, Eric M, Su, Yi, Wang, Yalin
Alzheimer's disease (AD) is a major neurodegenerative condition that affects millions around the world. As one of the main biomarkers in the AD diagnosis procedure, brain amyloid positivity is typically identified by positron emission tomography (PET), which is costly and invasive. Brain structural magnetic resonance imaging (sMRI) may provide a safer and more convenient solution for the AD diagnosis. Recent advances in geometric deep learning have facilitated sMRI analysis and early diagnosis of AD. However, determining AD pathology, such as brain amyloid deposition, in preclinical stage remains challenging, as less significant morphological changes can be observed. As a result, few AD classification models are generalizable to the brain amyloid positivity classification task. Blood-based biomarkers (BBBMs), on the other hand, have recently achieved remarkable success in predicting brain amyloid positivity and identifying individuals with high risk of being brain amyloid positive. However, individuals in medium risk group still require gold standard tests such as Amyloid PET for further evaluation. Inspired by the recent success of transformer architectures, we propose a geometric deep learning model based on transformer that is both scalable and robust to variations in input volumetric mesh size. Our work introduced a novel tokenization scheme for tetrahedral meshes, incorporating anatomical landmarks generated by a pre-trained Gaussian process model. Our model achieved superior classification performance in AD classification task. In addition, we showed that the model was also generalizable to the brain amyloid positivity prediction with individuals in the medium risk class, where BM alone cannot achieve a clear classification. Our work may enrich geometric deep learning research and improve AD diagnosis accuracy without using expensive and invasive PET scans.
RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models
Zhu, Wenhui, Li, Xin, Chen, Xiwen, Qiu, Peijie, Vasa, Vamsi Krishna, Dong, Xuanzhao, Chen, Yanxi, Lepore, Natasha, Dumitrascu, Oana, Su, Yi, Wang, Yalin
Recently, Multimodal Large Language Models (MLLMs) have gained significant attention for their remarkable ability to process and analyze non-textual data, such as images, videos, and audio. Notably, several adaptations of general-domain MLLMs to the medical field have been explored, including LLaVA-Med. However, these medical adaptations remain insufficiently advanced in understanding and interpreting retinal images. In contrast, medical experts emphasize the importance of quantitative analyses for disease detection and interpretation. This underscores a gap between general-domain and medical-domain MLLMs: while general-domain MLLMs excel in broad applications, they lack the specialized knowledge necessary for precise diagnostic and interpretative tasks in the medical field. To address these challenges, we introduce \textit{RetinalGPT}, a multimodal conversational assistant for clinically preferred quantitative analysis of retinal images. Specifically, we achieve this by compiling a large retinal image dataset, developing a novel data pipeline, and employing customized visual instruction tuning to enhance both retinal analysis and enrich medical knowledge. In particular, RetinalGPT outperforms MLLM in the generic domain by a large margin in the diagnosis of retinal diseases in 8 benchmark retinal datasets. Beyond disease diagnosis, RetinalGPT features quantitative analyses and lesion localization, representing a pioneering step in leveraging LLMs for an interpretable and end-to-end clinical research framework. The code is available at https://github.com/Retinal-Research/RetinalGPT
Plasma-CycleGAN: Plasma Biomarker-Guided MRI to PET Cross-modality Translation Using Conditional CycleGAN
Chen, Yanxi, Su, Yi, Dumitrascu, Celine, Chen, Kewei, Weidman, David, Caselli, Richard J, Ashton, Nicholas, Reiman, Eric M, Wang, Yalin
Cross-modality translation between MRI and PET imaging is challenging due to the distinct mechanisms underlying these modalities. Blood-based biomarkers (BBBMs) are revolutionizing Alzheimer's disease (AD) detection by identifying patients and quantifying brain amyloid levels. However, the potential of BBBMs to enhance PET image synthesis remains unexplored. In this paper, we performed a thorough study on the effect of incorporating BBBM into deep generative models. By evaluating three widely used cross-modality translation models, we found that BBBMs integration consistently enhances the generative quality across all models. By visual inspection of the generated results, we observed that PET images generated by CycleGAN exhibit the best visual fidelity. Based on these findings, we propose Plasma-CycleGAN, a novel generative model based on CycleGAN, to synthesize PET images from MRI using BBBMs as conditions. This is the first approach to integrate BBBMs in conditional cross-modality translation between MRI and PET.
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
Chen, Yanxi, Pan, Xuchen, Li, Yaliang, Ding, Bolin, Zhou, Jingren
We propose a general two-stage algorithm that enjoys a provable scaling law for the test-time compute of large language models (LLMs). Given an input problem, the proposed algorithm first generates $N$ candidate solutions, and then chooses the best one via a multiple-round knockout tournament where each pair of candidates are compared for $K$ times and only the winners move on to the next round. In a minimalistic implementation, both stages can be executed with a black-box LLM alone and nothing else (e.g., no external verifier or reward model), and a total of $N \times (K + 1)$ highly parallelizable LLM calls are needed for solving an input problem. Assuming that a generated candidate solution is correct with probability $p_{\text{gen}} > 0$ and a comparison between a pair of correct and incorrect solutions identifies the right winner with probability $p_{\text{comp}} > 0.5$ (i.e., better than a random guess), we prove theoretically that the failure probability of the proposed algorithm decays to zero exponentially with respect to $N$ and $K$: $$\mathbb{P}(\text{final output is incorrect}) \le (1 - p_{\text{gen}})^N + \lceil \log_2 N \rceil e^{-2 K (p_{\text{comp}} - 0.5)^2}.$$ Our empirical results with the challenging MMLU-Pro benchmark validate the technical assumptions, as well as the efficacy of the proposed algorithm and the gains from scaling up its test-time compute.
On the Design and Analysis of LLM-Based Algorithms
Chen, Yanxi, Li, Yaliang, Ding, Bolin, Zhou, Jingren
We initiate a formal investigation into the design and analysis of LLM-based algorithms, i.e. algorithms that contain one or multiple calls of large language models (LLMs) as sub-routines and critically rely on the capabilities of LLMs. While LLM-based algorithms, ranging from basic LLM calls with prompt engineering to complicated LLM-powered agent systems and compound AI systems, have achieved remarkable empirical success, the design and optimization of them have mostly relied on heuristics and trial-and-errors, which is largely due to a lack of formal and analytical study for these algorithms. To fill this gap, we start by identifying the computational-graph representation of LLM-based algorithms, the design principle of task decomposition, and some key abstractions, which then facilitate our formal analysis for the accuracy and efficiency of LLM-based algorithms, despite the black-box nature of LLMs. We further consider parallel decomposition for a case study, providing extensive analytical and empirical study for four concrete examples of this pattern. Our proposed framework holds promise for advancing LLM-based algorithms, by revealing the reasons behind curious empirical phenomena, guiding the choices of hyperparameters, predicting the empirical performance of algorithms, and inspiring new algorithm design. To promote further study of LLM-based algorithms, we release our source code at https://github.com/modelscope/agentscope/tree/main/examples/paper_llm_based_algorithm.
Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm
Seif, Mohamed, Chen, Yanxi
Clustering is a critical challenge in network science, pivotal for detecting underlying patterns and structures in unlabeled data. To explore the boundaries of this challenge, stochastic block models (SBMs) have been effectively utilized as a mathematical framework to assess the performance of clustering algorithms. Specifically, an SBM is a statistical model developed to reveal the structural dynamics of networks or graphs, where nodes represent individual entities and edges symbolize the connections between them. In a typical SBM, nodes are categorized into blocks or communities according to their connectivity patterns, with the probability of an edge existing between any two nodes depending on the blocks to which they belong [3]. For example, in a social network using an SBM, nodes might be organized by attributes such as age, gender, or geographic location, with friendship probabilities determined by their block memberships [1, 6]. The Bipartite Stochastic Block Model(B-SBM)[2] extends the conventional SBM to accommodate networks comprising two distinct node types, forming a bipartite graph structure. This adaptation is particularly beneficial in contexts such as recommendation systems, where nodes represent users and products, or in particular social networks, where nodes might denote individuals and the groups or events they participate in. In B-SBMs, the connections between nodes from different sets are governed by an "affinity matrix" that specifies the likelihood of linkage based on group affiliations. This matrix is integral to capturing interaction patterns within the network, allowing for a sophisticated estimation of model parameters from observed connections.
Boosting Single Positive Multi-label Classification with Generalized Robust Loss
Chen, Yanxi, Li, Chunxiao, Dai, Xinyang, Li, Jinhuan, Sun, Weiyu, Wang, Yiming, Zhang, Renyuan, Zhang, Tinghe, Wang, Bo
Multi-label learning (MLL) requires comprehensive multi-semantic annotations that is hard to fully obtain, thus often resulting in missing labels scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and robust losses, mostly leading to unacceptable false negatives. To address this issue, we first propose a generalized loss framework based on expected risk minimization to provide soft pseudo labels, and point out that the former losses can be seamlessly converted into our framework. In particular, we design a novel robust loss based on our framework, which enjoys flexible coordination between false positives and false negatives, and can additionally deal with the imbalance between positive and negative samples. Extensive experiments show that our approach can significantly improve SPML performance and outperform the vast majority of state-of-the-art methods on all the four benchmarks.
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Chen, Yanxi, Pan, Xuchen, Li, Yaliang, Ding, Bolin, Zhou, Jingren
While recent works have shown preliminary evidence for the efficacy of early exiting in accelerating LLM inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting, including a lightweight method that facilitates backpropagation for the early-exit training objective with pipeline parallelism, techniques of leveraging idle resources in the original pipeline schedule for computation related to early-exit layers, and two approaches of early-exit inference that are compatible with KV caching for autoregressive generation. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead compared to standard LLM training, as well as outstanding inference speedup without compromising output quality. To facilitate further research and adoption, we release EE-LLM at https://github.com/pan-x-c/EE-LLM.
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Pan, Xuchen, Chen, Yanxi, Li, Yaliang, Ding, Bolin, Zhou, Jingren
This work introduces EE-Tuning, a lightweight and economical solution to training/tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner, which requires significantly less computational resources and training data. Our implementation of EE-Tuning achieves outstanding training efficiency via extensive performance optimizations, as well as scalability due to its full compatibility with 3D parallelism. Results of systematic experiments validate the efficacy of EE-Tuning, confirming that effective early-exit LLM inference can be achieved with a limited training budget. In hope of making early-exit LLMs accessible to the community, we release the source code of our implementation of EE-Tuning at https://github.com/pan-x-c/EE-LLM.
Ranking from Pairwise Comparisons in General Graphs and Graphs with Locality
Chen, Yanxi
This technical report studies the problem of ranking from pairwise comparisons in the classical Bradley-Terry-Luce (BTL) model, with a focus on score estimation. For general graphs, we show that, with sufficiently many samples, maximum likelihood estimation (MLE) achieves an entrywise estimation error matching the Cram\'er-Rao lower bound, which can be stated in terms of effective resistances; the key to our analysis is a connection between statistical estimation and iterative optimization by preconditioned gradient descent. We are also particularly interested in graphs with locality, where only nearby items can be connected by edges; our analysis identifies conditions under which locality does not hurt, i.e. comparing the scores between a pair of items that are far apart in the graph is nearly as easy as comparing a pair of nearby items. We further explore divide-and-conquer algorithms that can provably achieve similar guarantees even in the regime with the sparsest samples, while enjoying certain computational advantages. Numerical results validate our theory and confirm the efficacy of the proposed algorithms.