Goto

Collaborating Authors

 Liu, Jingyu


PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis

arXiv.org Artificial Intelligence

Missing modalities pose a major issue in Alzheimer's Disease (AD) diagnosis, as many subjects lack full imaging data due to cost and clinical constraints. While multi-modal learning leverages complementary information, most existing methods train only on complete data, ignoring the large proportion of incomplete samples in real-world datasets like ADNI. This reduces the effective training set and limits the full use of valuable medical data. While some methods incorporate incomplete samples, they fail to effectively address inter-modal feature alignment and knowledge transfer challenges under high missing rates. To address this, we propose a Prototype-Guided Adaptive Distillation (PGAD) framework that directly incorporates incomplete multi-modal data into training. PGAD enhances missing modality representations through prototype matching and balances learning with a dynamic sampling strategy. We validate PGAD on the ADNI dataset with varying missing rates (20%, 50%, and 70%) and demonstrate that it significantly outperforms state-of-the-art approaches. Ablation studies confirm the effectiveness of prototype matching and adaptive sampling, highlighting the potential of our framework for robust and scalable AD diagnosis in real-world clinical settings.


Optimizing Multi-Hop Document Retrieval Through Intermediate Representations

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) encounters challenges when addressing complex queries, particularly multi-hop questions. While several methods tackle multi-hop queries by iteratively generating internal queries and retrieving external documents, these approaches are computationally expensive. In this paper, we identify a three-stage information processing pattern in LLMs during layer-by-layer reasoning, consisting of extraction, processing, and subsequent extraction steps. This observation suggests that the representations in intermediate layers contain richer information compared to those in other layers. Building on this insight, we propose Layer-wise RAG (L-RAG). Unlike prior methods that focus on generating new internal queries, L-RAG leverages intermediate representations from the middle layers, which capture next-hop information, to retrieve external knowledge. L-RAG achieves performance comparable to multi-step approaches while maintaining inference overhead similar to that of standard RAG. Experimental results show that L-RAG outperforms existing RAG methods on open-domain multi-hop question-answering datasets, including MuSiQue, HotpotQA, and 2WikiMultiHopQA. The code is available in https://anonymous.4open.science/r/L-RAG-ADD5/


Self-Clustering Graph Transformer Approach to Model Resting-State Functional Brain Activity

arXiv.org Artificial Intelligence

Resting-state functional magnetic resonance imaging (rs-fMRI) offers valuable insights into the human brain's functional organization and is a powerful tool for investigating the relationship between brain function and cognitive processes, as it allows for the functional organization of the brain to be captured without relying on a specific task or stimuli. In this study, we introduce a novel attention mechanism for graphs with subnetworks, named Self-Clustering Graph Transformer (SCGT), designed to handle the issue of uniform node updates in graph transformers. By using static functional connectivity (FC) correlation features as input to the transformer model, SCGT effectively captures the sub-network structure of the brain by performing cluster-specific updates to the nodes, unlike uniform node updates in vanilla graph transformers, further allowing us to learn and interpret the subclusters. We validate our approach on the Adolescent Brain Cognitive Development (ABCD) dataset, comprising 7,957 participants, for the prediction of total cognitive score and gender classification. Our results demonstrate that SCGT outperforms the vanilla graph transformer method and other recent models, offering a promising tool for modeling brain functional connectivity and interpreting the underlying subnetwork structures.


Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation

arXiv.org Artificial Intelligence

Improving time-to-first-token (TTFT) is an essentially important objective in modern large language model (LLM) inference engines. Because optimizing TTFT directly results in higher maximal QPS and meets the requirements of many critical applications. However, boosting TTFT is notoriously challenging since it is purely compute-bounded and the performance bottleneck shifts from the self-attention to the MLP part. We present SpecPrefill, a training free framework that accelerates the inference TTFT for both long and medium context queries based on the following insight: LLMs are generalized enough to still preserve the quality given only a carefully chosen subset of prompt tokens. At its core, SpecPrefill leverages a lightweight model to speculate locally important tokens based on the context. These tokens, along with the necessary positional information, are then sent to the main model for processing. We evaluate SpecPrefill with a diverse set of tasks, followed by a comprehensive benchmarking of performance improvement both in a real end-to-end setting and ablation studies. SpecPrefill manages to serve Llama-3.1-405B-Instruct-FP8 with up to $7\times$ maximal end-to-end QPS on real downstream tasks and $7.66\times$ TTFT improvement during benchmarking.


PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

arXiv.org Artificial Intelligence

Deploying large language models (LLMs) locally on mobile devices is advantageous in scenarios where transmitting data to remote cloud servers is either undesirable due to privacy concerns or impractical due to network connection. Recent advancements (MLC, 2023a; Gerganov, 2023) have facilitated the local deployment of LLMs. However, local deployment also presents challenges, particularly in balancing quality (generative performance), latency, and throughput within the hardware constraints of mobile devices. In this paper, we introduce our lightweight, all-in-one automated benchmarking framework that allows users to evaluate LLMs on mobile devices. We provide a comprehensive benchmark of various popular LLMs with different quantization configurations (both weights and activations) across multiple mobile platforms with varying hardware capabilities. Unlike traditional benchmarks that assess full-scale models on high-end GPU clusters, we focus on evaluating resource efficiency (memory and power consumption) and harmful output for compressed models on mobile devices. Our key observations include i) differences in energy efficiency and throughput across mobile platforms; ii) the impact of quantization on memory usage, GPU execution time, and power consumption; and iii) accuracy and performance degradation of quantized models compared to their non-quantized counterparts; and iv) the frequency of hallucinations and toxic content generated by compressed LLMs on mobile devices.


ECGN: A Cluster-Aware Approach to Graph Neural Networks for Imbalanced Classification

arXiv.org Artificial Intelligence

Classifying nodes in a graph is a common problem. The ideal classifier must adapt to any imbalances in the class distribution. It must also use information in the clustering structure of real-world graphs. Existing Graph Neural Networks (GNNs) have not addressed both problems together. We propose the Enhanced Cluster-aware Graph Network (ECGN), a novel method that addresses these issues by integrating cluster-specific training with synthetic node generation. Unlike traditional GNNs that apply the same node update process for all nodes, ECGN learns different aggregations for different clusters. We also use the clusters to generate new minority-class nodes in a way that helps clarify the inter-class decision boundary. By combining cluster-aware embeddings with a global integration step, ECGN enhances the quality of the resulting node embeddings. Our method works with any underlying GNN and any cluster generation technique. Experimental results show that ECGN consistently outperforms its closest competitors by up to 11% on some widely studied benchmark datasets.


EITNet: An IoT-Enhanced Framework for Real-Time Basketball Action Recognition

arXiv.org Artificial Intelligence

Integrating IoT technology into basketball action recognition enhances sports analytics, providing crucial insights into player performance and game strategy. However, existing methods often fall short in terms of accuracy and efficiency, particularly in complex, real-time environments where player movements are frequently occluded or involve intricate interactions. To overcome these challenges, we propose the EITNet model, a deep learning framework that combines EfficientDet for object detection, I3D for spatiotemporal feature extraction, and TimeSformer for temporal analysis, all integrated with IoT technology for seamless real-time data collection and processing. Our contributions include developing a robust architecture that improves recognition accuracy to 92\%, surpassing the baseline EfficientDet model's 87\%, and reducing loss to below 5.0 compared to EfficientDet's 9.0 over 50 epochs. Furthermore, the integration of IoT technology enhances real-time data processing, providing adaptive insights into player performance and strategy. The paper details the design and implementation of EITNet, experimental validation, and a comprehensive evaluation against existing models. The results demonstrate EITNet's potential to significantly advance automated sports analysis and optimize data utilization for player performance and strategy improvement.


How Much Can RAG Help the Reasoning of LLM?

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) has gained significant popularity in modern Large Language Models (LLMs) due to its effectiveness in introducing new knowledge and reducing hallucinations. However, the deep understanding of RAG remains limited, how does RAG help the reasoning process and can RAG help improve the reasoning capability remains question. While external documents are typically considered as a method to incorporate domain-specific information, they also contain intermediate reasoning results related to the query, this suggests that documents could enhance the reasoning capability of LLMs, which has not been previously explored. In this paper, we investigate this issue in depth and find that while RAG can assist with reasoning, the help is limited. If we conceptualize the reasoning process as a tree with fixed depth, then RAG struggles to assist LLMs in performing deeper reasoning. Additionally, the information in the documents requires preprocessing to filter out noise. We demonstrate that this preprocessing is difficult to achieve simply fine-tuning of the LLM, it often necessitates numerous additional transformer layers to solve the problem. To simplify the problem, we propose DPrompt tuning, which effectively resolves the issue within just limited transformer layers, leading to improved performance. Large Language Models (LLMs) (Brown, 2020) have demonstrated remarkable capabilities across a variety of tasks, including text generation and question answering (Ouyang et al., 2022; Wei et al., 2022), code generation (Gu, 2023), and information retrieval (Dai et al., 2024). However, current LLMs often suffer from serious hallucinations (Huang et al., 2023) due to a lack of factual information. Moreover, the knowledge embedded within LLMs is encoded in their parameters (Yang et al., 2024), meaning that incorporating new knowledge requires further fine-tuning, which is both time-consuming and resource-intensive.


DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks

arXiv.org Artificial Intelligence

Resting-state functional magnetic resonance imaging (rs-fMRI) is a noninvasive technique pivotal for understanding human neural mechanisms of intricate cognitive processes. Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest, or dynamic functional connectivity matrices with a sliding window approach. These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand. While deep learning has gained substantial popularity for modeling complex relational data, its application to uncovering the spatiotemporal dynamics of the brain is still limited. We propose a novel interpretable deep learning framework that learns goal-specific functional connectivity matrix directly from time series and employs a specialized graph neural network for the final classification. Our model, DSAM, leverages temporal causal convolutional networks to capture the temporal dynamics in both low- and high-level feature representations, a temporal attention unit to identify important time points, a self-attention unit to construct the goal-specific connectivity matrix, and a novel variant of graph neural network to capture the spatial dynamics for downstream classification. To validate our approach, we conducted experiments on the Human Connectome Project dataset with 1075 samples to build and interpret the model for the classification of sex group, and the Adolescent Brain Cognitive Development Dataset with 8520 samples for independent testing. Compared our proposed framework with other state-of-art models, results suggested this novel approach goes beyond the assumption of a fixed connectivity matrix and provides evidence of goal-specific brain connectivity patterns, which opens up the potential to gain deeper insights into how the human brain adapts its functional connectivity specific to the task at hand.


How Far Are We From AGI

arXiv.org Artificial Intelligence

The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing works have summarized specific recent advancements of AI, they lack a comprehensive discussion of AGI's definitions, goals, and developmental trajectories. Different from existing survey papers, this paper delves into the pivotal questions of our proximity to AGI and the strategies necessary for its realization through extensive surveys, discussions, and original perspectives. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status-quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.