Goto

Collaborating Authors

 Li, Ao


MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers

arXiv.org Artificial Intelligence

In applications of diffusion models, controllable generation is of practical significance, but is also challenging. Current methods for controllable generation primarily focus on modifying the score function of diffusion models, while Mean Reverting (MR) Diffusion directly modifies the structure of the stochastic differential equation (SDE), making the incorporation of image conditions simpler and more natural. However, current training-free fast samplers are not directly applicable to MR Diffusion. And thus MR Diffusion requires hundreds of NFEs (number of function evaluations) to obtain high-quality samples. In this paper, we propose a new algorithm named MRS (MR Sampler) to reduce the sampling NFEs of MR Diffusion. We solve the reverse-time SDE and the probability flow ordinary differential equation (PF-ODE) associated with MR Diffusion, and derive semi-analytical solutions. The solutions consist of an analytical function and an integral parameterized by a neural network. Based on this solution, we can generate high-quality samples in fewer steps. Our approach does not require training and supports all mainstream parameterizations, including noise prediction, data prediction and velocity prediction. Extensive experiments demonstrate that MR Sampler maintains high sampling quality with a speedup of 10 to 20 times across ten different image restoration tasks. Our algorithm accelerates the sampling procedure of MR Diffusion, making it more practical in controllable generation.


Enhancing Visual Inspection Capability of Multi-Modal Large Language Models on Medical Time Series with Supportive Conformalized and Interpretable Small Specialized Models

arXiv.org Artificial Intelligence

Large language models (LLMs) exhibit remarkable capabilities in visual inspection of medical time-series data, achieving proficiency comparable to human clinicians. However, their broad scope limits domain-specific precision, and proprietary weights hinder fine-tuning for specialized datasets. In contrast, small specialized models (SSMs) excel in targeted tasks but lack the contextual reasoning required for complex clinical decision-making. To address these challenges, we propose ConMIL (Conformalized Multiple Instance Learning), a decision-support SSM that integrates seamlessly with LLMs. By using Multiple Instance Learning (MIL) to identify clinically significant signal segments and conformal prediction for calibrated set-valued outputs, ConMIL enhances LLMs' interpretative capabilities for medical time-series analysis. Experimental results demonstrate that ConMIL significantly improves the performance of state-of-the-art LLMs, such as ChatGPT4.0 and Qwen2-VL-7B. Specifically, \ConMIL{}-supported Qwen2-VL-7B achieves 94.92% and 96.82% precision for confident samples in arrhythmia detection and sleep staging, compared to standalone LLM accuracy of 46.13% and 13.16%. These findings highlight the potential of ConMIL to bridge task-specific precision and broader contextual reasoning, enabling more reliable and interpretable AI-driven clinical decision support.


M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis

arXiv.org Artificial Intelligence

Sentiment analysis and emotion recognition are crucial for applications such as human-computer interaction and depression detection. Traditional unimodal methods often fail to capture the complexity of emotional expressions due to conflicting signals from different modalities. Current Multimodal Large Language Models (MLLMs) also face challenges in detecting subtle facial expressions and addressing a wide range of emotion-related tasks. To tackle these issues, we propose M2SE, a Multistage Multitask Sentiment and Emotion Instruction Tuning Strategy for general-purpose MLLMs. It employs a combined approach to train models on tasks such as multimodal sentiment analysis, emotion recognition, facial expression recognition, emotion reason inference, and emotion cause-pair extraction. We also introduce the Emotion Multitask dataset (EMT), a custom dataset that supports these five tasks. Our model, Emotion Universe (EmoVerse), is built on a basic MLLM framework without modifications, yet it achieves substantial improvements across these tasks when trained with the M2SE strategy. Extensive experiments demonstrate that EmoVerse outperforms existing methods, achieving state-of-the-art results in sentiment and emotion tasks. These results highlight the effectiveness of M2SE in enhancing multimodal emotion perception. The dataset and code are available at https://github.com/xiaoyaoxinyi/M2SE.


Agent-Oriented Planning in Multi-Agent Systems

arXiv.org Artificial Intelligence

Through the collaboration of multiple agents possessing diverse expertise and tools, multi-agent systems achieve impressive progress in solving real-world problems. Given the user queries, the meta-agents, serving as the brain within these systems, are required to decompose the queries into multiple sub-tasks that can be allocated to suitable agents capable of solving them, so-called agent-oriented planning. In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy, to ensure that each sub-task is effectively resolved, leading to satisfactory responses to the original queries. These principles further inspire us to propose a novel framework for agent-oriented planning in multi-agent systems, leveraging a fast task decomposition and allocation process followed by an effective and efficient evaluation via a reward model. During the planning process, the meta-agent is also responsible for evaluating the performance of the expert agents, making timely adjustments to the sub-tasks and scheduling as necessary. Besides, we integrate a feedback loop into the proposed framework to further enhance the effectiveness and robustness of such a problem-solving process. Extensive experiments demonstrate the advancement of the proposed framework in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.


Deep Inverse Design for High-Level Synthesis

arXiv.org Artificial Intelligence

High-level synthesis (HLS) has significantly advanced the automation of digital circuits design, yet the need for expertise and time in pragma tuning remains challenging. Existing solutions for the design space exploration (DSE) adopt either heuristic methods, lacking essential information for further optimization potential, or predictive models, missing sufficient generalization due to the time-consuming nature of HLS and the exponential growth of the design space. To address these challenges, we propose Deep Inverse Design for HLS (DID4HLS), a novel approach that integrates graph neural networks and generative models. DID4HLS iteratively optimizes hardware designs aimed at compute-intensive algorithms by learning conditional distributions of design features from post-HLS data. Compared to four state-of-the-art DSE baselines, our method achieved an average improvement of 42.5% on average distance to reference set (ADRS) compared to the best-performing baselines across six benchmarks, while demonstrating high robustness and efficiency.


Heterophilous Distribution Propagation for Graph Neural Networks

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have achieved remarkable success in various graph mining tasks by aggregating information from neighborhoods for representation learning. The success relies on the homophily assumption that nearby nodes exhibit similar behaviors, while it may be violated in many real-world graphs. Recently, heterophilous graph neural networks (HeterGNNs) have attracted increasing attention by modifying the neural message passing schema for heterophilous neighborhoods. However, they suffer from insufficient neighborhood partition and heterophily modeling, both of which are critical but challenging to break through. To tackle these challenges, in this paper, we propose heterophilous distribution propagation (HDP) for graph neural networks. Instead of aggregating information from all neighborhoods, HDP adaptively separates the neighbors into homophilous and heterphilous parts based on the pseudo assignments during training. The heterophilous neighborhood distribution is learned with orthogonality-oriented constraint via a trusted prototype contrastive learning paradigm. Both the homophilous and heterophilous patterns are propagated with a novel semantic-aware message passing mechanism. We conduct extensive experiments on 9 benchmark datasets with different levels of homophily. Experimental results show that our method outperforms representative baselines on heterophilous datasets.


MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features

arXiv.org Artificial Intelligence

Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of joint-embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.


Knowledge Distillation Under Ideal Joint Classifier Assumption

arXiv.org Artificial Intelligence

Knowledge distillation constitutes a potent methodology for condensing substantial neural networks into more compact and efficient counterparts. Within this context, softmax regression representation learning serves as a widely embraced approach, leveraging a pre-established teacher network to guide the learning process of a diminutive student network. Notably, despite the extensive inquiry into the efficacy of softmax regression representation learning, the intricate underpinnings governing the knowledge transfer mechanism remain inadequately elucidated. This study introduces the 'Ideal Joint Classifier Knowledge Distillation' (IJCKD) framework, an overarching paradigm that not only furnishes a lucid and exhaustive comprehension of prevailing knowledge distillation techniques but also establishes a theoretical underpinning for prospective investigations. Employing mathematical methodologies derived from domain adaptation theory, this investigation conducts a comprehensive examination of the error boundary of the student network contingent upon the teacher network. Consequently, our framework facilitates efficient knowledge transference between teacher and student networks, thereby accommodating a diverse spectrum of applications.


Contrastive Continual Multi-view Clustering with Filtered Structural Fusion

arXiv.org Artificial Intelligence

Multi-view clustering thrives in applications where views are collected in advance by extracting consistent and complementary information among views. However, it overlooks scenarios where data views are collected sequentially, i.e., real-time data. Due to privacy issues or memory burden, previous views are not available with time in these situations. Some methods are proposed to handle it but are trapped in a stability-plasticity dilemma. In specific, these methods undergo a catastrophic forgetting of prior knowledge when a new view is attained. Such a catastrophic forgetting problem (CFP) would cause the consistent and complementary information hard to get and affect the clustering performance. To tackle this, we propose a novel method termed Contrastive Continual Multi-view Clustering with Filtered Structural Fusion (CCMVC-FSF). Precisely, considering that data correlations play a vital role in clustering and prior knowledge ought to guide the clustering process of a new view, we develop a data buffer with fixed size to store filtered structural information and utilize it to guide the generation of a robust partition matrix via contrastive learning. Furthermore, we theoretically connect CCMVC-FSF with semi-supervised learning and knowledge distillation. Extensive experiments exhibit the excellence of the proposed method.


TDSTF: Transformer-based Diffusion probabilistic model for Sparse Time series Forecasting

arXiv.org Artificial Intelligence

Background and objective: In the intensive care unit (ICU), vital sign monitoring is critical, and an accurate predictive system is required. This study will create a novel model to forecast Heart Rate (HR), Systolic Blood Pressure (SBP), and Diastolic Blood Pressure (DBP) in ICU. These vital signs are crucial for prompt interventions for patients. We extracted $24,886$ ICU stays from the MIMIC-III database, which contains data from over $46$ thousand patients, to train and test the model. Methods: The model proposed in this study, areansformerin intensive careabilistic Model for Sparse Time Series Forecasting (TDSTF), uses a deep learning technique called the Transformer. The TDSTF model showed state-of-the-art performance in predicting vital signs in the ICU, outperforming other models' ability to predict distributions of vital signs and being more computationally efficient. The code is available at https://github.com/PingChang818/TDSTF. Results: The results of the study showed that TDSTF achieved a Normalized Average Continuous Ranked Probability Score (NACRPS) of $0.4438$ and a Mean Squared Error (MSE) of $0.4168$, an improvement of $18.9\%$ and $34.3\%$ over the best baseline model, respectively. Conclusion: In conclusion, TDSTF is an effective and efficient solution for forecasting vital signs in the ICU, and it shows a significant improvement compared to other models in the field.