Zhang, Yuxuan
DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models
Zhang, Yuxuan, Li, Ruizhe
Recent advancements in Large Language Models (LLMs) have achieved robust performance across diverse tasks, but fine-tuning these models for specific domains remains resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) address this challenge by fine-tuning a small subset of parameters. However, existing methods for fusing multiple LoRAs lack dynamic fusion based on contextual inputs and often increase inference time due to token-level operations. We propose DLP-LoRA, a Dynamic Lightweight Plugin that employs a mini-MLP module with only 5M parameters to dynamically fuse multiple LoRAs at the sentence level using top-p sampling strategies. This approach reduces inference time to less than twice that of single LoRA inference by leveraging parallel computation. Evaluations across 26 tasks, including multiple-choice questions and question answering, demonstrate that DLP-LoRA achieves an average accuracy of 92.34% on multiple-choice datasets and significant improvements in BLEU and ROUGE scores on QA datasets, outperforming different LLM backbones under composite task settings. DLP-LoRA effectively balances performance and efficiency, making it a practical solution for dynamic multi-task adaptation in LLMs. Recent advancements in Large Language Models (LLMs) such as LLaMA 3.1 (Dubey et al., 2024), Qwen 2.5 (Team, 2024), and Gemma 2 (Team et al., 2024) have led to robust and superior performance across multiple benchmarks (Muennighoff et al., 2022; Ilyas Moutawwakil, 2023; Fourrier et al., 2024).
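The sentence-level top-p fusion described in the DLP-LoRA abstract can be illustrated with a minimal sketch. It assumes the mini-MLP has already produced one probability per LoRA for the input sentence; the function names, the renormalisation of the kept probabilities, and the weighted sum of per-LoRA weight updates are illustrative assumptions, not details confirmed by the abstract.

```python
def top_p_fusion_weights(probs, p=0.9):
    """Keep the most probable LoRAs until their total mass reaches p,
    then renormalise the kept probabilities into fusion weights.
    (Hypothetical helper; the renormalisation step is an assumption.)"""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    weights = [0.0] * len(probs)
    for i in kept:
        weights[i] = probs[i] / mass
    return weights


def fuse_lora_deltas(deltas, weights):
    """Weighted sum of per-LoRA weight updates.
    Each delta is a matrix (list of rows) standing in for one LoRA's B @ A."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for delta, w in zip(deltas, weights):
        if w == 0.0:
            continue  # LoRAs outside the top-p set contribute nothing
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * delta[r][c]
    return fused


# Usage: three candidate LoRAs with classifier probabilities for one sentence.
weights = top_p_fusion_weights([0.1, 0.6, 0.3], p=0.8)
fused = fuse_lora_deltas([[[1.0]], [[3.0]], [[6.0]]], weights)
```

Because the weights are computed once per sentence rather than per token, the selection cost in this sketch does not grow with sequence length, which is consistent with the efficiency argument in the abstract.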
RmGPT: Rotating Machinery Generative Pretrained Model
Wang, Yilin, Yu, Yifei, Sun, Kong, Lei, Peixuan, Zhang, Yuxuan, Zio, Enrico, Xia, Aiguo, Li, Yuanxiang
In industry, the reliability of rotating machinery is critical for production efficiency and safety. Current methods of Prognostics and Health Management (PHM) often rely on task-specific models, which face significant challenges in handling diverse datasets with varying signal characteristics, fault modes and operating conditions. Inspired by advancements in generative pretrained models, we propose RmGPT, a unified model for diagnosis and prognosis tasks. RmGPT introduces a novel token-based framework, incorporating Signal Tokens, Prompt Tokens, Time-Frequency Task Tokens and Fault Tokens to handle heterogeneous data within a unified model architecture. We leverage self-supervised learning for robust feature extraction and introduce a next signal token prediction pretraining strategy, alongside efficient prompt learning for task-specific adaptation. Extensive experiments demonstrate that RmGPT significantly outperforms state-of-the-art algorithms, achieving near-perfect accuracy in diagnosis tasks and exceptionally low errors in prognosis tasks. Notably, RmGPT excels in few-shot learning scenarios, achieving 92% accuracy in 16-class one-shot experiments, highlighting its adaptability and robustness. This work establishes RmGPT as a powerful PHM foundation model for rotating machinery, advancing the scalability and generalizability of PHM solutions.
Scalable quantum dynamics compilation via quantum machine learning
Zhang, Yuxuan, Wiersema, Roeland, Carrasquilla, Juan, Cincio, Lukasz, Kim, Yong Baek
Quantum dynamics compilation is an important task for improving quantum simulation efficiency: It aims to synthesize multi-qubit target dynamics into a circuit consisting of as few elementary gates as possible. Compared to deterministic methods such as Trotterization, variational quantum compilation (VQC) methods employ variational optimization to reduce gate costs while maintaining high accuracy. In this work, we explore the potential of a VQC scheme by making use of out-of-distribution generalization results in quantum machine learning (QML): By learning the action of a given many-body dynamics on a small data set of product states, we can obtain a unitary circuit that generalizes to highly entangled states such as the Haar random states. The efficiency in training allows us to use tensor network methods to compress such time-evolved product states by exploiting their low entanglement features. Our approach exceeds state-of-the-art compilation results in both system size and accuracy in one dimension (1D). For the first time, we extend VQC to systems on two-dimensional (2D) strips with a quasi-1D treatment, demonstrating a significant resource advantage over standard Trotterization methods, highlighting the method's promise for advancing quantum simulation tasks on near-term quantum processors.
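For context on the Trotterization baseline mentioned above, the standard first-order (Lie-Trotter) product formula for a Hamiltonian split into two parts is given below. The two-term split $H = A + B$ and the step count $n$ are illustrative notation, not symbols taken from the abstract.

```latex
% First-order Trotter formula for H = A + B, evolved for time t in n steps.
e^{-i(A+B)t} = \left( e^{-iAt/n}\, e^{-iBt/n} \right)^{n}
             + O\!\left( \frac{\lVert [A,B] \rVert\, t^{2}}{n} \right)
```

The gate count grows with $n$, so reaching a small error at long times requires deep circuits; this is the cost that variational compilation tries to reduce.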
ProcessPainter: Learn Painting Process from Sequence Data
Song, Yiren, Huang, Shijie, Yao, Chen, Ye, Xiaojun, Ci, Hai, Liu, Jiaming, Zhang, Yuxuan, Shou, Mike Zheng
The painting process of artists is inherently stepwise and varies significantly among different painters and styles. Generating detailed, step-by-step painting processes is essential for art education and research, yet remains largely underexplored. Traditional stroke-based rendering methods break down images into sequences of brushstrokes, yet they fall short of replicating the authentic processes of artists, with limitations confined to basic brushstroke modifications. Text-to-image models utilizing diffusion processes, which generate images through iterative denoising, also diverge substantially from artists' painting processes. To address these challenges, we introduce ProcessPainter, a text-to-video model that is initially pre-trained on synthetic data and subsequently fine-tuned with a select set of artists' painting sequences using the LoRA model. This approach successfully generates painting processes from text prompts for the first time. Furthermore, we introduce an Artwork Replication Network capable of accepting arbitrary-frame input, which facilitates the controlled generation of painting processes, decomposing images into painting sequences, and completing semi-finished artworks. This paper offers new perspectives and tools for advancing art education and image generation technology.
Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023
Pan, Hongpeng, Yang, Yang, Fu, Zhongtian, Zhang, Yuxuan, Du, Shian, Xu, Yi, Ji, Xiangyang
This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any point on a physical surface through a video. Several existing approaches have explored TAP by considering temporal relationships to obtain smooth point motion trajectories; however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera; and (2) CMR-based point trajectory prediction with a moving object segmentation approach to isolate static points from moving objects. Our approach ranked first in the final test with a score of 0.46.
DPP-based Client Selection for Federated Learning with Non-IID Data
Zhang, Yuxuan, Xu, Chao, Yang, Howard H., Wang, Xijun, Quek, Tony Q. S.
This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by adequately choosing participants to diversify the training dataset in each round of training. Based on this, we leverage data profiling and determinantal point process (DPP) sampling techniques to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP
To improve the performance of FL on non-IID data, various FL algorithms have been proposed in a line of recent work [10-17], which can be broadly divided into two categories. Particularly, the first group of work aims to reduce the weight divergence of local models by modifying the data distributions at clients via data sharing [10, 11] or data augmentation [12, 13]. However, it requires the clients to share their private datasets, thereby increasing the risk of privacy leakage and incurring extra communication costs. To this end, instead of changing individual clients' local datasets, another line of work [14-17] focuses on improving the training performance
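The idea of selecting participants that diversify the training data can be illustrated with a small sketch. It does not reproduce the paper's algorithm: instead of drawing a random sample from a DPP, it uses a greedy determinant-maximising selection, a common deterministic stand-in. The client "profiles" (here, hypothetical per-client feature vectors) and all function names are assumptions made for illustration.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def similarity_kernel(profiles):
    """Gram matrix L[i][j] = <profile_i, profile_j>, a valid DPP kernel."""
    return [[sum(x * y for x, y in zip(p, q)) for q in profiles]
            for p in profiles]


def greedy_diverse_subset(kernel, k):
    """Greedily add the client that most increases det(L_S).
    A larger determinant means the selected profiles are less similar."""
    selected = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            d = det(sub)
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break  # remaining clients add no diversity
        selected.append(best)
    return selected


# Usage: clients 0 and 1 hold near-identical data; client 2 is different.
profiles = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
chosen = greedy_diverse_subset(similarity_kernel(profiles), k=2)
```

In this toy case the selection skips the near-duplicate client, since pairing two similar profiles yields a determinant close to zero.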
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks
Zhang, Yuxuan, Wang, Qingzhong, Bian, Jiang, Liu, Yi, Xu, Yanwu, Dou, Dejing, Xiong, Haoyi
To address the problem of medical image recognition, computer vision techniques like convolutional neural networks (CNNs) are frequently used. Recently, 3D CNN-based models have dominated the field of magnetic resonance image (MRI) analytics. Due to the high similarity between MRI data and videos, we conduct extensive empirical studies on video recognition techniques for MRI classification to answer three questions: (1) can we directly use video recognition models for MRI classification, (2) which model is more appropriate for MRI, and (3) are common tricks like data augmentation in video recognition still useful for MRI classification? Our work suggests that advanced video techniques benefit MRI classification. In this paper, four datasets of Alzheimer's and Parkinson's disease recognition are utilized in experiments, together with three alternative video recognition models and data augmentation techniques that are frequently applied to video tasks. In terms of efficiency, the results reveal that the video framework performs better than 3D-CNN models by 5% - 11% with 50% - 66% fewer trainable parameters. This report pushes forward the potential fusion of 3D medical imaging and video understanding research.
CelebHair: A New Large-Scale Dataset for Hairstyle Recommendation based on CelebA
Chen, Yutao, Zhang, Yuxuan, Huang, Zhongrui, Luo, Zhenyao, Chen, Jinpeng
In this paper, we present a new large-scale dataset for hairstyle recommendation, CelebHair, based on the celebrity facial attributes dataset, CelebA. Our dataset inherited the majority of facial images along with some beauty-related facial attributes from CelebA. Additionally, we employed facial landmark detection techniques to extract extra features such as nose length and pupillary distance, and deep convolutional neural networks for face shape and hairstyle classification. Empirical comparison has demonstrated the superiority of our dataset to other existing hairstyle-related datasets regarding variety, veracity, and volume. Analysis and experiments have been conducted on the dataset in order to evaluate its robustness and usability.