Ma, Jian
SplitFrozen: Split Learning with Device-side Model Frozen for Fine-Tuning LLM on Heterogeneous Resource-Constrained Devices
Ma, Jian, Lyu, Xinchen, Jiang, Jun, Cui, Qimei, Yao, Haipeng, Tao, Xiaofeng
Fine-tuning large language models (LLMs) on private, on-device data can empower tailored personalized AI agents. However, fine-tuning LLMs on resource-constrained edge devices faces significant challenges, including excessive computation overhead, device heterogeneity, and data imbalance. This paper proposes SplitFrozen, a split learning framework that enables efficient LLM fine-tuning by strategically freezing device-side model layers while centralizing parameter-efficient fine-tuning on the server. Our framework partitions LLMs into device-side frozen layers and server-side fine-tuning layers, where heterogeneous resource-constrained devices execute only forward propagation. To minimize server-side training costs, we integrate Low-Rank Adaptation (LoRA) into the server-side layers. A pipeline parallelism strategy further optimizes training efficiency by decoupling device-server computations and leveraging decomposed backward propagation. Experiments on GPT-2 with the MRPC, MNLI-matched, and SST-2 datasets demonstrate that SplitFrozen outperforms FedLoRA and SplitLoRA by 69.4\% in model accuracy under extremely imbalanced data, while reducing up to 86.8\% of device-side computations and 50.2\% of total training time. Experiments also validate the scalability of SplitFrozen on a content generation task using the Llama-3.2 model on the GSM8K dataset.
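The split described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the layer sizes, weights, learning rate, squared-error loss, and the helper names `device_forward` and `server_step` are all assumptions made for illustration, and the pipeline parallelism and decomposed backward propagation are omitted. The device runs a frozen layer forward only; the server keeps its base weight frozen and updates just the two low-rank LoRA factors.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def device_forward(x, w_dev):
    """Device side (hypothetical helper): forward pass through a frozen layer only."""
    return matmul(x, w_dev)


def server_step(h, target, w_srv, lora_a, lora_b, lr=0.05):
    """Server side (hypothetical helper): one gradient step on the LoRA factors.

    The output is y = h @ w_srv + (h @ lora_a) @ lora_b; w_srv is never updated.
    Returns the squared-error loss before the update and the new factors.
    """
    low_rank = matmul(h, lora_a)                                   # shape (1, r)
    base = matmul(h, w_srv)                                        # frozen path
    delta = matmul(low_rank, lora_b)                               # adapter path
    y = [[p + q for p, q in zip(rb, rd)] for rb, rd in zip(base, delta)]
    err = [[p - q for p, q in zip(ry, rt)] for ry, rt in zip(y, target)]
    loss = 0.5 * sum(e * e for row in err for e in row)
    grad_b = matmul(transpose(low_rank), err)                      # dL/dB, (r, k)
    grad_a = matmul(transpose(h), matmul(err, transpose(lora_b)))  # dL/dA, (d, r)
    new_a = [[p - lr * g for p, g in zip(ra, rg)] for ra, rg in zip(lora_a, grad_a)]
    new_b = [[p - lr * g for p, g in zip(rb, rg)] for rb, rg in zip(lora_b, grad_b)]
    return loss, new_a, new_b


# Toy weights (assumed values): a 2 -> 3 device layer and a 3 -> 2 server layer.
W_DEV = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W_SRV = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.6]]
lora_a = [[0.1], [0.1], [0.1]]   # rank-1 factor A, small non-zero init
lora_b = [[0.0, 0.0]]            # factor B starts at zero, so the adapter is a no-op

x = [[1.0, 2.0]]
target = [[1.0, 0.0]]

# The device computes activations once; in a real system they would be sent to the server.
h = device_forward(x, W_DEV)

losses = []
for _ in range(200):
    loss, lora_a, lora_b = server_step(h, target, W_SRV, lora_a, lora_b)
    losses.append(loss)

print(f"loss before: {losses[0]:.4f}, loss after: {losses[-1]:.6f}")
```

Because the device layer is frozen, its activations for a given input never change, which is what allows device-side and server-side computation to be decoupled in this kind of design.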
Facies Classification with Copula Entropy
Ma, Jian
Facies are types of rock with similar characteristics, as defined by geologists, and facies classification is of great significance in geological tasks such as formation evaluation and reservoir characterization. As geological data accumulate, there is growing interest in facies classification with machine learning methods [1, 2, 3, 4, 5, 6, 7, 8, 9]. There are two issues with the existing work on facies classification. First, the machine learning models are built without variable selection or with only rudimentary methods, such as cross-validation, so the classifiers take uninformative variables as inputs and therefore perform poorly. Second, most of the models for facies classification are black-box models, such as deep learning [5, 10, 11], boosting, or SVMs [7], which are not interpretable to geologists. Variable selection is a common task that selects a subset of all the available variables for machine learning models. Predictive models built with the selected variables can be more accurate than those built without selection. The traditional methods for variable selection are mainly based on likelihood, such as AIC and BIC; on accuracy, such as LASSO [12]; or on correlation, such as HSIC [13], distance correlation [14], and copula entropy [15]. Copula Entropy (CE) is a recently proposed rigorous mathematical concept for measuring multivariate statistical independence and is proved to be equivalent to mutual information in information theory [16].
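For reference, the equivalence mentioned at the end of this abstract is usually written as follows. The notation is an assumption, since the abstract gives no formulas: $c(\mathbf{u})$ is the copula density of the random vector $\mathbf{X} = (X_1, \dots, X_d)$, $H_c$ is the copula entropy, and $I$ is the (multivariate) mutual information.

```latex
H_c(\mathbf{X}) = -\int_{[0,1]^d} c(\mathbf{u}) \,\log c(\mathbf{u}) \, d\mathbf{u},
\qquad
I(\mathbf{X}) = -H_c(\mathbf{X}),
\qquad
H(\mathbf{X}) = \sum_{i=1}^{d} H(X_i) + H_c(\mathbf{X}).
```

Under this sign convention, mutual information is the negative of copula entropy, so a more negative CE between a candidate variable and the facies label indicates stronger dependence.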
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
Ma, Jian, Wang, Wenguan, Yang, Yi, Zheng, Feng
Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired data. In this paper, we introduce MVSD, a mutual learning framework based on diffusion models. MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks and overcome data scarcity. Furthermore, we employ diffusion models as foundational conditional converters to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Specifically, MVSD employs two converters: one for VAM, called the reverberator, and one for dereverberation, called the dereverberator. The dereverberator judges whether the reverberant audio generated by the reverberator sounds as if it belongs to the conditional visual scenario, and vice versa. By forming a closed loop, these two converters can generate informative feedback signals to optimize the inverse tasks, even with easily acquired one-way unpaired data. Extensive experiments on two standard benchmarks, i.e., SoundSpaces-Speech and Acoustic AVSpeech, show that our framework can improve the performance of the reverberator and dereverberator and better match specified visual scenarios.
An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges
Tao, Laifa, Li, Shangyu, Liu, Haifei, Huang, Qixuan, Ma, Liang, Ning, Guoao, Chen, Yiling, Wu, Yunlong, Li, Bin, Zhang, Weiwei, Zhao, Zhengduo, Zhan, Wenchao, Cao, Wenyan, Wang, Chao, Liu, Hongmei, Ma, Jian, Suo, Mingliang, Cheng, Yujie, Ding, Yu, Song, Dengwei, Lu, Chen
Prognostics and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, and other sectors. However, PHM's development is constrained by bottlenecks such as limited generalization, interpretation, and verification abilities. Presently, generative artificial intelligence (AI), represented by the Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of the Large Model, we propose a novel concept and three progressive paradigms of the Prognostics and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address core issues confronting PHM, we discuss a series of technical challenges of PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework and provides avenues for new PHM technologies, methodologies, tools, platforms, and applications, potentially innovating the design, research & development, verification, and application modes of PHM. Furthermore, it can help realize a new generation of PHM with AI, i.e., from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications.
Your decision path does matter in pre-training industrial recommenders with multi-source behaviors
Gan, Chunjing, Hu, Binbin, Huang, Bo, Liu, Ziqi, Ma, Jian, Zhang, Zhiqiang, Zhong, Wenliang, Zhou, Jun
Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation methods are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook the impact of the decision path that users take when performing behaviors, that is, users ultimately exhibit different behaviors based on various intents. To this end, we propose HIER, a novel Hierarchical decIsion path Enhanced Representation learning method for cross-domain recommendation. With the help of graph neural networks that capture high-order topological information of the knowledge graph between multi-source behaviors, we further adaptively learn decision paths through well-designed exemplar-level and information-bottleneck-based contrastive learning. Extensive experiments in online and offline environments show the superiority of HIER.
Change Point Detection with Copula Entropy based Two-Sample Test
Ma, Jian
Change point detection is a typical task that aims to find changes in time series and can be tackled with two-sample tests. Copula Entropy is a mathematical concept for measuring statistical independence, and a two-sample test based on it was introduced recently. In this paper we propose a nonparametric multivariate method for multiple change point detection with the copula entropy-based two-sample test. Single change point detection is first formulated as a group of two-sample tests at every point of the time series data, and the change point is taken to be the point with the maximum test statistic. Multiple change point detection is then achieved by combining the single change point detection method with a binary segmentation strategy. We verified the effectiveness of our method and compared it with other similar methods on simulated univariate and multivariate data and on the Nile data.
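The two-stage procedure in this abstract (a two-sample test at every candidate split, then binary segmentation) can be sketched as below. This is a rough illustration under stated assumptions, not the paper's method: the copula entropy-based statistic is replaced by a simple standardized mean-difference statistic, the data are univariate, and the threshold, minimum segment size, and function names are all hypothetical.

```python
import math
import random


def two_sample_stat(a, b):
    """Stand-in statistic (NOT copula entropy): standardized mean difference."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    # Small constant guards against division by zero on constant segments.
    return abs(mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b) + 1e-12)


def single_change_point(series, min_size):
    """Test every admissible split and return the one with the largest statistic."""
    best_t, best_stat = None, 0.0
    for t in range(min_size, len(series) - min_size + 1):
        stat = two_sample_stat(series[:t], series[t:])
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


def binary_segmentation(series, threshold, min_size=10, offset=0):
    """Recursively split at the best change point while the statistic is large."""
    if len(series) < 2 * min_size:
        return []
    t, stat = single_change_point(series, min_size)
    if t is None or stat < threshold:
        return []
    left = binary_segmentation(series[:t], threshold, min_size, offset)
    right = binary_segmentation(series[t:], threshold, min_size, offset + t)
    return left + [offset + t] + right


# Simulated series with mean shifts at indices 40 and 80 (assumed toy data).
rng = random.Random(0)
levels = [0.0] * 40 + [3.0] * 40 + [6.0] * 40
series = [level + rng.gauss(0.0, 0.5) for level in levels]

change_points = binary_segmentation(series, threshold=8.0)
print(change_points)
```

A mean-difference statistic only reacts to shifts in location; a dependence-based statistic such as the one in the paper is intended to cover more general distributional changes, including multivariate ones.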
End-to-end Learnable Clustering for Intent Learning in Recommendation
Liu, Yue, Zhu, Shihao, Xia, Jun, Ma, Yingwei, Ma, Jian, Zhong, Wenliang, Liu, Xinwang, Zhang, Guannan, Zhang, Kejun
Intent learning, which aims to learn users' intents for user understanding and item recommendation, has become a research hotspot in recent years. However, the existing methods suffer from complex and cumbersome alternating optimization, limiting their performance and scalability. To this end, we propose a novel intent learning method termed \underline{ELCRec}, by unifying behavior representation learning into an \underline{E}nd-to-end \underline{L}earnable \underline{C}lustering framework, for effective and efficient \underline{Rec}ommendation. Concretely, we encode users' behavior sequences and initialize the cluster centers (latent intents) as learnable neurons. Then, we design a novel learnable clustering module to separate different cluster centers, thus decoupling users' complex intents. Meanwhile, it guides the network to learn intents from behaviors by forcing behavior embeddings close to cluster centers. This allows simultaneous optimization of recommendation and clustering via mini-batch data. Moreover, we propose intent-assisted contrastive learning by using cluster centers as self-supervision signals, further enhancing mutual promotion. Both experimental results and theoretical analyses demonstrate the superiority of ELCRec from six perspectives. Compared to the runner-up, ELCRec improves NDCG@5 by 8.9\% and reduces computational costs by 22.5\% on the Beauty dataset. Furthermore, owing to its scalability and universal applicability, we deploy this method on an industrial recommendation system with 130 million page views and achieve promising results.
Root Cause Analysis on Energy Efficiency with Transfer Entropy Flow
Ma, Jian
Energy efficiency is a major concern in industrial sectors. Finding the root cause of anomalous states of energy efficiency can help to improve the energy efficiency of industrial systems and therefore reduce energy costs. In this research, we propose to use transfer entropy (TE) for root cause analysis on the energy efficiency of industrial systems. A method, called TE flow, is proposed in which the TE flow from the physical measurements of each subsystem to the energy efficiency indicator along the timeline is treated as the causal strength for diagnosing the root cause of anomalous states of a system's energy efficiency. The copula entropy-based nonparametric TE estimator is used in the proposed method. We conducted experiments on real data collected from a compressed air system to verify the proposed method. Experimental results show that the TE flow method successfully identified the root cause of the energy (in)efficiency of the system.
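The idea of tracking TE from each subsystem's measurements to an efficiency indicator along the timeline can be sketched as follows. This is only an illustration under loud assumptions: it uses a plug-in estimator on discrete symbols with history length 1 rather than the copula entropy-based nonparametric estimator named in the abstract, the signals are synthetic, and the names `transfer_entropy` and `te_flow`, the window size, and the step are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in TE (in nats) from source to target, discrete symbols, history length 1.

    Stand-in for the copula entropy-based estimator, which is not implemented here.
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))  # (y_next, y_past, x_past)
    n = len(triples)
    count_full = Counter(triples)
    count_past = Counter((y_past, x_past) for _, y_past, x_past in triples)
    count_self = Counter((y_next, y_past) for y_next, y_past, _ in triples)
    count_y = Counter(y_past for _, y_past, _ in triples)
    te = 0.0
    for (y_next, y_past, x_past), c in count_full.items():
        p_with_source = c / count_past[(y_past, x_past)]               # p(y_next | y_past, x_past)
        p_without = count_self[(y_next, y_past)] / count_y[y_past]     # p(y_next | y_past)
        te += (c / n) * math.log(p_with_source / p_without)
    return te


def te_flow(source, target, window=250, step=125):
    """TE from source to target in sliding windows along the timeline."""
    starts = range(0, len(source) - window + 1, step)
    return [transfer_entropy(source[i:i + window], target[i:i + window]) for i in starts]


# Synthetic example: the indicator copies subsystem A with a one-step delay,
# while subsystem B is unrelated noise.
rng = random.Random(1)
n = 1000
subsystem_a = [rng.randint(0, 1) for _ in range(n)]
subsystem_b = [rng.randint(0, 1) for _ in range(n)]
indicator = [0] + subsystem_a[:-1]

flow_a = te_flow(subsystem_a, indicator)
flow_b = te_flow(subsystem_b, indicator)
print("TE flow A -> indicator:", [round(v, 3) for v in flow_a])
print("TE flow B -> indicator:", [round(v, 3) for v in flow_b])
```

In this toy setting the subsystem whose TE flow stays consistently high would be the root-cause candidate; real measurements are continuous, which is why a nonparametric estimator is needed in practice.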
AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition
Ming, Yuhang, Ma, Jian, Yang, Xingrui, Dai, Weichen, Peng, Yong, Kong, Wanzeng
We present AEGIS-Net, a novel indoor place recognition model that takes in RGB point clouds and generates global place descriptors by aggregating lower-level color and geometry features and higher-level implicit semantic features. Rather than simple feature concatenation, self-attention modules are employed to select the most important local features that best describe an indoor place. AEGIS-Net consists of a semantic encoder, a semantic decoder, and an attention-guided feature embedding. The model is trained in a two-stage process, with the first stage focusing on an auxiliary semantic segmentation task and the second on the place recognition task. We evaluate AEGIS-Net on the ScanNetPR dataset and compare its performance with a pre-deep-learning feature-based method and five state-of-the-art deep-learning-based methods. AEGIS-Net achieves exceptional performance and outperforms all six methods.
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Ma, Jian, Chen, Chen, Xie, Qingsong, Lu, Haonan
Text-to-image diffusion models are well known for their ability to generate realistic images based on textual prompts. However, existing work has predominantly focused on English, leaving non-English text-to-image generation poorly supported. The most commonly used translation methods cannot solve generation problems related to language-specific culture, while training from scratch on a specific language dataset is prohibitively expensive. In this paper, we propose a simple plug-and-play language transfer method based on knowledge distillation. All we need to do is train a lightweight MLP-like parameter-efficient adapter (PEA) with only 6M parameters under teacher knowledge distillation along with a small parallel data corpus. We are surprised to find that freezing the parameters of the UNet can still achieve remarkable performance on the language-specific prompt evaluation set, demonstrating that PEA can stimulate the potential generation ability of the original UNet. Additionally, it closely approaches the performance of the English text-to-image model on a general prompt evaluation set. Furthermore, our adapter can be used as a plugin to achieve significant results in downstream tasks in cross-lingual text-to-image generation. Code will be available at: https://github.com/OPPO-Mente-Lab/PEA-Diffusion