Lin, Jing
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Huang, Weizhong, Zhang, Yuxin, Zheng, Xiawu, Liu, Yang, Lin, Jing, Yao, Yiwu, Ji, Rongrong
Applying Low-Rank Adaptation (LoRA) to fine-tune sparse LLMs is an intuitive way to recover the performance lost to sparsification, but it has two shortcomings: 1) the LoRA weights cannot be merged into the sparse LLM after fine-tuning, and 2) performance recovery is insufficient at high sparsity ratios. In this paper, we introduce dynamic Low-rank Sparse Adaptation (LoSA), a novel method that seamlessly integrates low-rank adaptation into LLM sparsity within a unified framework, thereby enhancing the performance of sparse LLMs without increasing the inference latency. In particular, LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning, thus guaranteeing that the LoRA module can be merged into the sparse LLM after fine-tuning. In addition, LoSA leverages Representation Mutual Information (RMI) as an indicator of layer importance, thereby efficiently determining the layer-wise sparsity rates during fine-tuning. Building on this, LoSA adjusts the rank of each LoRA module according to the layer-wise reconstruction errors, allocating an appropriate amount of fine-tuning capacity to each layer to reduce the output discrepancy between the dense and sparse LLMs. Extensive experiments show that LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inference burden. For example, LoSA reduces the perplexity of sparse LLaMA-2-7B by 68.73 and increases zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and a 2.23× speedup on GPU, while requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU.
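A minimal sketch of the core merging constraint described above, assuming unstructured sparsity and standard LoRA factors (the function and shapes below are illustrative, not the authors' released code): the low-rank update is masked by the support of the sparse weights, so the merged matrix keeps the original sparsity pattern and adds no inference latency.

```python
import torch

def merge_lora_into_sparse(W_sparse: torch.Tensor,
                           A: torch.Tensor,
                           B: torch.Tensor,
                           alpha: float = 1.0) -> torch.Tensor:
    """W_sparse: (out, in) pruned weight; A: (r, in); B: (out, r)."""
    mask = (W_sparse != 0).to(W_sparse.dtype)  # support of the sparse weights
    delta = alpha * (B @ A)                    # dense low-rank update
    return W_sparse + delta * mask             # masked merge keeps the pattern

# Toy usage with hypothetical shapes and 50% unstructured sparsity.
W = torch.randn(8, 16)
W[torch.rand_like(W) < 0.5] = 0.0
A, B = torch.randn(4, 16), torch.randn(8, 4)
W_merged = merge_lora_into_sparse(W, A, B)
assert torch.all(W_merged[W == 0] == 0)        # sparsity pattern preserved
```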
Nonlinear Transformations Against Unlearnable Datasets
Hapuarachchi, Thushari, Lin, Jing, Xiong, Kaiqi, Rahouti, Mohamed, Ost, Gitte
Automated scraping stands out as a common method for collecting training data for deep learning models without the authorization of data owners. Recent studies have begun to tackle the privacy concerns associated with this data collection method. Notable approaches include Deepconfuse, error-minimizing, error-maximizing (also known as adversarial poisoning), Neural Tangent Generalization Attack, synthetic, autoregressive, One-Pixel Shortcut, Self-Ensemble Protection, Entangled Features, Robust Error-Minimizing, Hypocritical, and TensorClog. The data generated by these approaches, called "unlearnable" examples, are meant to prevent deep learning models from learning from them. In this research, we investigate and devise an effective nonlinear transformation framework and conduct extensive experiments to demonstrate that a deep neural network can effectively learn from data traditionally considered unlearnable, as produced by the above twelve approaches. The resulting framework improves the ability to break unlearnable data compared to the linearly separable technique recently proposed by researchers. Specifically, our extensive experiments show that the improvement ranges from 0.34% to 249.59% on the unlearnable CIFAR10 datasets generated by those twelve data protection approaches, except for One-Pixel Shortcut. Moreover, the proposed framework achieves over 100% improvement in test accuracy for the Autoregressive and REM approaches compared to the linearly separable technique. Our findings suggest that these approaches are inadequate for preventing unauthorized use of data in machine learning models. There is an urgent need to develop more robust protection mechanisms that effectively thwart an attacker from accessing data without proper authorization from the owners.
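As a rough illustration of the framework's premise (the specific transforms below are assumptions for the sketch, not the paper's exact operations): applying nonlinear pixel-space transformations to an "unlearnable" image before training can disrupt the shortcut perturbations that a linearly separable model would exploit.

```python
import numpy as np

def nonlinear_transform(img: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """img: float array in [0, 1]. Gamma correction followed by a
    sinusoidal intensity remap; both are nonlinear in pixel space."""
    out = np.power(img, gamma)                                      # gamma curve
    out = np.clip(out + 0.1 * np.sin(6.0 * np.pi * out), 0.0, 1.0)  # wave remap
    return out

x = np.random.rand(32, 32, 3)   # stand-in for a CIFAR-10 image in [0, 1]
x_t = nonlinear_transform(x)    # train on x_t instead of x
print(x_t.shape, x_t.min() >= 0.0, x_t.max() <= 1.0)
```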
ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning
Lin, Jing, Feng, Yao, Liu, Weiyang, Black, Michael J.
Numerous methods have been proposed to detect, estimate, and analyze properties of people in images, including the estimation of 3D pose, shape, contact, human-object interaction, emotion, and more. Each of these methods works in isolation rather than synergistically. Here we address this problem and build a language-driven human understanding system -- ChatHuman, which combines and integrates the skills of many different methods. To do so, we fine-tune a Large Language Model (LLM) to select and use a wide variety of existing tools in response to user inputs. In doing so, ChatHuman is able to combine information from multiple tools to solve problems more accurately than the individual tools themselves and to leverage tool output to improve its ability to reason about humans. The novel features of ChatHuman include leveraging academic publications to guide the application of 3D human-related tools, employing a retrieval-augmented generation model to generate in-context learning examples for handling new tools, and discriminating and integrating tool results to enhance 3D human understanding. Our experiments show that ChatHuman outperforms existing models in both tool selection accuracy and performance across multiple 3D human-related tasks. ChatHuman is a step towards consolidating diverse methods for human analysis into a single, powerful system for 3D human reasoning.
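A toy sketch of retrieval-augmented tool selection in this spirit (the tool registry and word-overlap retriever below are hypothetical stand-ins; ChatHuman itself relies on a fine-tuned LLM and learned retrieval over tool documentation):

```python
import re

TOOLS = {  # hypothetical registry: tool name -> one-line description
    "pose_estimator": "estimate 3D human pose and shape from an image",
    "contact_detector": "detect human-object contact regions in an image",
    "emotion_recognizer": "recognize emotion from a facial expression",
}

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tool(query: str) -> str:
    """Return the tool whose description best overlaps the query."""
    q = tokens(query)

    def score(desc: str) -> float:
        d = tokens(desc)
        return len(q & d) / len(q | d)         # Jaccard similarity

    return max(TOOLS, key=lambda name: score(TOOLS[name]))

print(select_tool("What is this person's 3D body pose?"))  # pose_estimator
```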
PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction
Wang, Yinhuai, Lin, Jing, Zeng, Ailing, Luo, Zhengyi, Zhang, Jian, Zhang, Lei
Humans interact with objects all the time. Enabling a humanoid to learn human-object interaction (HOI) is a key step toward future smart animation and intelligent robotics systems. However, recent progress in physics-based HOI requires carefully designed task-specific rewards, making the systems unscalable and labor-intensive. This work focuses on dynamic HOI imitation: teaching humanoids dynamic interaction skills by imitating kinematic HOI demonstrations. This is quite challenging because of the complexity of the interactions between body parts and objects and the lack of dynamic HOI data. To handle these issues, we present PhysHOI, the first physics-based whole-body HOI imitation approach that requires no task-specific reward design. In addition to the kinematic HOI representations of humans and objects, we introduce a contact graph to explicitly model the contact relations between body parts and objects. A contact graph reward is also designed, which proves critical for precise HOI imitation. Based on these key designs, PhysHOI can imitate diverse HOI tasks simply yet effectively without prior knowledge. To make up for the lack of dynamic HOI data in this area, we introduce the BallPlay dataset, which contains eight whole-body basketball skills. We validate PhysHOI on diverse HOI tasks, including whole-body grasping and basketball skills.
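A minimal sketch of what a contact graph reward can look like (the exact form below is an assumption, not the released PhysHOI code): the simulated contact states between body parts and the object are compared against the reference demonstration, and the reward decays with their mismatch.

```python
import numpy as np

def contact_graph_reward(contacts_sim: np.ndarray,
                         contacts_ref: np.ndarray,
                         k: float = 5.0) -> float:
    """contacts_*: binary contact states (body part vs. object) at one
    timestep; the reward is 1.0 when the graphs match and decays with
    the fraction of mismatched edges."""
    mismatch = np.abs(contacts_sim - contacts_ref).mean()
    return float(np.exp(-k * mismatch))

sim = np.array([1, 0, 1, 1], dtype=float)  # e.g., hands and feet vs. ball
ref = np.array([1, 0, 0, 1], dtype=float)
print(contact_graph_reward(sim, ref))      # exp(-5 * 0.25) ≈ 0.29
```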
Driving behavior-guided battery health monitoring for electric vehicles using machine learning
Jiang, Nanhua, Zhang, Jiawei, Jiang, Weiran, Ren, Yao, Lin, Jing, Khoo, Edwin, Song, Ziyou
An accurate estimation of the state of health (SOH) of batteries is critical to ensuring the safe and reliable operation of electric vehicles (EVs). Feature-based machine learning methods have exhibited enormous potential for rapidly and precisely monitoring battery health status. However, simultaneously using various health indicators (HIs) may weaken estimation performance due to feature redundancy. Furthermore, ignoring real-world driving behaviors can lead to inaccurate estimation results, as some features are rarely accessible in practical scenarios. To address these issues, we propose a feature-based machine learning pipeline for reliable battery health monitoring, enabled by evaluating the acquisition probability of features under real-world driving conditions. We first summarize and analyze various individual HIs with mechanism-related interpretations, which provide insightful guidance on how these features relate to battery degradation modes. Moreover, all features are carefully evaluated and screened based on estimation accuracy and correlation analysis on three public battery degradation datasets. Finally, the scenario-based feature fusion and the acquisition probability-based practicality evaluation together constitute a useful tool for feature extraction that accounts for driving behaviors. This work highlights the importance of balancing the performance and practicality of HIs when developing feature-based battery health monitoring algorithms.
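A minimal sketch of the screening step under assumed data layouts (the feature matrix and thresholds below are hypothetical, not the paper's pipeline): rank candidate HIs by correlation with SOH, then drop features that are nearly redundant with an already-kept feature.

```python
import numpy as np

def screen_features(X: np.ndarray, soh: np.ndarray,
                    redundancy_thr: float = 0.95) -> list:
    """X: (n_cells, n_features) HI matrix; soh: (n_cells,) labels.
    Returns indices of kept, non-redundant features."""
    corr_y = np.array([abs(np.corrcoef(X[:, j], soh)[0, 1])
                       for j in range(X.shape[1])])
    kept = []
    for j in np.argsort(-corr_y):              # strongest HIs first
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy_thr
               for k in kept):
            kept.append(j)                     # keep only non-redundant HIs
    return kept

X = np.random.rand(100, 8)                     # synthetic HI matrix
soh = np.random.rand(100)                      # synthetic SOH labels
print(screen_features(X, soh))
```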
Improving Machine Learning Robustness via Adversarial Training
Dang, Long, Hapuarachchi, Thushari, Xiong, Kaiqi, Lin, Jing
As Machine Learning (ML) is increasingly used to solve various tasks in real-world applications, it is crucial to ensure that ML algorithms are designed to be robust to potential worst-case noise, adversarial attacks, and highly unusual situations. Studying ML robustness significantly helps in the design of ML algorithms. In this paper, we investigate ML robustness using adversarial training in centralized and decentralized environments, where ML training and testing are conducted on one or multiple computers. In the centralized environment, we achieve test accuracies of 65.41% and 83.0% when classifying adversarial examples generated by the Fast Gradient Sign Method (FGSM) and DeepFool, respectively. Compared to existing studies, these results represent improvements of 18.41% for FGSM and 47% for DeepFool. In the decentralized environment, we study federated learning (FL) robustness using adversarial training with independent and identically distributed (IID) and non-IID data, respectively, on the CIFAR-10 dataset. In the IID case, our experimental results demonstrate that we can achieve a robust accuracy comparable to the one obtained in the centralized environment. In the non-IID case, the natural accuracy drops from 66.23% to 57.82%, and the robust accuracy decreases by 25% and 23.4% under C&W and Projected Gradient Descent (PGD) attacks, respectively, compared to the IID case. We further propose an IID data-sharing approach, which increases the natural accuracy to 85.04% and the robust accuracy from 57% to 72% under C&W attacks and from 59% to 67% under PGD attacks.
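A minimal FGSM adversarial-training sketch in PyTorch (illustrative hyperparameters and a toy model, not the paper's exact training recipe): each step crafts an adversarial batch and updates the model on it.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Craft an FGSM adversarial batch from clean inputs in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255):
    x_adv = fgsm_example(model, x, y, eps)     # adversarial batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)    # train on adversarial inputs
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a small hypothetical classifier on random data.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(adversarial_training_step(model, opt, x, y))
```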
Health diagnosis and recuperation of aged Li-ion batteries with data analytics and equivalent circuit modeling
Made, Riko I, Lin, Jing, Zhang, Jintao, Zhang, Yu, Moh, Lionel C. H., Liu, Zhaolin, Ding, Ning, Chiam, Sing Yang, Khoo, Edwin, Yin, Xuesong, Zheng, Guangyuan Wesley
Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and a lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments on 62 commercial high-energy lithium iron phosphate (LFP) cells, which supplement existing datasets of high-power LFP cells. The relatively large-scale data allow us to use machine learning models to predict cycle life and identify important indicators of recoverable capacity. Considering cell-to-cell inconsistencies, an average test error of $16.84\% \pm 1.87\%$ (mean absolute percentage error) for cycle life prediction is achieved by a gradient boosting regressor given information from the first 80 cycles. In addition, it is found that some of the recoverable lost capacity is attributed to the lateral lithium non-uniformity within the electrodes. An equivalent circuit model is built and experimentally validated to demonstrate how such non-uniformity can be accumulated, and how it can give rise to recoverable capacity loss. SHapley Additive exPlanations (SHAP) analysis also reveals that battery operation history significantly affects the capacity recovery.
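A minimal sketch of the prediction setup with stand-in data (the feature matrix and labels below are synthetic placeholders, not the 62-cell dataset): a gradient boosting regressor trained on early-cycle features and scored by mean absolute percentage error (MAPE), the metric quoted above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(62, 10))          # features from the first 80 cycles
y = rng.uniform(500, 2000, size=62)    # cycle life (stand-in labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
mape = np.mean(np.abs(model.predict(X_te) - y_te) / y_te) * 100
print(f"test MAPE: {mape:.2f}%")       # the paper reports 16.84% +/- 1.87%
```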
Hybrid physics-based and data-driven modeling with calibrated uncertainty for lithium-ion battery degradation diagnosis and prognosis
Lin, Jing, Zhang, Yu, Khoo, Edwin
Advancing lithium-ion batteries (LIBs) in both design and usage is key to promoting electrification in the coming decades to mitigate human-caused climate change. Inadequate understanding of LIB degradation is an important bottleneck that limits battery durability and safety. Here, we propose hybrid physics-based and data-driven modeling for online diagnosis and prognosis of battery degradation. Compared to existing battery modeling efforts, we aim to build a model with physics as its backbone and statistical learning techniques as enhancements. Such a hybrid model has better generalizability and interpretability together with a well-calibrated uncertainty associated with its prediction, rendering it more valuable and relevant to safety-critical applications under realistic usage scenarios.
Active Learning Under Malicious Mislabeling and Poisoning Attacks
Lin, Jing, Luley, Ryan, Xiong, Kaiqi
Deep neural networks usually require large labeled datasets for training to achieve state-of-the-art performance in many tasks, such as image classification and natural language processing. Although a lot of data is created each day by active Internet users through various distributed systems across the world, most of these data are unlabeled and are vulnerable to data poisoning attacks. In this paper, we develop an efficient active learning method that requires fewer labeled instances and incorporates the technique of adversarial retraining, in which additional labeled artificial data are generated without increasing the labeling budget. The generated adversarial examples also provide a way to measure the vulnerability of the model. To check the performance of the proposed method under an adversarial setting, i.e., malicious mislabeling and data poisoning attacks, we perform an extensive evaluation on the reduced CIFAR-10 dataset, which contains only two classes ('airplane' and 'frog'), using the private cloud on campus. Our experimental results demonstrate that the proposed active learning method is efficient for defending against malicious mislabeling and data poisoning attacks. Specifically, whereas the baseline active learning method based on a random sampling strategy performs poorly (about 50% accuracy) under a malicious mislabeling attack, the proposed active learning method can achieve the desired accuracy of 89% using only one-third of the dataset on average.
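A minimal sketch of the two ingredients described above, with a hypothetical classifier (not the paper's implementation): least-confidence sampling picks which unlabeled examples to send for labeling, and adversarial retraining augments the labeled pool at no extra labeling cost.

```python
import torch
import torch.nn.functional as F

def least_confident_indices(model, pool_x, k):
    """Indices of the k pool samples the model is least confident about."""
    with torch.no_grad():
        conf, _ = F.softmax(model(pool_x), dim=1).max(dim=1)
    return torch.argsort(conf)[:k]             # lowest confidence first

def adversarial_augment(model, x, y, eps=0.03):
    """FGSM-style artificial examples; labels carry over at no cost."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach(), y

# Toy usage: a hypothetical 2-class model and an unlabeled pool.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 2))
pool = torch.rand(100, 3, 32, 32)
idx = least_confident_indices(model, pool, k=10)   # send these for labeling
```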
Impulsive Noise Mitigation in Powerline Communications Using Sparse Bayesian Learning
Lin, Jing, Nassar, Marcel, Evans, Brian L.
Additive asynchronous and cyclostationary impulsive noise limits communication performance in OFDM powerline communication (PLC) systems. Conventional OFDM receivers assume additive white Gaussian noise and hence experience degradation in communication performance in impulsive noise. Alternate designs assume a parametric statistical model of impulsive noise and use the model parameters in mitigating impulsive noise. These receivers require overhead in training and parameter estimation, and degrade due to model and parameter mismatch, especially in highly dynamic environments. In this paper, we model impulsive noise as a sparse vector in the time domain without any other assumptions, and apply sparse Bayesian learning methods for estimation and mitigation without training. We propose three iterative algorithms with different complexity vs. performance trade-offs: (1) we utilize the noise projection onto null and pilot tones to estimate and subtract the noise impulses; (2) we add the information in the data tones to perform joint noise estimation and OFDM detection; (3) we embed our algorithm into a decision feedback structure to further enhance the performance of coded systems. When compared to conventional OFDM PLC receivers, the proposed receivers achieve SNR gains of up to 9 dB in coded and 10 dB in uncoded systems in the presence of impulsive noise.
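A simplified sparse Bayesian learning sketch of the first algorithm's null-tone idea (illustrative tone placement and noise levels; the paper's receiver is more elaborate): on null tones the transmitted signal is zero, so those DFT observations depend linearly on the sparse time-domain impulse vector, which SBL can estimate without training.

```python
import numpy as np

def sbl_impulse_estimate(y_null, Phi, sigma2=1e-2, iters=50):
    """y_null: observations on null tones; Phi: DFT rows for those
    tones; returns the posterior mean of the sparse impulse vector."""
    N = Phi.shape[1]
    gamma = np.ones(N)                          # per-sample prior variances
    for _ in range(iters):
        G = np.diag(gamma)
        C = sigma2 * np.eye(len(y_null)) + Phi @ G @ Phi.conj().T
        Sigma = G - G @ Phi.conj().T @ np.linalg.solve(C, Phi @ G)
        mu = Sigma @ Phi.conj().T @ y_null / sigma2
        gamma = np.abs(mu) ** 2 + np.real(np.diag(Sigma))  # EM update
    return mu

N, K = 64, 16                                   # FFT size, number of null tones
F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # unitary DFT matrix
rows = np.random.choice(N, K, replace=False)    # illustrative null-tone indices
e = np.zeros(N)
e[np.random.choice(N, 3, replace=False)] = 5.0  # three time-domain impulses
y = F[rows] @ e + 0.05 * np.random.randn(K)     # null-tone observations
print(np.round(np.abs(sbl_impulse_estimate(y, F[rows])), 1))
```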