
Collaborating Authors

 Gao, Zhipeng


Less is More: Adaptive Program Repair with Bug Localization and Preference Learning

arXiv.org Artificial Intelligence

Automated Program Repair (APR) aims to automatically generate patches for buggy code. However, most research focuses on generating correct patches while ignoring the consistency between the fixed code and the original buggy code. How to conduct adaptive bug fixing and generate patches with minimal modifications has seldom been investigated. To bridge this gap, we first introduce a novel task, namely AdaPR (Adaptive Program Repair). We then propose a two-stage approach, AdaPatcher (Adaptive Patch Generator), to enhance program repair while maintaining consistency. In the first stage, we utilize a Bug Locator with self-debug learning to accurately pinpoint bug locations. In the second stage, we train a Program Modifier to ensure consistency between the post-modified fixed code and the pre-modified buggy code. The Program Modifier is enhanced with a location-aware repair learning strategy that generates patches based on identified buggy lines, a hybrid training strategy for selective reference, and an adaptive preference learning strategy that prioritizes fewer changes. The experimental results show that our approach outperforms a set of baselines by a large margin, validating the effectiveness of our two-stage framework for the newly proposed AdaPR task.
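The "prioritize fewer changes" objective can be pictured as preference-pair construction over candidate patches. The following is a minimal sketch, assuming a DPO-style setup in which test-passing patches are paired so the smaller edit is preferred; the function names and pairing rule are illustrative assumptions, not the paper's released implementation.

```python
import difflib

def edit_size(buggy: str, patch: str) -> int:
    """Number of added or removed lines between the buggy code and a patch."""
    diff = difflib.ndiff(buggy.splitlines(), patch.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

def build_preference_pairs(buggy: str, correct_patches: list) -> list:
    """Pair test-passing patches so the smaller edit is 'chosen' and the
    larger one 'rejected', yielding data for preference tuning that
    rewards minimal modifications (hypothetical pairing rule)."""
    ranked = sorted(correct_patches, key=lambda p: edit_size(buggy, p))
    return [
        {"prompt": buggy, "chosen": a, "rejected": b}
        for i, a in enumerate(ranked)
        for b in ranked[i + 1:]
        if edit_size(buggy, a) < edit_size(buggy, b)
    ]
```

Training on such pairs would push the Program Modifier toward patches that fix the bug while touching as few lines as possible.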


MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning

arXiv.org Artificial Intelligence

Nowadays, LLMs have been successfully used to support developers' daily development, such as code generation, test generation, etc. However, existing Code LLMs are usually general models trained on large programming corpora (Zheng et al., 2023; Chen et al., 2022); therefore, the generated code is difficult to adapt to personalized and/or customized requests. Consider the following practical scenario: Alice is a software developer. To improve programmers' daily efficiency, her company provided base LLMs that can be used for code generation.

Recent researchers have explored the code generation task by using LLMs; however, most studies (Li et al., 2023b, 2022a; Ahmad et al., 2021; Hu et al., 2021) focus on generating "correct" code. There is limited research investigating how to generate "personalized" code, especially for multi-user personalization, with no research conducted yet. Automatically generating code according to developers' preferences or projects' consistency is a challenging task: (i) considering that different programmers have their own coding styles, it is too expensive to fine-tune an LLM for each user (Guo et al.,
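One way to picture multi-user style conditioning without per-user fine-tuning is to learn a small per-user representation that steers a shared, frozen backbone. The PyTorch sketch below shows per-user soft prompts prepended to token embeddings; the class, dimensions, and names are illustrative assumptions, not MPCODER's actual architecture.

```python
import torch
import torch.nn as nn

class UserStyleConditioner(nn.Module):
    """Per-user learned style vectors prepended to the token embeddings
    of a shared, frozen code LLM (soft-prompt-style conditioning)."""
    def __init__(self, num_users: int, prompt_len: int, hidden_dim: int):
        super().__init__()
        # One trainable style prompt per user; the backbone stays shared,
        # so adding a user costs only prompt_len * hidden_dim parameters.
        self.style_prompts = nn.Embedding(num_users, prompt_len * hidden_dim)
        self.prompt_len = prompt_len
        self.hidden_dim = hidden_dim

    def forward(self, user_ids: torch.Tensor, token_embeds: torch.Tensor):
        # token_embeds: (batch, seq_len, hidden_dim)
        batch = user_ids.shape[0]
        prompts = self.style_prompts(user_ids).view(
            batch, self.prompt_len, self.hidden_dim)
        return torch.cat([prompts, token_embeds], dim=1)
```

Under this assumed design, only the style prompts are trained per user, which keeps multi-user personalization far cheaper than fine-tuning an LLM for each developer.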


Blockchain-enabled Trustworthy Federated Unlearning

arXiv.org Artificial Intelligence

Federated unlearning is a promising paradigm for protecting the data ownership of distributed clients. It allows the central server to remove historical data effects from the machine learning model and addresses the "right to be forgotten" issue in federated learning. However, existing works require the central server to retain the historical model parameters of distributed clients, which allows it to use these parameters for further training even after the clients exit the training process. To address this issue, this paper proposes a new blockchain-enabled trustworthy federated unlearning framework. We first design a proof of federated unlearning protocol, which utilizes the Chameleon hash function to verify data removal and eliminate the data contributions stored in other clients' models. Then, an adaptive contribution-based retraining mechanism is developed to reduce the computational overhead and significantly improve the training efficiency. Extensive experiments demonstrate that the proposed framework achieves a better data removal effect than state-of-the-art frameworks, marking a significant stride towards trustworthy federated unlearning.
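The Chameleon hash underlying the proof-of-unlearning idea has a standard discrete-log construction: a trapdoor holder can find a collision, so an on-chain commitment remains valid after the committed value changes. Below is a toy Python sketch with deliberately tiny parameters (real deployments use cryptographically large groups); the surrounding protocol wiring is assumed, not the paper's exact design.

```python
import secrets

# Toy Chameleon hash over an order-q subgroup (illustrative sizes only).
p, q = 2039, 1019          # safe prime p = 2q + 1
g = 4                      # generator of the order-q subgroup (4 = 2^2 mod p)

def keygen():
    x = secrets.randbelow(q - 1) + 1   # secret trapdoor
    return x, pow(g, x, p)             # (trapdoor, public key h = g^x)

def ch_hash(h: int, m: int, r: int) -> int:
    """CH(m, r) = g^m * h^r mod p."""
    return (pow(g, m % q, p) * pow(h, r, p)) % p

def collide(x: int, m: int, r: int, m_new: int) -> int:
    """Trapdoor holder finds r' with CH(m_new, r') == CH(m, r)."""
    return ((m - m_new) * pow(x, -1, q) + r) % q

# Commit to a model digest before unlearning...
x, h = keygen()
m_before, r = 123, secrets.randbelow(q)
digest = ch_hash(h, m_before, r)

# ...after data removal, a collision keeps the recorded digest valid,
# letting verifiers check the update without rewriting the chain.
m_after = 456
r_new = collide(x, m_before, r, m_after)
assert ch_hash(h, m_after, r_new) == digest
```

The collision property is what makes the hash "chameleon": without the trapdoor x, finding (m_new, r_new) is as hard as computing discrete logs, so only the authorized party can redeem the commitment against an updated model.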


Scalable Federated Unlearning via Isolated and Coded Sharding

arXiv.org Artificial Intelligence

Federated unlearning has emerged as a promising paradigm to erase the client-level data effect without affecting the performance of collaborative learning models. However, the federated unlearning process often introduces extensive storage overhead and consumes substantial computational resources, thus hindering its implementation in practice. To address this issue, this paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing. We first divide distributed clients into multiple isolated shards across stages to reduce the number of clients affected by an unlearning request. Then, to reduce the storage overhead of the central server, we develop a coded computing mechanism that compresses the model parameters across different shards. In addition, we provide a theoretical analysis of the time efficiency and storage effectiveness of isolated and coded sharding. Finally, extensive experiments on two typical learning tasks, i.e., classification and generation, demonstrate that our proposed framework achieves better performance than three state-of-the-art frameworks in terms of accuracy, retraining time, storage overhead, and F1 scores for resisting membership inference attacks.
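The coded computing primitive can be illustrated with random linear coding over shard models: the server keeps coded combinations and can rebuild any shard's parameters from a subset of them, instead of storing every historical model verbatim. This NumPy sketch shows the generic encode/decode step under stated assumptions; the paper's exact code construction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, dim = 3, 5, 8                      # 3 shards, 5 coded blocks, 8 params each
shard_models = rng.standard_normal((k, dim))   # one parameter vector per shard

# Encode: any k rows of a random G are invertible with high probability,
# so any k coded blocks suffice to recover all shard models.
G = rng.standard_normal((n, k))          # coding matrix
coded = G @ shard_models                 # what the server stores

# Decode from an arbitrary subset of k coded blocks, e.g. blocks {0, 2, 4}.
rows = [0, 2, 4]
recovered = np.linalg.solve(G[rows], coded[rows])
assert np.allclose(recovered, shard_models)
```

In a real scheme the storage saving would come from how the coded blocks are sized, quantized, and distributed across shards; the sketch only shows the linear recovery that makes per-shard retraining possible without retaining every client's raw parameters.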


FedSup: A Communication-Efficient Federated Learning Fatigue Driving Behaviors Supervision Framework

arXiv.org Artificial Intelligence

With the proliferation of edge smart devices and Internet of Vehicles (IoV) technologies, intelligent fatigue detection has become one of the most widely used techniques in daily driving. To improve the performance of the detection model, a series of techniques have been developed. However, existing work still leaves much to be desired, such as privacy disclosure and communication cost. To address these issues, we propose FedSup, a client-edge-cloud framework for privacy-preserving and efficient fatigue detection. Inspired by the federated learning technique, FedSup intelligently utilizes the collaboration among client, edge, and cloud server to realize dynamic model optimization while protecting edge data privacy. Moreover, to reduce unnecessary system communication overhead, we further propose a Bayesian convolutional neural network (BCNN) approximation strategy on the clients and an uncertainty-weighted aggregation algorithm on the cloud to enhance central model training efficiency. Extensive experiments demonstrate that the FedSup framework is suitable for IoV scenarios and outperforms other mainstream methods.
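The cloud-side aggregation can be pictured as a FedAvg variant in which clients reporting lower predictive uncertainty (e.g., low posterior variance from the Bayesian CNN) receive larger aggregation weights. The sketch below, including its function name and weighting rule, is an assumption about the general idea rather than FedSup's published algorithm.

```python
import numpy as np

def uncertainty_weighted_aggregate(client_weights, client_uncertainties, eps=1e-8):
    """Hypothetical uncertainty-weighted FedAvg: each client's model tensor
    is weighted by the inverse of its reported predictive uncertainty,
    so confident clients dominate the aggregate."""
    inv = np.array([1.0 / (u + eps) for u in client_uncertainties])
    coefs = inv / inv.sum()                     # normalized aggregation weights
    stacked = np.stack(client_weights)          # shape: (n_clients, ...)
    return np.tensordot(coefs, stacked, axes=1) # weighted average over clients

# Example: three clients, one flat parameter vector each.
models = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
agg = uncertainty_weighted_aggregate(models, [0.1, 0.5, 2.0])
```

Downweighting high-uncertainty clients is also a natural hook for the communication savings the abstract mentions: updates whose uncertainty exceeds a threshold could simply be skipped rather than transmitted.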