Adaptive Self-improvement LLM Agentic System for ML Library Development

Zhang, Genghan, Liang, Weixin, Hsu, Olivia, Olukotun, Kunle

arXiv.org Artificial Intelligence

ML libraries, often written in architecture-specific programming languages (ASPLs) that target domain-specific architectures, are key to efficient ML systems. However, writing these high-performance ML libraries is challenging because it requires expert knowledge of both ML algorithms and the ASPL. Large language models (LLMs), on the other hand, have shown general coding capabilities. However, challenges remain when using LLMs to generate ML libraries in ASPLs because 1) the task is complicated even for experienced human programmers and 2) code examples are scarce owing to the esoteric and evolving nature of ASPLs. LLMs therefore need to perform complex reasoning with limited data to complete this task. To address these challenges, we introduce an adaptive self-improvement agentic system. To evaluate its effectiveness, we construct a benchmark from a typical ML library and generate ASPL code with both open- and closed-source LLMs on this benchmark. Our results show improvements of up to $3.9\times$ over a single-LLM baseline.
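The abstract does not spell out the system's internals, but the core self-improvement idea it names can be sketched generically: candidates that pass a task's tests are fed back into the example pool that conditions later generations. Everything below (the mocked "LLM", the toy `add` task) is a hypothetical illustration, not the paper's implementation.

```python
# Hypothetical sketch of an adaptive self-improvement loop: a generator
# (an LLM in the paper, mocked here) proposes candidate implementations;
# candidates that pass the task's unit tests join an example pool that
# conditions later generations as few-shot examples.

def self_improve(task_tests, generate, example_pool, rounds=3):
    for _ in range(rounds):
        candidate = generate(example_pool)
        if all(t(candidate) for t in task_tests):
            example_pool.append(candidate)  # reuse as a few-shot example later
            return candidate
    return None

# Mock "LLM": deterministically proposes a wrong, then a correct add().
proposals = iter([lambda a, b: a - b, lambda a, b: a + b])
generate = lambda pool: next(proposals)

tests = [lambda f: f(2, 3) == 5, lambda f: f(-1, 1) == 0]
pool = []
best = self_improve(tests, generate, pool)
```

Here the first proposal fails the tests and is discarded; the second passes and is both returned and added to the pool for reuse.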


Comparative Study on the Performance of Categorical Variable Encoders in Classification and Regression Tasks

Zhu, Wenbin, Qiu, Runwen, Fu, Ying

arXiv.org Artificial Intelligence

Categorical variables often appear in datasets for classification and regression tasks, and they need to be encoded into numerical values before training. Since many encoders have been developed and they can significantly impact performance, choosing the appropriate encoder for a task is a time-consuming yet important practical issue. This study broadly classifies machine learning models into three categories: 1) ATI models, which implicitly perform affine transformations on inputs, such as the multi-layer perceptron; 2) tree-based models built on decision trees, such as the random forest; and 3) the rest, such as kNN. Theoretically, we prove that the one-hot encoder is the best choice for ATI models in the sense that it can mimic any other encoder by learning suitable weights from the data. We also explain why the target encoder and its variants are the most suitable encoders for tree-based models. We conducted comprehensive computational experiments evaluating 14 encoders, including the one-hot and target encoders, together with eight common machine learning models on 28 datasets. The computational results agree with our theoretical analysis. These findings help data scientists in fields such as fraud detection and disease diagnosis select a suitable encoder.
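The mimicry claim for ATI models has a simple concrete instance: with one-hot inputs, an affine layer reproduces a target encoder exactly by placing each category's encoded value in the corresponding weight. The numbers below are made up for illustration.

```python
# Toy illustration of the one-hot mimicry claim for ATI models: an
# affine layer over one-hot inputs reproduces a target encoder by
# setting each weight to that category's encoded value.

categories = ["a", "b", "c"]
target_mean = {"a": 0.2, "b": 0.9, "c": 0.5}  # per-category target means (hypothetical)

def one_hot(cat):
    return [1.0 if c == cat else 0.0 for c in categories]

# Affine weights chosen to equal the target encoder's values.
weights = [target_mean[c] for c in categories]

def affine(x, bias=0.0):
    return bias + sum(w * xi for w, xi in zip(weights, x))
```

An ATI model can *learn* such weights from data, which is why one-hot never loses expressiveness relative to other encoders for this model class.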


Anti-DreamBooth: Protecting users from personalized text-to-image synthesis

Van Le, Thanh, Phung, Hao, Nguyen, Thuan Hoang, Dao, Quan, Tran, Ngoc, Tran, Anh

arXiv.org Artificial Intelligence

Text-to-image diffusion models are nothing short of a revolution, allowing anyone, even without design skills, to create realistic images from simple text inputs. With powerful personalization tools like DreamBooth, they can generate images of a specific person from just a few reference images. When misused, however, such a powerful and convenient tool can produce fake news or disturbing content targeting any individual victim, posing a severe negative social impact. In this paper, we explore a defense system called Anti-DreamBooth against such malicious use of DreamBooth. The system adds subtle noise perturbations to each of a user's images before publication in order to disrupt the generation quality of any DreamBooth model trained on the perturbed images. We investigate a wide range of perturbation-optimization algorithms and extensively evaluate them on two facial datasets across various text-to-image model versions. Despite the complicated formulation of DreamBooth and diffusion-based text-to-image models, our methods effectively defend users from their malicious use. Their effectiveness withstands even adverse conditions, such as model or prompt/term mismatch between training and testing. Our code will be available at https://github.com/VinAIResearch/Anti-DreamBooth.git.
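The perturbation idea can be sketched with a generic signed-gradient (FGSM/PGD-style) loop under an L-infinity budget. This is *not* the paper's actual algorithm, which attacks diffusion-model training; the "model" below is a toy linear scorer, chosen because its gradient with respect to the input is simply its weight vector.

```python
# Generic signed-gradient perturbation sketch (hypothetical, not the
# Anti-DreamBooth objective): shift a toy linear model's output as much
# as possible while keeping each pixel change within +/- eps.

def score(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

def perturb(x, w, eps=0.1, alpha=0.02, steps=10):
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Ascend the score via the gradient's sign (grad of a linear
        # score w.r.t. x is w), clipping to the L-infinity budget eps.
        delta = [max(-eps, min(eps, d + alpha * (1.0 if wi >= 0 else -1.0)))
                 for d, wi in zip(delta, w)]
    return [xi + di for xi, di in zip(x, delta)]

x_clean = [0.0, 0.0, 0.0]
w = [1.0, -2.0, 3.0]
x_adv = perturb(x_clean, w)
```

With enough steps the perturbation saturates at `eps * sign(w)`, so the score shifts by `eps * sum(|w|)` while every coordinate stays within the budget.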


RGP: Neural Network Pruning through Its Regular Graph Structure

Chen, Zhuangzhi, Xiang, Jingyang, Lu, Yao, Xuan, Qi

arXiv.org Artificial Intelligence

Lightweight model design has become an important direction in the application of deep learning technology, and pruning is an effective means to achieve a large reduction in model parameters and FLOPs. Existing neural network pruning methods mostly start from the importance of parameters and design parameter-evaluation metrics to perform parameter pruning iteratively. These methods are not studied from the perspective of model topology, may be effective but not efficient, and require completely different pruning for different datasets. In this paper, we study the graph structure of the neural network and propose regular graph based pruning (RGP) to perform one-shot neural network pruning. We generate a regular graph, set the node degree of the graph to meet the pruning ratio, and reduce the average shortest path length of the graph by swapping edges to obtain the optimal edge distribution.

This process is called the pruning of the neural network. The pruned neural network can usually obtain much faster inference speed than the original model, which has high significance in actual deployment when the efficiency of the model is critical. Neural network pruning can usually be seen as a three-step pipeline: training the original model, parameter pruning, and fine-tuning the pruned model. Most of these network pruning methods are thus data-related, i.e., when model training is completed, the parameters are pruned according to their values, which means that the pruned network differs for different datasets. Some pruning methods instead prune the initialized model, such as the lottery ticket hypothesis [7], but this is still implemented by pruning after pre-training the original model, using the initialized parameters to reset the pruned model to an initialized subnetwork.
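The degree-to-pruning-ratio correspondence can be sketched by deriving a one-shot sparsity mask for a layer from a regular bipartite graph: every output unit keeps exactly `degree` input connections, spread evenly over the inputs. This is a hypothetical simplification; the paper's further step of swapping edges to minimize the graph's average shortest path length is omitted.

```python
# Sketch (not the paper's exact construction): build a one-shot pruning
# mask for an n_out x n_in weight matrix from a regular bipartite graph.
# Each output unit connects to exactly `degree` inputs, so the fraction
# of weights kept is degree / n_in and the pruning ratio is 1 - degree / n_in.

def regular_mask(n_in, n_out, degree):
    mask = [[0] * n_in for _ in range(n_out)]
    for i in range(n_out):
        for j in range(degree):
            # Offset each row's connections so inputs are used evenly.
            mask[i][(i * degree + j) % n_in] = 1
    return mask

mask = regular_mask(n_in=8, n_out=4, degree=2)
density = sum(map(sum, mask)) / (8 * 4)  # fraction of weights kept
```

With `degree=2` of 8 inputs, 25% of the weights survive (a 75% pruning ratio), and every input feeds the same number of outputs.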


GGT: Graph-Guided Testing for Adversarial Sample Detection of Deep Neural Network

Chen, Zuohui, Wang, Renxuan, Xiang, Jingyang, Yu, Yue, Xia, Xin, Ji, Shouling, Xuan, Qi, Yang, Xiaoniu

arXiv.org Artificial Intelligence

Deep Neural Networks (DNN) are known to be vulnerable to adversarial samples, the detection of which is crucial for the wide application of these DNN models. Recently, a number of deep testing methods from software engineering were proposed to find vulnerabilities of DNN systems, and one of them, Model Mutation Testing (MMT), was used to successfully detect various adversarial samples generated by different kinds of adversarial attacks. However, the mutated models in MMT are always huge in number (e.g., over 100 models) and lack diversity (e.g., they can be easily circumvented by high-confidence adversarial samples), which makes MMT less efficient in real applications and less effective at detecting high-confidence adversarial samples. In this study, we propose Graph-Guided Testing (GGT) for adversarial sample detection to overcome these challenges. GGT generates pruned models guided by graph characteristics, each of which has only about 5% of the parameters of a mutated model in MMT, and the graph-guided models are more diverse. Experiments on CIFAR10 and SVHN validate that GGT performs much better than MMT in both effectiveness and efficiency.
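The detection principle shared by MMT and GGT can be sketched with toy models: an input whose predicted label flips under many slightly varied models is likely adversarial, since adversarial samples tend to sit near decision boundaries. The 1-D "classifiers" below are stand-ins for the pruned DNN variants, not the paper's setup.

```python
# Sketch of model-variant-based adversarial detection: flag an input
# if its label is unstable across an ensemble of model variants
# (mutated models in MMT, graph-guided pruned models in GGT).

def detect(x, base_model, variants, threshold=0.3):
    base_label = base_model(x)
    flips = sum(1 for m in variants if m(x) != base_label)
    return flips / len(variants) > threshold  # unstable label => suspicious

# Toy 1-D classifiers: label = (x > t) for slightly different thresholds t.
base = lambda x: x > 0.0
variants = [lambda x, t=t: x > t for t in (-0.2, -0.1, 0.1, 0.2)]
```

A point far from the boundary (e.g., `x = 5.0`) keeps its label under every variant, while a borderline point (e.g., `x = 0.05`) flips under half of them and gets flagged.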