 transfer rate


Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets

arXiv.org Artificial Intelligence

This paper develops an ensemble method for fine-tuning a language model to multiple datasets. Existing methods, such as quantized LoRA (QLoRA), are efficient when adapting to a single dataset. When training on multiple datasets of different tasks, a common setup in practice, it remains unclear how to design an efficient adaptation for fine-tuning language models. We propose to use an ensemble of multiple smaller adapters instead of a single adapter per task. We design an efficient algorithm that partitions $n$ datasets into $m$ groups, where $m$ is typically much smaller than $n$ in practice, and train one adapter for each group before taking a weighted combination to form the ensemble. The algorithm leverages a first-order approximation property of low-rank adaptation to quickly obtain the fine-tuning performances of dataset combinations since methods like LoRA stay close to the base model. Hence, we use the gradients of the base model to estimate its behavior during fine-tuning. Empirically, this approximation holds with less than $1\%$ error on models with up to $34$ billion parameters, leading to an estimation of true fine-tuning performances under $5\%$ error while speeding up computation compared to base fine-tuning by $105$ times. When applied to fine-tune Llama and GPT models on ten text classification tasks, our approach provides up to $10\%$ higher average test accuracy over QLoRA, with only $9\%$ more FLOPs. On a Llama model with $34$ billion parameters, an ensemble of QLoRA increases test accuracy by $3\%$ compared to QLoRA, with only $8\%$ more FLOPs.
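The first-order approximation mentioned in this abstract can be illustrated on a toy model. The sketch below is not the paper's algorithm: it assumes a small logistic-regression loss in NumPy, and all names (`loss`, `grad`, `w_base`, `delta`) are hypothetical. It compares the true loss after a small weight update with the estimate obtained from the base model's gradient alone.

```python
import numpy as np

def loss(w, X, y):
    """Mean logistic loss for labels y in {-1, +1}."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def grad(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))  # derivative of log(1 + exp(-m)) times y
    return (X * coeff[:, None]).mean(axis=0)

# Toy data and a hypothetical "base model" (assumptions for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(rng.normal(size=200) > 0, 1.0, -1.0)
w_base = rng.normal(size=5)

# A small update, standing in for an adapter that stays close to the base model.
delta = 0.01 * rng.normal(size=5)

true_loss = loss(w_base + delta, X, y)
# First-order estimate: base loss plus gradient at the base weights times the update.
approx_loss = loss(w_base, X, y) + grad(w_base, X, y) @ delta
rel_error = abs(true_loss - approx_loss) / true_loss
```

The estimate needs only one gradient evaluation at the base weights, which is why such an approximation can be reused across many candidate updates without retraining.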


Open Science and Artificial Intelligence for supporting the sustainability of the SRC Network: The espSRC case

arXiv.org Artificial Intelligence

The SKA Observatory (SKAO), a landmark project in radio astronomy, seeks to address fundamental questions in astronomy. To process its immense data output, approximately 700 PB/year, a global network of SKA Regional Centres (SRCNet) will provide the infrastructure, tools, and computational power needed for scientific analysis and scientific support. The Spanish SRC (espSRC) focuses on ensuring the sustainability of this network by reducing its environmental impact, integrating green practices into data platforms, and developing Open Science technologies to enable reproducible research. This paper discusses and summarizes part of the research and development activities that the team is conducting to reduce the SRC energy consumption at the espSRC and SRCNet. The paper also discusses fundamental research on trusted repositories to support Open Science practices.


Understanding Adversarial Transferability in Federated Learning

arXiv.org Artificial Intelligence

We investigate robustness and security issues in a novel and practical setting: a group of malicious clients has influenced the model during training by disguising their identities and acting as benign clients, revealing their adversarial position only after training in order to conduct transferable adversarial attacks with their data, which is usually a subset of the data that the FL system is trained with. Our aim is to offer a full understanding of the challenges the FL system faces in this practical setting across a spectrum of configurations. We find that such an attack is possible, but the federated model is more robust than its centralized counterpart when the accuracy on clean images is comparable. Through our study, we hypothesize that the robustness stems from two factors: the decentralized training on distributed data and the averaging operation. We provide evidence from both empirical experiments and theoretical analysis. Our work has implications for understanding the robustness of federated learning systems and poses a practical question for federated learning applications.
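The averaging operation named in this abstract can be sketched as follows. The abstract does not specify the aggregation rule, so this example assumes a size-weighted average of client parameters in the style of federated averaging; the function name and the toy values are hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    stacked = np.stack(client_weights)             # shape: (num_clients, num_params)
    sizes = np.asarray(client_sizes, dtype=float)  # one weight per client
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()

# Three hypothetical clients, each holding a two-parameter model.
clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 100, 200]
global_weights = federated_average(clients, sizes)
```

Because each client's contribution is scaled by its share of the data, a single client's parameters are diluted in the aggregate, which is one intuition for why averaging could affect robustness.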


Improving Transfer Rates in Brain Computer Interfacing: A Case Study

Neural Information Processing Systems

We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer rates based on offline analysis of EEG data but within a more realistic setup closer to an online realization than in the original studies. The objective was achieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated by recent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally led to significantly improved rates as compared with techniques applied in the original work. In combination with the data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.
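A transfer rate in bits/min can be computed from the number of selectable symbols, the selection accuracy, and the selection speed. The abstract does not say which definition the authors used, so the sketch below assumes the widely used Wolpaw-style formula; the 36-symbol matrix, the accuracy, and the speed in the usage lines are hypothetical values.

```python
import math

def bits_per_selection(num_symbols, accuracy):
    """Information per selection in bits (Wolpaw-style formula, assumed here)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    bits = math.log2(num_symbols)
    if accuracy < 1.0:
        # Errors are assumed to be spread evenly over the remaining symbols.
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (num_symbols - 1))
    return bits

def bits_per_minute(num_symbols, accuracy, selections_per_minute):
    """Transfer rate in bits/min for a given selection speed."""
    return bits_per_selection(num_symbols, accuracy) * selections_per_minute

# Hypothetical example: 36 symbols, 90% accuracy, 12 selections per minute.
rate = bits_per_minute(36, 0.9, 12)
```

Under this formula the rate drops to zero at chance-level accuracy and reaches `log2(num_symbols)` bits per selection when every selection is correct.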


Generalizing Adversarial Examples by AdaBelief Optimizer

arXiv.org Artificial Intelligence

Recent research has shown that deep neural networks (DNNs) are vulnerable to adversarial examples: legitimate inputs with imperceptible, well-designed perturbations added can easily fool DNNs in the testing stage. However, most existing adversarial attacks struggle to fool adversarially trained models. To solve this issue, we propose an AdaBelief iterative Fast Gradient Sign Method (AB-FGSM) to generalize adversarial examples. By integrating the AdaBelief optimization algorithm into I-FGSM, we believe that the generalization of adversarial examples will be improved, relying on the strong generalization of the AdaBelief optimizer. To validate the effectiveness and transferability of adversarial examples generated by our proposed AB-FGSM, we conduct white-box and black-box attacks on various single models and ensemble models. Compared with state-of-the-art attack methods, our proposed method can generate adversarial examples effectively in the white-box setting, and its transfer rate is 7%-21% higher than that of the latest attack methods.
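One plausible way to combine AdaBelief with I-FGSM is sketched below. The abstract does not give the update rule, so this is an assumption rather than the authors' AB-FGSM: the raw gradient in each I-FGSM step is replaced by the bias-corrected AdaBelief direction before taking the sign. The target is a toy logistic model with an analytic input gradient, and all names and hyperparameters are hypothetical.

```python
import numpy as np

def logistic_loss(x, y, w):
    """Loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * (w @ x)))

def input_gradient(x, y, w):
    """Gradient of the logistic loss with respect to the input x."""
    return -y * w / (1.0 + np.exp(y * (w @ x)))

def adabelief_ifgsm(x, y, w, eps=0.1, steps=10, beta1=0.9, beta2=0.999, tiny=1e-8):
    """Iterative sign attack whose step direction comes from AdaBelief moments."""
    alpha = eps / steps                      # per-step size
    x_adv = x.copy()
    m = np.zeros_like(x)                     # running mean of gradients
    s = np.zeros_like(x)                     # running variance around that mean
    for t in range(1, steps + 1):
        g = input_gradient(x_adv, y, w)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * (g - m) ** 2 + tiny
        m_hat = m / (1 - beta1 ** t)         # bias correction
        s_hat = s / (1 - beta2 ** t)
        direction = m_hat / (np.sqrt(s_hat) + tiny)
        x_adv = x_adv + alpha * np.sign(direction)   # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)     # stay inside the eps-ball
    return x_adv

# Hypothetical toy target and input.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.5, 0.2])
y = 1.0
x_adv = adabelief_ifgsm(x, y, w)
```

The clipping step keeps the perturbation within the L-infinity budget `eps`, as in standard I-FGSM.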


Efficient and Transferable Adversarial Examples from Bayesian Neural Networks

arXiv.org Machine Learning

Deep neural networks are vulnerable to evasion attacks, i.e., carefully crafted examples designed to fool a model at test time. Attacks that successfully evade an ensemble of models can transfer to other independently trained models, which proves useful in black-box settings. Unfortunately, these methods involve heavy computation costs to train the models forming the ensemble. To overcome this, we propose a new method to generate transferable adversarial examples efficiently. Inspired by Bayesian deep learning, our method builds such ensembles by sampling from the posterior distribution of neural network weights during a single training process. Experiments on CIFAR-10 show that our approach improves the transfer rates significantly at equal or even lower computation costs. Intra-architecture transfer rate is increased by 23% compared to classical ensemble-based attacks, while requiring 4 times fewer training epochs. In the inter-architecture case, we show that we can combine our method with ensemble-based attacks to increase their transfer rate by up to 15% with constant training computational cost.
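A transfer rate for adversarial examples can be measured as sketched below. The abstract does not define the metric, so this example assumes one common convention: among the adversarial examples that fool the surrogate model, the fraction that also fool the independently trained target. The function name and the toy predictions are hypothetical.

```python
import numpy as np

def adversarial_transfer_rate(surrogate_preds, target_preds, labels):
    """Fraction of surrogate-fooling adversarial examples that also fool the target."""
    surrogate_preds = np.asarray(surrogate_preds)
    target_preds = np.asarray(target_preds)
    labels = np.asarray(labels)
    fooled_surrogate = surrogate_preds != labels
    if not fooled_surrogate.any():
        return 0.0  # nothing to transfer
    fooled_target = target_preds != labels
    return float((fooled_surrogate & fooled_target).sum() / fooled_surrogate.sum())

# Hypothetical predictions on five adversarial examples.
labels = [0, 1, 2, 1, 0]
surrogate_preds = [1, 1, 0, 2, 0]   # three examples fool the surrogate
target_preds = [1, 1, 2, 0, 1]      # two of those three also fool the target
rate = adversarial_transfer_rate(surrogate_preds, target_preds, labels)
```

Other conventions divide by the total number of adversarial examples instead, so reported rates are comparable only when the denominator is the same.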


Transfer learning for nonlinear dynamics and its application to fluid turbulence

arXiv.org Machine Learning

We introduce transfer learning for nonlinear dynamics, which enables efficient predictions of chaotic dynamics by utilizing a small amount of data. For the Lorenz chaos, by optimizing the transfer rate, we accomplish more accurate inference than the conventional method by an order of magnitude. Moreover, a surprisingly small amount of learning is enough to infer the energy dissipation rate of the Navier-Stokes turbulence because we can, thanks to the small-scale universality of turbulence, transfer a large amount of the knowledge learned from turbulence data at lower Reynolds number.


The neural circuitry of affect-induced distortions of trust

#artificialintelligence

Aversive affect is likely a key source of irrational human decision-making, but still, little is known about the neural circuitry underlying emotion-cognition interactions during social behavior. We induced incidental aversive affect via prolonged periods of threat of shock, while 41 healthy participants made investment decisions concerning another person or a lottery. Negative affect reduced trust, suppressed trust-specific activity in the left temporoparietal junction (TPJ), and reduced functional connectivity between the TPJ and emotion-related regions such as the amygdala. The posterior superior temporal sulcus (pSTS) seems to play a key role in mediating the impact of affect on behavior: Functional connectivity of this brain area with left TPJ was associated with trust in the absence of negative affect, but aversive affect disrupted this association between TPJ-pSTS connectivity and behavioral trust. Our findings may be useful for a better understanding of the neural circuitry of ...


Adversarial Examples: Opportunities and Challenges

arXiv.org Machine Learning

Abstract: With the advent of the era of artificial intelligence (AI), deep neural networks (DNNs) have shown huge superiority over humans in image recognition, speech processing, autonomous vehicles and medical diagnosis. However, recent studies indicate that DNNs are vulnerable to adversarial examples (AEs), which are designed by attackers to fool deep learning models. Unlike real examples, AEs can hardly be distinguished by human eyes, but they mislead the model into predicting incorrect outputs and therefore threaten security-critical deep-learning applications. In recent years, the generation of and defense against AEs have become a research hotspot in the field of AI security. This article reviews the latest research progress on AEs. First, we introduce the concept, causes, characteristics and evaluation metrics of AEs, then survey the state-of-the-art AE generation methods and discuss their advantages and disadvantages. After that, we review the existing defenses and discuss their limitations. Finally, we discuss future research opportunities and challenges for AEs.
In the era of AI, DNNs have shown great advantages in autonomous vehicles, robotics, network security, image/speech recognition and natural language processing (NLP). For example, in 2017, an intelligent robot with superior face recognition ability, named XiaoDu and developed by Baidu, defeated a representative from the team of humans strongest brain with a score of 3:2 [1]. On October 19th, 2017, the DeepMind team of Google released AlphaGo Zero, which shocked the world. Compared with the previous AlphaGo, AlphaGo Zero relies on reinforcement learning without any a priori knowledge to grow chess skills and finally beats every human competitor [2]. AI research in the United States has received huge support from the government, such as the Federal Research Fund.
In October 2016, the United States issued the project of Preparing for the Future of Artificial Intelligence and the National Artificial Intelligence Research and Development Strategic Plan, which raised AI to the national strategic level and formulated ambitious blueprints [3], [4]. Manuscript received xxx; revised xx; accepted xxx. This work is supported by the National Natural Science Foundation of China (Grant NOs. J. Zhang and X. Jiang are with the College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China (email: zhangjiliang@hnu.edu.cn). In the same year, AI was written into the nineteenth National Congress report, which pushed the development of AI industries to a new height and filled the gap in the top-level strategy of AI development [5].


Siamese networks for generating adversarial examples

arXiv.org Machine Learning

Machine learning models are vulnerable to adversarial examples. An adversary modifies the input data such that humans still assign the same label, but machine learning models misclassify it. Previous approaches in the literature demonstrated that adversarial examples can be generated even for a remotely hosted model. In this paper, we propose a Siamese network based approach to generate adversarial examples for a multiclass target CNN. We assume that the adversary does not possess any knowledge of the target data distribution, and we use an unlabeled mismatched dataset to query the target, e.g., for the ResNet-50 target, we use the Food-101 dataset as the query. Initially, the target model assigns labels to the query dataset, and a Siamese network is trained on the image pairs derived from these multiclass labels. We learn the adversarial perturbations for the Siamese model and show that these perturbations are also adversarial w.r.t. the target model. In experimental results, we demonstrate the effectiveness of our approach on MNIST, CIFAR-10 and ImageNet targets with TinyImageNet/Food-101 query datasets.