
 Oseledets, Ivan


Certification of Speaker Recognition Models to Additive Perturbations

arXiv.org Artificial Intelligence

Speaker recognition technology is applied in various tasks ranging from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly additive perturbations, remains a significant challenge. In this paper, we pioneer the application of robustness certification techniques, originally developed for the image domain, to speaker recognition. We cover this gap by transferring and improving randomized smoothing certification techniques against norm-bounded additive perturbations for classification and few-shot learning tasks to speaker recognition. We demonstrate the effectiveness of these methods on the VoxCeleb 1 and 2 datasets for several models. We expect this work to improve voice-biometry robustness, establish a new certification benchmark, and accelerate research on certification methods in the audio domain.
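Randomized smoothing replaces a base classifier f with a smoothed classifier that predicts the most likely label of f(x + δ) under Gaussian noise δ, and certifies an l2 radius from a lower confidence bound on that label's probability. The block below is a minimal sketch of the standard procedure of Cohen et al. (2019), not the paper's extension to speaker embeddings or few-shot learning. Assumptions: `base_classifier` is a hypothetical function mapping a list of floats (for instance, waveform samples) to a label, and a one-sided Hoeffding bound is swapped in for the Clopper-Pearson interval used in that work so the code needs only the standard library.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def certify_l2(base_classifier, x, sigma, n0, n, alpha, seed=0):
    """Randomized-smoothing certificate against l2-bounded additive noise.

    Returns (label, radius); (None, 0.0) means the smoothed classifier abstains.
    Assumes 0 < alpha < 1. The Hoeffding bound is an assumption of this sketch.
    """
    rng = random.Random(seed)

    def sample_counts(num):
        # Classify `num` copies of x, each with i.i.d. Gaussian noise added.
        counts = Counter()
        for _ in range(num):
            noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
            counts[base_classifier(noisy)] += 1
        return counts

    # Stage 1: guess the top class from a small selection sample.
    top_class = sample_counts(n0).most_common(1)[0][0]
    # Stage 2: estimate its probability on fresh samples.
    p_hat = sample_counts(n)[top_class] / n
    # One-sided Hoeffding lower confidence bound (swapped in for Clopper-Pearson).
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0
    # Certified l2 radius R = sigma * Phi^{-1}(p_lower).
    return top_class, sigma * NormalDist().inv_cdf(p_lower)
```

For an input far from the decision boundary the sketch returns a positive radius; when the noisy predictions split roughly evenly between labels it abstains, which is how a smoothed model declines to certify near the boundary.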


Quantization of Large Language Models with an Overdetermined Basis

arXiv.org Artificial Intelligence

In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor has a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, after decomposition the entries of the factors are well concentrated around several peaks, which allows us to efficiently replace them with the corresponding centroids for quantization. We study the theoretical properties of the proposed approach and rigorously evaluate our compression algorithm on next-word prediction and on a set of downstream text classification tasks. Our findings demonstrate that Kashin Quantization achieves competitive or superior model quality while compressing the data, marking a significant advancement in the field of data quantization.
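The decomposition idea can be illustrated with a simple alternating truncation scheme: repeatedly clip the residual in the standard basis (absorbed into the first factor u) and in the basis of an orthogonal matrix Q (absorbed into the second factor v), so that x = u + Qv + r with a residual r whose l2 norm never increases. This is a sketch under stated assumptions, not the paper's algorithm: Q is taken to be a normalized Hadamard matrix, and the threshold rule, iteration count and 1-D k-means quantizer are illustrative choices. The code only guarantees exact reconstruction and a non-increasing residual, not the infinity-norm bounds of a true Kashin representation.

```python
import math


def hadamard(n):
    """Normalized Sylvester-Hadamard matrix: orthogonal and symmetric (n a power of 2)."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[s * v for v in row] for row in h]


def matvec(m, x):
    """Return m @ x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def matvec_t(m, x):
    """Return m^T @ x."""
    return [sum(m[i][j] * x[i] for i in range(len(x))) for j in range(len(m[0]))]


def norm2(x):
    return math.sqrt(sum(v * v for v in x))


def clip(v, t):
    return max(-t, min(t, v))


def kashin_decompose(x, q, n_iters=30, eta=0.9):
    """Alternating truncation giving x = u + q @ v + r.

    Each clipping step cannot increase ||r||_2 because q is orthogonal.
    """
    n = len(x)
    u, v, r = [0.0] * n, [0.0] * n, list(x)
    for _ in range(n_iters):
        t = eta * norm2(r) / math.sqrt(n)  # threshold follows the residual size
        a = [clip(ri, t) for ri in r]  # piece absorbed by u
        u = [ui + ai for ui, ai in zip(u, a)]
        r = [ri - ai for ri, ai in zip(r, a)]
        b = [clip(c, t) for c in matvec_t(q, r)]  # piece absorbed by v
        v = [vi + bi for vi, bi in zip(v, b)]
        r = [ri - qbi for ri, qbi in zip(r, matvec(q, b))]
    return u, v, r


def nearest(val, cents):
    return min(range(len(cents)), key=lambda j: abs(val - cents[j]))


def lloyd_1d(values, k, n_iters=20):
    """1-D k-means: a few centroids to snap factor entries onto."""
    lo, hi = min(values), max(values)
    cents = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(n_iters):
        groups = [[] for _ in range(k)]
        for val in values:
            groups[nearest(val, cents)].append(val)
        cents = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
    return cents


def quantize(values, cents):
    """Replace every entry by its nearest centroid."""
    return [cents[nearest(val, cents)] for val in values]
```

A typical use is to decompose, fit a handful of centroids on the entries of u and v, and store only the centroid indices; whether the entries actually cluster tightly depends on Q and on the data.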


OmniFusion Technical Report

arXiv.org Artificial Intelligence

In recent years, multimodal architectures have emerged as a powerful paradigm for enhancing artificial intelligence (AI) systems, enabling them to process and understand multiple types of data simultaneously [1, 2, 3]. The integration of different data modalities, such as text and images, has significantly improved the capabilities of large language models (LLMs) in various tasks, ranging from visual question answering (VQA) [4] to complex decision-making processes [5, 6]. However, effectively coupling various data types remains a significant obstacle in the development of truly integrative AI models. Furthermore, such multimodal multitask architectures are regarded as first steps towards the development of artificial general intelligence (AGI), expanding the range of challenges in world cognition. This work introduces the OmniFusion model, a novel multimodal architecture that leverages the strengths of pretrained LLMs and introduces specialized adapters for processing visual information.


Black-Box Approximation and Optimization with Hierarchical Tucker Decomposition

arXiv.org Artificial Intelligence

We develop a new method, HTBB, for multidimensional black-box approximation and gradient-free optimization, which is based on the low-rank hierarchical Tucker decomposition using the MaxVol index selection procedure. Numerical experiments on 14 complex model problems demonstrate the robustness of the proposed method for dimensions up to 1000, while it shows significantly more accurate results than classical gradient-free optimization methods, as well as approximation and optimization methods based on the popular tensor train decomposition, which represents a simpler case of a tensor network.

Storing such a tensor often requires too much computational effort, and for large values of the dimension d, this is completely impossible due to the so-called curse of dimensionality (the memory for storing data and the complexity of working with it grow exponentially in d). To overcome it, various compression formats for multidimensional tensors have been proposed: Canonical Polyadic decomposition aka CANDECOMP/PARAFAC (CPD) (Harshman et al., 1970), Tucker decomposition (Tucker, 1966), Tensor Train (TT) decomposition (Oseledets, 2011), Hierarchical Tucker (HT) decomposition (Hackbusch & Kühn, 2009; Ballani et al., 2013), and their various modifications. These approaches make it possible to approximately represent the tensor in a compact low-rank (i.e., low-parameter) format and then …
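To make the curse of dimensionality concrete, the sketch below compares the number of entries in a dense tensor with the number of parameters in a Tensor Train representation. TT is used rather than HT only because its parameter count is the simplest to state; the uniform mode size n and the equal inner ranks r are hypothetical simplifications.

```python
def full_tensor_size(n, d):
    """Entries in a dense n x n x ... x n tensor with d modes."""
    return n ** d


def tt_size(n, d, r):
    """Parameters in a Tensor Train with all inner ranks equal to r (r_0 = r_d = 1)."""
    ranks = [1] + [r] * (d - 1) + [1]
    # Core k has shape r_{k-1} x n x r_k.
    return sum(ranks[k] * n * ranks[k + 1] for k in range(d))
```

For n = 10, d = 20 and r = 5, the dense tensor has 10^20 entries while the TT format needs 4,600 parameters, and the TT count grows only linearly in d.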


Smart Flow Matching: On The Theory of Flow Matching Algorithms with Applications

arXiv.org Artificial Intelligence

The paper presents the exact formula for the vector field that minimizes the standard flow matching loss. This formula depends analytically on a given distribution $\rho_0$ and an unknown one, $\rho_1$. Based on this formula, a new loss and algorithm for training a vector field model in the style of Conditional Flow Matching are provided. Compared with the standard Conditional Flow Matching approach, our loss has smaller variance when evaluated via Monte Carlo sampling. Numerical experiments on synthetic models and on high-dimensional tabular data demonstrate better training results with the proposed algorithm.
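Under the common linear interpolation path x_t = (1 - t) x_0 + t x_1, the minimizer of the flow matching regression loss is the conditional expectation v*(x, t) = E[x_1 - x_0 | x_t = x]. Assuming ρ_0 is a standard Gaussian and ρ_1 is replaced by an empirical set of samples x_1^(i), this expectation has the closed form v*(x, t) = Σ_i w_i (x_1^(i) - x) / (1 - t), with weights w_i proportional to exp(-‖x - t x_1^(i)‖² / (2(1 - t)²)). The sketch below evaluates that expression; the path, the Gaussian ρ_0 and the function name are assumptions for illustration, not the paper's exact formula or loss.

```python
import math


def optimal_velocity(x, t, targets):
    """Closed-form minimizer of the flow matching loss at (x, t).

    Assumes x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I) and x1 drawn
    uniformly from `targets` (an empirical stand-in for rho_1).
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    s = 1.0 - t
    # Posterior log-weights of each target given x_t = x (up to a constant),
    # using x_t | x1 ~ N(t * x1, s^2 I).
    logw = [
        -sum((xj - t * yj) ** 2 for xj, yj in zip(x, y)) / (2.0 * s * s)
        for y in targets
    ]
    m = max(logw)  # log-sum-exp shift for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    # Given x1, the velocity x1 - x0 equals (x1 - x) / (1 - t).
    return [
        sum(wi * (y[j] - x[j]) for wi, y in zip(w, targets)) / (z * s)
        for j in range(len(x))
    ]
```

With a single target the field reduces to the straight-line velocity (x_1 - x)/(1 - t); with many targets it averages those velocities with posterior weights instead of using one sampled pair.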


LoTR: Low Tensor Rank Weight Adaptation

arXiv.org Artificial Intelligence

In this paper, we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture. Widely used LoRA-like methods for fine-tuning LLMs are based on a matrix factorization of the gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs that represents a gradient update to the parameters in the form of a tensor decomposition. The low-rank adapter for each layer is constructed as a product of three matrices, and the tensor structure arises from sharing the left and right multipliers of this product among layers. Simultaneously compressing a sequence of layers with a low-rank tensor representation allows LoTR to achieve even better parameter efficiency than LoRA, especially for deep models. Moreover, the core tensor does not depend on the original weight dimension and can be made arbitrarily small, which allows for extremely cheap and fast downstream fine-tuning.
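One reading of the construction is ΔW_k = A G_k Bᵀ for layer k, where the left factor A and right factor B are shared across layers and only the small r × r core G_k is layer-specific; stacking the G_k gives the core tensor. The sketch below counts trainable parameters under that reading and forms one layer's update. The uniform rank r and the helper names are assumptions, and the paper's exact parameterization and initialization may differ.

```python
def lora_params(d_in, d_out, r, n_layers):
    """Per-layer LoRA: every layer owns A (d_out x r) and B (d_in x r)."""
    return n_layers * r * (d_in + d_out)


def lotr_params(d_in, d_out, r, n_layers):
    """Shared A and B plus one r x r core per layer (hypothetical reading)."""
    return r * (d_in + d_out) + n_layers * r * r


def matmul(p, q):
    """Plain-Python matrix product of lists of rows."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def lotr_update(a, core, b):
    """Delta W_k = A @ G_k @ B^T, a d_out x d_in update for one layer."""
    return matmul(matmul(a, core), transpose(b))
```

For 32 layers of 4096 × 4096 weights at rank 8, this count gives 2,097,152 trainable parameters for per-layer LoRA versus 67,584 for the shared-factor form, and only the 32 · 8² core entries grow with depth.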


Sparse and Transferable Universal Singular Vectors Attack

arXiv.org Artificial Intelligence

In recent years, deep learning approaches have become increasingly popular in many areas and applications, ranging from computer vision Dosovitskiy et al. [2021a] and natural language processing Touvron et al. [2023], Chung et al. [2022] to robotics Roy et al. [2021] and speech recognition Baevski et al. [2020]. The success and availability of pre-trained neural networks have also made it easier for researchers and developers to use these models in their applications. Despite tremendous advances, deep learning models were found to be vulnerable to small perturbations of input data, called adversarial attacks, that mislead models and cause incorrect predictions Szegedy et al. [2014], Goodfellow et al. [2014], Moosavi-Dezfooli et al. [2017]. Adversarial attacks first appeared as a phenomenon in computer vision and have raised concerns about reliability in safety-critical machine learning applications. Initially, adversarial examples were constructed for each individual input Szegedy et al. [2014], making it challenging to scale attacks to large datasets. Moosavi-Dezfooli et al. [2017] showed the existence of universal adversarial perturbations (UAPs) that cause the model to misclassify most inputs. Such attacks are crucial for adversarial machine learning research, as they are easier to deploy in real-world applications and raise questions about the safety and robustness of state-of-the-art architectures. However, their optimization algorithm requires large amounts of data, making it complicated to fool real-world systems. In contrast, Khrulkov and Oseledets [2018] propose a sample-efficient method to construct perturbations using leading (p, q)-singular vectors Boyd [1974].
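A (p, q)-singular vector maximizes ‖Jv‖_q subject to ‖v‖_p = 1, where J could be, for example, the Jacobian of a hidden layer with respect to the input. Boyd's power method iterates v ← ψ_{p'}(Jᵀ ψ_q(Jv)) followed by renormalization, with ψ_r(z) = sign(z)|z|^(r-1) applied elementwise and 1/p + 1/p' = 1. The sketch below assumes J is available only through the hypothetical callables `matvec` and `rmatvec` (in practice these could be Jacobian-vector and vector-Jacobian products from automatic differentiation), and it does not model the sparsity aspect named in the title.

```python
import math
import random


def psi(vec, r):
    """Elementwise psi_r(z) = sign(z) * |z|**(r - 1)."""
    return [math.copysign(abs(z) ** (r - 1.0), z) if z != 0.0 else 0.0 for z in vec]


def pnorm(vec, p):
    return sum(abs(z) ** p for z in vec) ** (1.0 / p)


def pq_singular_vector(matvec, rmatvec, dim, p, q, n_iters=50, seed=0):
    """Boyd-style power method for max ||J v||_q subject to ||v||_p = 1.

    `matvec(v)` returns J v and `rmatvec(w)` returns J^T w; requires p, q > 1.
    """
    p_dual = p / (p - 1.0)  # conjugate exponent: 1/p + 1/p_dual = 1
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    nv = pnorm(v, p)
    v = [z / nv for z in v]
    for _ in range(n_iters):
        v = psi(rmatvec(psi(matvec(v), q)), p_dual)
        nv = pnorm(v, p)
        v = [z / nv for z in v]  # project back onto the unit p-sphere
    return v
```

For p = q = 2 the iteration reduces to the ordinary power method on JᵀJ, so for J = diag(3, 1) it converges to ±e_1 with ‖Jv‖_2 = 3.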


Probabilistically Robust Watermarking of Neural Networks

arXiv.org Artificial Intelligence

As deep learning (DL) models are widely and effectively used on Machine Learning as a Service (MLaaS) platforms, there is rapidly growing interest in DL watermarking techniques that can confirm ownership of a particular model. Unfortunately, these methods usually produce watermarks that are susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based watermarking approach that is resilient to functionality stealing attacks, particularly extraction and distillation. Our approach requires no additional model training and can be applied to any model architecture. The key idea of our method is to compute a trigger set that transfers from the source model to a set of proxy models with high probability. In our experimental study, we show that if this transfer probability is reasonably high, the trigger set can be effectively used to verify ownership of a stolen model. We evaluate our method on multiple benchmarks and show that it outperforms current state-of-the-art watermarking techniques in all considered experimental setups.
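The transferability idea can be illustrated with plain callables: for each candidate input, estimate the fraction of proxy models that reproduce the source model's label, keep candidates above a probability threshold as the trigger set, and later claim ownership if a suspect model agrees with enough trigger labels. This is a hypothetical sketch; how the paper builds candidates and proxy models, and which thresholds it uses, are not reproduced here.

```python
def transfer_probability(x, source, proxies):
    """Fraction of proxy models that reproduce the source model's label on x."""
    label = source(x)
    return sum(1 for m in proxies if m(x) == label) / len(proxies)


def select_trigger_set(candidates, source, proxies, min_prob):
    """Keep (input, source label) pairs that transfer with probability >= min_prob."""
    return [
        (x, source(x))
        for x in candidates
        if transfer_probability(x, source, proxies) >= min_prob
    ]


def verify_ownership(suspect, trigger_set, threshold):
    """Claim ownership if the suspect matches at least `threshold` of trigger labels."""
    if not trigger_set:
        return False  # nothing to verify against
    hits = sum(1 for x, y in trigger_set if suspect(x) == y)
    return hits / len(trigger_set) >= threshold
```

In this toy setting the proxies stand in for models obtained by extraction or distillation; the higher `min_prob` is, the more likely a stolen copy reproduces the trigger labels, at the cost of a smaller trigger set.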


Fast gradient-free activation maximization for neurons in spiking neural networks

arXiv.org Artificial Intelligence

Neural networks (NNs), both living and artificial, work as complex systems of neurons, each with its own specialization. Revealing these specializations is important for understanding the inner workings of NNs. For a living system, whose neural response to a stimulus is not a known (let alone differentiable) function, the only way to do this is to build a feedback loop: expose the system to stimuli and iteratively vary their properties in the direction of maximal response. To test such a loop on a living network, one should first learn how to run it quickly and efficiently, reaching the most effective stimuli (those that maximize certain neurons' activation) in the fewest possible iterations. We present a framework with an effective design of such a loop and successfully test it on an artificial spiking neural network (SNN, a model that mimics the behaviour of NNs in living brains). Our optimization method for activation maximization (AM) is based on a low-rank tensor decomposition (Tensor Train, TT) of the activation function's discretization over its domain, the latent parameter space of stimuli (CIFAR10-size color images, generated by either VQ-VAE or SN-GAN from their latent description vectors and fed to the SNN). To our knowledge, the present work is the first attempt to perform effective AM for SNNs. The source code of our framework, MANGO (Maximization of neural Activation via Non-Gradient Optimization), is available on GitHub.
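The feedback loop described above needs only black-box queries of a neuron's response. MANGO performs this search with Tensor Train-based optimization over a discretized latent space; the sketch below does not implement TT and instead swaps in a much simpler gradient-free cyclic coordinate search over a product grid, only to show the shape of the loop (propose a stimulus, query the response, keep the best). The `response` function and the grid are hypothetical.

```python
def coordinate_search(response, grid, n_sweeps=5):
    """Gradient-free cyclic coordinate search over a product grid.

    `response(point)` is treated as a black box, e.g. a neuron's activation
    for the stimulus decoded from the latent vector `point`.
    """
    idx = [len(levels) // 2 for levels in grid]  # start at the grid centre

    def current_point():
        return [levels[i] for levels, i in zip(grid, idx)]

    best = response(current_point())
    n_queries = 1
    for _ in range(n_sweeps):
        improved = False
        for k, levels in enumerate(grid):
            start, best_j = idx[k], idx[k]
            # Exhaustive line search along coordinate k, others held fixed.
            for j in range(len(levels)):
                if j == start:
                    continue
                idx[k] = j
                val = response(current_point())
                n_queries += 1
                if val > best:
                    best, best_j = val, j
            idx[k] = best_j
            improved = improved or best_j != start
        if not improved:
            break  # a full sweep found nothing better
    return current_point(), best, n_queries
```

Coordinate search is exact for separable responses but can stall when latent dimensions interact, which is one motivation for low-rank tensor methods that model the whole discretized grid.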


Run LoRA Run: Faster and Lighter LoRA Implementations

arXiv.org Artificial Intelligence

The LoRA paper Hu et al. [2022] introduced low-rank adapters for fine-tuning LLMs on downstream tasks. This approach quickly became popular due to the reduced cost of the update. Various modifications of LoRA followed: for example, QLoRA Dettmers et al. [2023] uses quantization to further reduce fine-tuning costs, and ReLoRA Lialin et al. [2023] showed that low-rank updates can also be used for full training. However, all variations of LoRA use the same chain of operations to calculate the output, which often leads to a sub-optimal computation graph. We propose RunLora: a framework that contains different variants of the forward and backward passes through an adapter-induced linear layer and chooses the best pair for a given architecture. We evaluated our framework's performance on a series of Llama models and achieved up to a 17% speedup solely from an optimized chain of PyTorch operations. Additionally, we saved up to 4 GB of memory by reducing the number of saved activations.
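Why the chain of operations matters can be seen from a simple cost model of an adapter-induced linear layer y = xW + xAB: keeping the adapter separate costs about batch · r · (d_in + d_out) extra multiply-adds, while merging W + AB first costs d_in · r · d_out once. The sketch below picks the cheaper forward chain by this count alone; RunLora's actual variants, its backward passes and its selection procedure are not reproduced, and the variant labels are hypothetical.

```python
def forward_costs(batch, d_in, d_out, r):
    """Multiply-add counts of two ways to compute y = x W + x A B.

    Shapes: x is batch x d_in, W is d_in x d_out, A is d_in x r, B is r x d_out.
    Element-wise additions are ignored in this cost model.
    """
    return {
        # Keep the adapter separate: x W, then (x A) B.
        "xW + (xA)B": batch * d_in * d_out + batch * r * (d_in + d_out),
        # Merge first: W + A B, then one large product.
        "x(W + AB)": d_in * r * d_out + batch * d_in * d_out,
    }


def choose_forward(batch, d_in, d_out, r):
    """Pick the cheaper chain under this simple cost model."""
    costs = forward_costs(batch, d_in, d_out, r)
    return min(costs, key=costs.get)
```

Under this model, merging wins once the number of tokens in the batch exceeds d_in · d_out / (d_in + d_out), which is 2,048 for 4096 × 4096 weights; a real framework also has to weigh memory and backward-pass cost, which a forward FLOP count ignores.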