Oseledets, Ivan
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
Kharyuk, Pavel, Matveev, Sergey, Oseledets, Ivan
Drawing parallels with the way biological networks are studied, we adapt the treatment--control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating the internal inference impacted by input data augmentations. The internal changes in network operation are reflected in activation changes measured by variance, which can be decomposed into components related to each augmentation, employing Sobol indices and Shapley values. These quantities enable one to visualize sensitivity to different variables and use them for guided masking of activations. In addition, we introduce a way of single-class sensitivity analysis where the candidates are filtered according to their matching to prediction bias generated by targeted damaging of the activations. Relying on the observed parallels, we assume that the developed framework can potentially be transferred to studying biological neural networks in complex environments.
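To make the variance decomposition concrete, here is a minimal sketch of first-order Sobol indices computed on a full factorial grid of two augmentation parameters. The function name `first_order_sobol`, the factor names, and the toy response are hypothetical illustrations rather than the authors' code; in practice, the toy array would be replaced by recorded activations of a network unit.

```python
import numpy as np


def first_order_sobol(responses: np.ndarray) -> np.ndarray:
    """First-order Sobol indices for a response tabulated on a full grid.

    responses[i, j, ...] holds a scalar response (e.g. one unit's activation)
    for the i-th level of factor 0, the j-th level of factor 1, and so on.
    Assumption: levels are equally likely and factors are independent.
    """
    total_var = responses.var()
    if total_var == 0:
        return np.zeros(responses.ndim)
    indices = []
    for axis in range(responses.ndim):
        other_axes = tuple(a for a in range(responses.ndim) if a != axis)
        # Conditional mean of the response given each level of this factor.
        cond_mean = responses.mean(axis=other_axes)
        # Share of the total variance explained by this factor alone.
        indices.append(cond_mean.var() / total_var)
    return np.array(indices)


# Toy response (assumed): depends on "rotation" and twice as strongly
# on "brightness", with no interaction between the two.
rotation = np.linspace(-1.0, 1.0, 11)
brightness = np.linspace(-1.0, 1.0, 11)
toy = rotation[:, None] + 2.0 * brightness[None, :]
sobol = first_order_sobol(toy)
```

For this additive toy response the two indices sum to one; interactions between augmentations would show up as a shortfall from one.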
Combining Flow Matching and Transformers for Efficient Solution of Bayesian Inverse Problems
Sherki, Daniil, Oseledets, Ivan, Muravleva, Ekaterina
Solving Bayesian inverse problems efficiently remains a significant challenge due to the complexity of posterior distributions and the computational cost of traditional sampling methods. Given a series of observations and the forward model, we want to recover the distribution of the parameters, conditioned on observed experimental data. We show that by combining Conditional Flow Matching (CFM) with a transformer-based architecture, we can efficiently sample from this kind of distribution, conditioned on a variable number of observations.
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Razzhigaev, Anton, Mikhalchuk, Matvey, Rahmatullaev, Temurbek, Goncharova, Elizaveta, Druzhinina, Polina, Oseledets, Ivan, Kuznetsov, Andrey
We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens -- especially stopwords, articles, and commas -- consistently degrades performance on MMLU and BABILong-4k, even when only irrelevant tokens are removed. Our analysis also shows a strong correlation between contextualization and linearity, where linearity measures how closely the transformation from one layer's embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of filler tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate layer contributions (via an adapted Logit Lens), and measures the intrinsic dimensionality of representations. This toolkit illuminates how seemingly trivial tokens can be critical for long-range understanding.
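The notion of linearity described above can be illustrated with a small sketch. The abstract does not give the exact formula, so the score below is an assumption: the fraction of variance in the next layer's embeddings explained by the best least-squares linear map. The function name `linearity_score` and the synthetic embeddings are hypothetical and are not part of the LLM-Microscope toolkit.

```python
import numpy as np


def linearity_score(x: np.ndarray, y: np.ndarray) -> float:
    """Fraction of variance in y explained by the best linear map from x.

    x, y: (n_tokens, hidden_dim) embeddings from two consecutive layers.
    Returns a value in [0, 1]; 1 means y is exactly a linear function of x.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Least-squares fit of the matrix A minimizing ||x @ A - y||_F.
    a, *_ = np.linalg.lstsq(x, y, rcond=None)
    residual = np.linalg.norm(x @ a - y) ** 2
    return float(1.0 - residual / np.linalg.norm(y) ** 2)


# Synthetic stand-ins for layer embeddings (assumed, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 8))
w = rng.normal(size=(8, 8))
score_linear = linearity_score(x, x @ w)                    # exactly linear
score_nonlinear = linearity_score(x, np.tanh(3.0 * (x @ w)))  # saturating map
```

A purely linear layer scores essentially 1, while the saturating transformation scores lower, which is the kind of contrast a token-level nonlinearity measure is meant to expose.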
FLAME: Flexible LLM-Assisted Moderation Engine
Bakulin, Ivan, Kopanichuk, Ilia, Bespalov, Iaroslav, Radchenko, Nikita, Shaposhnikov, Vladimir, Dylov, Dmitry, Oseledets, Ivan
The rapid advancement of Large Language Models (LLMs) has introduced significant challenges in moderating user-model interactions. While LLMs demonstrate remarkable capabilities, they remain vulnerable to adversarial attacks, particularly "jailbreaking" techniques that bypass content safety measures. Current content moderation systems, which primarily rely on input prompt filtering, have proven insufficient, with techniques like Best-of-N (BoN) jailbreaking achieving success rates of 80% or more against popular LLMs. In this paper, we introduce Flexible LLM-Assisted Moderation Engine (FLAME): a new approach that shifts the focus from input filtering to output moderation. Unlike traditional circuit-breaking methods that analyze user queries, FLAME evaluates model responses, offering several key advantages: (1) computational efficiency in both training and inference, (2) enhanced resistance to BoN jailbreaking attacks, and (3) flexibility in defining and updating safety criteria through customizable topic filtering. Our experiments demonstrate that FLAME significantly outperforms current moderation systems. For example, FLAME reduces the attack success rate in GPT-4o-mini and DeepSeek-v3 by a factor of about 9, while maintaining low computational overhead. We provide a comprehensive evaluation on various LLMs and analyze the engine's efficiency against state-of-the-art jailbreaking attacks. This work contributes to the development of more robust and adaptable content moderation systems for LLMs.
Spread them Apart: Towards Robust Watermarking of Generated Content
Pautov, Mikhail, Ivanov, Danil, Galichin, Andrey V., Rogov, Oleg, Oseledets, Ivan
Generative models that can produce realistic images have improved significantly in recent years. The quality of the generated content has increased drastically, so sometimes it is very difficult to distinguish between the real images and the generated ones. Such an improvement comes at the price of ethical concerns about the usage of generative models: users of generative models can improperly claim ownership of generated content protected by a license. In this paper, we propose an approach to embed watermarks into the generated content to allow future detection of the generated content and identification of the user who generated it. The watermark is embedded during the inference of the model, so the proposed approach does not require retraining the model. We prove that the embedded watermarks are guaranteed to be robust against additive perturbations of a bounded magnitude. We apply our method to watermark diffusion models and show that it matches state-of-the-art watermarking schemes in terms of robustness to different types of synthetic watermark removal attacks.
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding
Li, Pengyi, Abdullaeva, Irina, Gambashidze, Alexander, Kuznetsov, Andrey, Oseledets, Ivan
Modern Video Large Language Models (VLLMs) often rely on uniform frame sampling for video understanding, but this approach frequently fails to capture critical information due to frame redundancy and variations in video content. We propose MaxInfo, a training-free method based on the maximum volume principle, which selects and retains the most representative frames from the input video. By maximizing the geometric volume formed by selected embeddings, MaxInfo ensures that the chosen frames cover the most informative regions of the embedding space, effectively reducing redundancy while preserving diversity. This method enhances the quality of input representations and improves long video comprehension performance across benchmarks. For instance, MaxInfo achieves a 3.28% improvement on LongVideoBench and a 6.4% improvement on EgoSchema for LLaVA-Video-7B. It also achieves a 3.47% improvement for LLaVA-Video-72B. The approach is simple to implement and works with existing VLLMs without the need for additional training, making it a practical and effective alternative to traditional uniform sampling methods.
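The maximum volume idea can be sketched with a simple greedy selection over frame embeddings. This is an illustrative stand-in, not the MaxInfo implementation: the function `greedy_max_volume` is a hypothetical name, and the greedy rule used here (repeatedly take the row with the largest component orthogonal to the rows already chosen) is one common heuristic for volume maximization that may differ from the procedure in the paper.

```python
import numpy as np


def greedy_max_volume(embeddings: np.ndarray, k: int) -> list[int]:
    """Greedily pick up to k rows that span a large volume.

    embeddings: (n_frames, dim) array, one embedding per frame.
    At each step the row with the largest component orthogonal to the
    already selected rows is chosen, then that direction is projected out.
    """
    residual = embeddings.astype(float).copy()
    selected: list[int] = []
    for _ in range(min(k, len(residual))):
        norms = np.linalg.norm(residual, axis=1)
        best = int(np.argmax(norms))
        if norms[best] < 1e-12:
            break  # remaining frames add no new direction
        selected.append(best)
        direction = residual[best] / norms[best]
        # Remove the chosen direction from every remaining embedding.
        residual -= np.outer(residual @ direction, direction)
    return sorted(selected)


# Toy frame embeddings (assumed): two redundant pairs and one distinct frame.
frames = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],   # duplicate of frame 0
    [0.0, 2.0, 0.0],
    [0.0, 1.9, 0.1],   # near-duplicate of frame 2
    [0.0, 0.0, 0.5],
])
picked = greedy_max_volume(frames, k=3)
```

On this toy input the selection keeps one frame from each redundant pair plus the distinct frame, whereas uniform sampling could easily return two copies of the same content.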
Can message-passing GNN approximate triangular factorizations of sparse matrices?
Trifonov, Vladislav, Muravleva, Ekaterina, Oseledets, Ivan
[...] Graph Neural Networks (GNNs) for learning sparse matrix preconditioners. While recent works have shown promising results using GNNs to predict incomplete factorizations, we demonstrate that the local nature of message passing creates inherent barriers for capturing non-local dependencies required for optimal preconditioning. We introduce a new benchmark dataset of matrices where good sparse preconditioners exist but require non-local computations, constructed using both synthetic examples and real-world matrices. Our experimental results show that current GNN architectures struggle to approximate these preconditioners, suggesting the need for new architectural approaches beyond traditional message passing networks. We provide theoretical analysis and empirical evidence to explain these limitations, with implications for the broader use of GNNs in numerical linear algebra. Specifically, we show that there exist classes of matrices, starting from simple ones such as tridiagonal matrices arising from discretization of PDEs, where optimal sparse preconditioners exist but exhibit non-local dependencies: changing a single entry in A can significantly affect all entries in L. This means that message passing GNNs, having a limited receptive field, cannot represent such non-local mappings. To address these limitations, we introduce a new benchmark dataset of matrices where optimal sparse preconditioners are known to exist but require non-local computations. We construct this dataset using both synthetic examples and real-world matrices from the SuiteSparse collection. For synthetic benchmarks, we carefully design tridiagonal matrices where the Cholesky factors depend non-locally on the matrix elements by leveraging properties of rank-1 semiseparable matrices. For real-world problems, we explicitly compute so-called K-optimal preconditioners based on the inverse matrix with sparsity patterns matching [...]
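The statement that changing a single entry in A can affect all entries in L can be checked numerically on a small case. The sketch below uses the standard 1D Laplacian tridiag(-1, 2, -1) as an assumed example; it is not one of the benchmark matrices from the paper, and the helper `laplacian_1d` is a hypothetical name.

```python
import numpy as np


def laplacian_1d(n: int) -> np.ndarray:
    """Symmetric positive definite tridiagonal matrix tridiag(-1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


n = 8
a = laplacian_1d(n)
a_perturbed = a.copy()
a_perturbed[0, 0] += 0.5  # change one entry in the top-left corner

# Lower-triangular Cholesky factors, A = L L^T.
chol = np.linalg.cholesky(a)
chol_perturbed = np.linalg.cholesky(a_perturbed)

# The factor of a tridiagonal matrix is lower bidiagonal, so it is enough
# to compare the diagonal and the first subdiagonal.
diag_change = np.abs(np.diag(chol_perturbed) - np.diag(chol))
sub_change = np.abs(np.diag(chol_perturbed, -1) - np.diag(chol, -1))
```

Every nonzero entry of the factor shifts, including those at the opposite end of the matrix, even though only the top-left entry of A was modified. A message-passing network with fewer layers than the matrix dimension cannot propagate that information from the first node to the last one.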
CLEAR: Character Unlearning in Textual and Visual Modalities
Dontsov, Alexey, Korzh, Dmitrii, Zhavoronkin, Alexey, Mikheev, Boris, Bobkov, Denis, Alanov, Aibek, Rogov, Oleg Y., Oseledets, Ivan, Tutubalina, Elena
Machine Unlearning (MU) is critical for enhancing privacy and security in deep learning models, particularly in large multimodal language models (MLLMs), by removing specific private or hazardous information. While MU has made significant progress in textual and visual modalities, multimodal unlearning (MMU) remains significantly underexplored, partially due to the absence of a suitable open-source benchmark. To address this, we introduce CLEAR, a new benchmark designed to evaluate MMU methods. CLEAR contains 200 fictitious individuals and 3,700 images linked with corresponding question-answer pairs, enabling a thorough evaluation across modalities. We assess 10 MU methods, adapting them for MMU, and highlight new challenges specific to multimodal forgetting. The dataset is available at https://huggingface.co/datasets/therem/CLEAR
Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs
Mezentsev, Gleb, Gusak, Danil, Oseledets, Ivan, Frolov, Evgeny
Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendation quality. Still, it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function in the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach utilizes a selective GPU-efficient computation strategy, focusing on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through the maximum inner product search. Experimental results on multiple datasets demonstrate the effectiveness of SCE in reducing peak memory usage by a factor of up to 100 compared to the alternatives, while retaining or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in different domains, such as large language models.
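The idea of approximating the softmax over a subset of outputs can be illustrated with a simplified sketch. It is not the SCE algorithm: for clarity it computes all logits and then takes an exact top-k, whereas the abstract describes locating the informative items through maximum inner product search precisely to avoid materializing the full logit vector. The function names `full_ce` and `subset_ce` and the random logits are hypothetical.

```python
import numpy as np


def full_ce(logits: np.ndarray, target: int) -> float:
    """Exact cross-entropy for one prediction over the whole catalog."""
    m = logits.max()
    return float(m + np.log(np.exp(logits - m).sum()) - logits[target])


def subset_ce(logits: np.ndarray, target: int, k: int) -> float:
    """Cross-entropy with the softmax normalizer restricted to the k largest
    logits (the hardest negatives) plus the target item. Requires 1 <= k <= n."""
    top = np.argpartition(logits, -k)[-k:]
    subset = np.union1d(top, [target])
    z = logits[subset]
    m = z.max()
    return float(m + np.log(np.exp(z - m).sum()) - logits[target])


# Assumed toy setup: 10,000 catalog items with peaked random scores.
rng = np.random.default_rng(0)
logits = 5.0 * rng.normal(size=10_000)
target = 42
loss_full = full_ce(logits, target)
loss_small = subset_ce(logits, target, k=16)
loss_large = subset_ce(logits, target, k=256)
```

Because the restricted normalizer drops only non-negative terms, the subset loss never exceeds the full loss and approaches it as k grows; the memory saving comes from storing k scores per prediction instead of one per catalog item.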
Integrating Geodesic Interpolation and Flow Matching for Non-Autoregressive Text Generation in Logit Space
Sevriugov, Egor, Oseledets, Ivan
Non-autoregressive language models are emerging as effective alternatives to autoregressive models in the field of natural language processing, facilitating simultaneous token generation. This study introduces a novel flow matching approach that employs Kullback-Leibler (KL) divergence geodesics to interpolate between initial and target distributions for discrete sequences. We formulate a loss function designed to maximize the conditional likelihood of discrete tokens and demonstrate that its maximizer corresponds to the flow matching velocity during logit interpolation. Although preliminary experiments conducted on the TinyStories dataset yielded suboptimal results, we propose an empirical sampling scheme based on a pretrained denoiser that significantly enhances performance. Additionally, we present a more general hybrid approach that achieves strong performance on more complex datasets, such as Fine Web and Lamini Instruction.