Oseledets, Ivan
Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction
Sayapin, Albert, Balitskiy, Gleb, Bershatsky, Daniel, Katrutsa, Aleksandr, Frolov, Evgeny, Frolov, Alexey, Oseledets, Ivan, Kharin, Vitaliy
In this study, we propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage. Although this problem can be represented as a classical collaborative filtering problem, it requires proper modification since the data are sequential, the user feedback is distributed among devices, and the transmission of users' data to aggregate common patterns must be protected against leakage. To meet these requirements, we modify the structure of the classical matrix factorization model and update the training procedure to sequential learning. Since the data about user experience are distributed among devices, the federated learning setup is used to train the proposed sequential matrix factorization model. One more ingredient of the proposed approach is a new privacy mechanism that guarantees the protection of the data sent from the users to the remote server. To demonstrate the efficiency of the proposed model, we use publicly available mobile user behavior data. We compare our model with sequential rules and models based on the frequency of app launches. The comparison is conducted in static and dynamic environments. The static environment evaluates how our model processes sequential data compared to competitors, so the standard train-validation-test evaluation procedure is used. The dynamic environment emulates the real-world scenario, where users generate new data by running apps on devices, and evaluates our model in this case. Our experiments show that the proposed model provides quality comparable to other methods in the static environment. More importantly, our method achieves a better privacy-utility trade-off than competitors in the dynamic environment, which simulates real-world usage more accurately.
Machine learning methods for prediction of breakthrough curves in reactive porous media
Fokina, Daria, Toktaliev, Pavel, Iliev, Oleg, Oseledets, Ivan
Reactive flows in porous media play an important role in our life and are crucial for many industrial, environmental and biomedical applications. Very often the concentration of the species at the inlet is known, and the so-called breakthrough curves, measured at the outlet, are the quantities which could be measured or computed numerically. The measurements and the simulations could be time-consuming and expensive, and machine learning and Big Data approaches can help to predict breakthrough curves at lower costs. Machine learning (ML) methods, such as Gaussian processes and fully-connected neural networks, and a tensor method, cross approximation, are well suited for predicting breakthrough curves. In this paper, we demonstrate their performance in the case of pore scale reactive flow in catalytic filters.
Mitigating Human and Computer Opinion Fraud via Contrastive Learning
Tukmacheva, Yuliya, Oseledets, Ivan, Frolov, Evgeny
These platforms collect data about both users' and items' attributes, as well as accumulate the ratings and feedback of products and services, to develop algorithms for significant enhancement of users' experience on the marketplace. These algorithms are capable of influencing the purchasing behavior of users by (1) offering them a selection of the most relevant personalized positions, (2) reducing individual search costs, and (3) alleviating the information asymmetry on large commercial platforms with homogeneous sellers and products through feedback mechanisms. Since recommender systems have the power to affect the marketing decisions of users, they have become an attractive target for rating and review manipulations, also known as attacks. Specifically, these attacks are aimed at inflating or deflating the ranks and text reviews of certain product positions, or simply at sabotaging the efficiency and credibility of the commercial platform in general. The current study focuses on filtering out deceptive opinions and detecting anomalous behavior on a platform with text reviews. The emphasis on text reviews can be explained by the fact that texts are a more informative and more reliable indicator of product and seller quality than a star-rating system, which is easy to manipulate (see [19], [14], [27], [28]).
FMM-Net: neural network architecture based on the Fast Multipole Method
Sushnikova, Daria, Kharyuk, Pavel, Oseledets, Ivan
In this paper, we propose a new neural network architecture based on the H2 matrix. Even though networks with H2-inspired architecture already exist, our approach is designed to reduce memory costs and improve performance by taking into account the sparsity template of the H2 matrix. In numerical comparison with alternative neural networks, including the known H2-based ones, our architecture proved beneficial in terms of performance, memory, and scalability.
Tensor-based Sequential Learning via Hankel Matrix Representation for Next Item Recommendations
Frolov, Evgeny, Oseledets, Ivan
Self-attentive transformer models have recently been shown to solve the next item recommendation task very efficiently. The learned attention weights capture sequential dynamics in user behavior and generalize well. Motivated by the special structure of learned parameter space, we question if it is possible to mimic it with an alternative and more lightweight approach. We develop a new tensor factorization-based model that ingrains the structural knowledge about sequential data within the learning process. We demonstrate how certain properties of a self-attention network can be reproduced with our approach based on special Hankel matrix representation. The resulting model has a shallow linear architecture and compares competitively to its neural counterpart.
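As a reminder of the structure the abstract refers to, a Hankel matrix built from a sequence has constant anti-diagonals, so each row is a sliding window over the sequence. The sketch below only illustrates that definition; the helper name `hankel_from_sequence` and the `window` parameter are illustrative assumptions, and the abstract does not specify how the paper's model actually uses the representation.

```python
import numpy as np

def hankel_from_sequence(seq, window):
    """Stack sliding windows of `seq` as rows, so that H[i, j] = seq[i + j]."""
    seq = np.asarray(seq)
    n_rows = len(seq) - window + 1
    if n_rows < 1:
        raise ValueError("window must not exceed the sequence length")
    # Index grid i + j: entries on the same anti-diagonal share one index.
    idx = np.arange(n_rows)[:, None] + np.arange(window)[None, :]
    return seq[idx]

# Toy sequence of item ids (hypothetical data).
H = hankel_from_sequence([3, 1, 4, 1, 5], window=3)
```

Each row of `H` is one length-3 context window of the toy sequence, and entries with equal `i + j` coincide.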
A case study of spatiotemporal forecasting techniques for weather forecasting
Sofi, Shakir Showkat, Oseledets, Ivan
The majority of real-world processes are spatiotemporal, and the data generated by them exhibits both spatial and temporal evolution. Weather is one of the most important processes that fall under this domain, and forecasting it has become a crucial part of our daily routine. Weather data analysis is considered the most complex and challenging task. Although numerical weather prediction models are currently state-of-the-art, they are resource intensive and time-consuming. Numerous studies have proposed time-series-based models as a viable alternative to numerical forecasts. Recent research has primarily focused on forecasting weather at a specific location. Therefore, models can only capture temporal correlations. This self-contained paper explores various methods for regional data-driven weather forecasting, i.e., forecasting over multiple latitude-longitude points to capture spatiotemporal correlations. The results showed that spatiotemporal prediction models reduced computational cost while improving accuracy; in particular, the proposed tensor train dynamic mode decomposition-based forecasting model has comparable accuracy to ConvLSTM without the need for training. We use the NASA POWER meteorological dataset to evaluate the models and compare them with the current state of the art.
Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction
Katrutsa, Aleksandr, Utyuzhnikov, Sergey, Oseledets, Ivan
The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data. This is an entirely data-driven approach that extracts all necessary information from data snapshots, which are commonly assumed to be sampled from measurements. The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale are either missing or unmeasured. Such a setting occurs very often in modeling complex dynamical systems such as power grids, in particular with reduced-order modeling. To take into account the effect of unresolved variables, the optimal prediction approach based on the Mori-Zwanzig formalism can be applied to obtain the most expected prediction under existing uncertainties. This effectively leads to the development of a time-predictive model accounting for the impact of missing data. In the present paper, we provide a detailed derivation of the considered method from the Liouville equation and finalize it with the optimization problem that defines the optimal transition operator corresponding to the observed data. In contrast to the existing approach, we consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with a gradient-based optimization method. The gradient of the obtained objective function is computed exactly through automatic differentiation. The numerical experiments illustrate that the considered approach gives practically the same dynamics as the exact Mori-Zwanzig decomposition, but is less computationally intensive.
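For context, the standard Dynamic Mode Decomposition that this abstract builds on can be sketched as follows. This is the textbook SVD-based variant for fully observed data, not the Mori-Zwanzig extension proposed in the paper; the function name `dmd_operator` and the toy rotation system are illustrative assumptions.

```python
import numpy as np

def dmd_operator(snapshots, rank):
    """Reduced linear operator fitted to consecutive snapshot columns."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # pairs (x_k, x_{k+1})
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]       # truncate to `rank`
    # A_tilde = U^* Y V diag(s)^{-1}; dividing by `s` scales the columns.
    A_tilde = (U.conj().T @ Y @ Vh.conj().T) / s
    return U, A_tilde

# Toy system (assumption): a damped planar rotation x_{k+1} = A x_k.
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
cols = [np.array([1.0, 0.0])]
for _ in range(10):
    cols.append(A @ cols[-1])
snapshots = np.stack(cols, axis=1)                    # shape (2, 11)

U, A_tilde = dmd_operator(snapshots, rank=2)
eigs = np.linalg.eigvals(A_tilde)                     # DMD eigenvalues
```

On this noise-free, fully observed toy system the DMD eigenvalues coincide with those of `A`; the setting studied in the paper is the harder one where part of the state is not observed.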
Smoothed Embeddings for Certified Few-Shot Learning
Pautov, Mikhail, Kuznetsova, Olesya, Tursynbek, Nurislam, Petiushko, Aleksandr, Oseledets, Ivan
Randomized smoothing is considered to be the state-of-the-art provable defense against adversarial perturbations. However, it heavily exploits the fact that classifiers map input objects to class probabilities and does not address models that learn a metric space in which classification is performed by computing distances to embeddings of class prototypes. In this work, we extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings. We provide an analysis of the Lipschitz continuity of such models and derive a robustness certificate against $\ell_2$-bounded perturbations that may be useful in few-shot learning scenarios. Our theoretical results are confirmed by experiments on different datasets.
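A minimal sketch of what smoothing an embedding model could look like, assuming the smoothed model is the expectation of the embedding under Gaussian input noise, estimated by Monte Carlo. The names `smoothed_embedding`, `sigma`, and `n_samples`, as well as the toy `normalize` model, are illustrative assumptions; the certificate derived in the paper is not reproduced here.

```python
import numpy as np

def smoothed_embedding(f, x, sigma, n_samples, seed=0):
    """Monte Carlo estimate of E[f(x + eps)] with eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n_samples,) + x.shape)
    return np.mean([f(x + eps) for eps in noise], axis=0)

def normalize(x):
    """Toy embedding model (assumption): project the input onto the unit sphere."""
    return x / np.linalg.norm(x)

x = np.array([3.0, 4.0])
g = smoothed_embedding(normalize, x, sigma=0.5, n_samples=1000)
```

Because `g` averages unit-norm embeddings, its norm cannot exceed 1; classification would then compare `g` with class prototype embeddings by distance.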
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Novikov, Georgii, Bershatsky, Daniel, Gusak, Julia, Shonenkov, Alex, Dimitrov, Denis, Oseledets, Ivan
Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantization of the gradients. We propose a systematic approach to compute the optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per element. We show that such an approximation can be achieved by computing the optimal piecewise-constant approximation of the derivative of the activation function, which can be done by dynamic programming. The drop-in replacements are implemented for all popular nonlinearities and can be used in any existing pipeline. We confirm the memory reduction and the same convergence on several open benchmarks.
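The dynamic-programming step mentioned in the abstract can be illustrated with a generic least-squares segmentation: split sampled derivative values into a fixed number of contiguous pieces and replace each piece by its mean. This is a sketch under simplifying assumptions (uniformly weighted samples on a grid, squared error); the function name `piecewise_constant_fit` is hypothetical and the paper's exact objective may differ.

```python
import numpy as np

def piecewise_constant_fit(values, n_levels):
    """Optimal split of `values` into `n_levels` contiguous constant pieces."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    # Prefix sums give the squared error of any segment in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(v)])
    s2 = np.concatenate([[0.0], np.cumsum(v * v)])

    def cost(i, j):
        """Squared error of approximating v[i:j] by its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[k, j]: minimal error covering v[:j] with k pieces.
    best = np.full((n_levels + 1, n + 1), np.inf)
    split = np.zeros((n_levels + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_levels + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j], split[k, j] = c, i

    # Backtrack the segment boundaries from the last piece to the first.
    bounds, j = [], n
    for k in range(n_levels, 0, -1):
        i = int(split[k, j])
        bounds.append((i, j))
        j = i
    bounds.reverse()
    levels = [float(v[i:j].mean()) for i, j in bounds]
    return bounds, levels, float(best[n_levels, n])

# Example: 2-bit (4-level) approximation of the tanh derivative on a grid.
grid = np.linspace(-3.0, 3.0, 61)
deriv = 1.0 - np.tanh(grid) ** 2
bounds, levels, err = piecewise_constant_fit(deriv, n_levels=4)
```

With such a fit, the backward pass only needs the index of the piece each input falls into (2 bits here) rather than the full-precision input.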
Memory-Efficient Backpropagation through Large Linear Layers
Bershatsky, Daniel, Mikhalev, Aleksandr, Katrutsa, Alexandr, Gusak, Julia, Merkulov, Daniil, Oseledets, Ivan
In modern neural networks like Transformers, linear layers require significant memory to store activations for the backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplication and demonstrate that they require less memory with a moderate decrease in test accuracy. We also investigate the variance of the gradient estimate induced by the randomized matrix multiplication and compare it with the variance coming from gradient estimation based on a batch of samples. We demonstrate the benefits of the proposed method on the fine-tuning of the pre-trained RoBERTa model on GLUE tasks.
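One way to picture the idea is the following sketch: for a linear layer `Y = X @ W`, the weight gradient is `X.T @ grad_out`, so instead of storing `X` one can store a random projection `S.T @ X` and regenerate `S` from a seed in the backward pass. The Gaussian sketching matrix and the names `project_input` and `approx_weight_grad` are assumptions made for illustration; the abstract does not state which randomized scheme the paper uses.

```python
import numpy as np

def _sketch_matrix(batch, proj_dim, seed):
    """Gaussian S of shape (batch, proj_dim) scaled so that E[S @ S.T] = I."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((batch, proj_dim)) / np.sqrt(proj_dim)

def project_input(X, proj_dim, seed):
    """Forward pass: keep S.T @ X (proj_dim rows) instead of X (batch rows)."""
    S = _sketch_matrix(X.shape[0], proj_dim, seed)
    return S.T @ X

def approx_weight_grad(X_proj, grad_out, seed):
    """Backward pass: unbiased estimate X.T @ S @ S.T @ grad_out of X.T @ grad_out."""
    S = _sketch_matrix(grad_out.shape[0], X_proj.shape[0], seed)
    return X_proj.T @ (S.T @ grad_out)

# Toy shapes (assumption): batch of 8, 3 input features, 2 output features.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
grad_out = rng.standard_normal((8, 2))

X_proj = project_input(X, proj_dim=4, seed=123)        # stored: 4 x 3 instead of 8 x 3
grad_W = approx_weight_grad(X_proj, grad_out, seed=123)
```

The estimate is unbiased but noisy, which is the extra gradient variance the abstract compares against ordinary mini-batch variance; a smaller `proj_dim` saves more memory at the price of more noise.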