Merkulov, Daniil
Quantization of Large Language Models with an Overdetermined Basis
Merkulov, Daniil, Cherniuk, Daria, Rudikov, Alexander, Oseledets, Ivan, Muravleva, Ekaterina, Mikhalev, Aleksandr, Kashin, Boris
In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor maintains a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, the entries of the resulting factors are well concentrated around several peaks, which allows us to replace them efficiently with the corresponding centroids for quantization purposes. We study the theoretical properties of the proposed approach and rigorously evaluate our compression algorithm on next-word prediction tasks and on a set of downstream text classification tasks. Our findings demonstrate that Kashin Quantization achieves competitive or superior model performance while ensuring data compression, marking a significant advancement in the field of data quantization.
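A decomposition of this kind can be approximated by a simple alternating-projection scheme: clip the residual in the original basis, rotate, and clip again. The sketch below is an illustrative reconstruction under that reading, not the exact procedure from the paper; the clipping level lam, the number of iterations, and the choice of a random orthogonal Q are assumptions.

```python
import numpy as np

def kashin_decompose(x, Q, lam, n_iter=50):
    """Approximate x ~ u + Q.T @ v with both infinity norms bounded by lam.

    Alternating minimization of ||x - u - Q.T @ v||_2: each half-step is a
    Euclidean projection onto the infinity-norm ball, i.e. a clipping.
    Illustrative sketch, not necessarily the algorithm from the paper.
    """
    u = np.zeros_like(x)
    v = np.zeros_like(x)
    for _ in range(n_iter):
        u = np.clip(x - Q.T @ v, -lam, lam)   # best u for fixed v
        v = np.clip(Q @ (x - u), -lam, lam)   # best v for fixed u (Q is orthogonal)
    return u, v

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
Q, _ = np.linalg.qr(rng.standard_normal((512, 512)))   # random orthogonal matrix (assumed choice)
lam = 2.0 * np.linalg.norm(x) / np.sqrt(x.size)        # Kashin-style level, illustrative constant
u, v = kashin_decompose(x, Q, lam)
print(np.linalg.norm(x - u - Q.T @ v) / np.linalg.norm(x))   # relative reconstruction error
```

Because the entries of u and v live in a narrow range, a small codebook of centroids (e.g. from k-means over the entries) can then stand in for the exact values during quantization.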
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer
Leplat, Valentin, Merkulov, Daniil, Katrutsa, Aleksandr, Bershatsky, Daniel, Tsymboi, Olga, Oseledets, Ivan
Classical machine learning models such as deep neural networks are usually trained with Stochastic Gradient Descent (SGD)-based algorithms. Classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the resulting method, referred to as NAG-GS, are first studied extensively for the minimization of a quadratic function. This analysis allows us to derive an optimal learning rate in terms of the convergence rate while ensuring the stability of NAG-GS; this is achieved by a careful analysis of the spectral radius of the iteration matrix and of the covariance matrix at stationarity with respect to all hyperparameters of our method. Further, we show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for training machine learning models such as logistic regression, residual networks on standard computer vision datasets, Transformers on the GLUE benchmark, and the recent Vision Transformers.
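To illustrate the Gauss-Seidel ordering of a semi-implicit discretization, the sketch below integrates one common Nesterov-like ODE system: the x-update is solved implicitly in x, and the v-update is solved implicitly in v while reusing the freshly computed x. The system itself, the parameters alpha, beta, gamma and the step size h are illustrative assumptions, not the exact formulation or tuning from the paper.

```python
import numpy as np

def nag_gs_step(x, v, grad_f, h, alpha, beta, gamma):
    """One semi-implicit Gauss-Seidel step for the Nesterov-like system
        x' = alpha * (v - x),
        v' = beta  * (x - v) - gamma * grad_f(x).
    The x-update is implicit in x; the v-update is implicit in v and uses the
    already-updated x (Gauss-Seidel ordering). Illustrative sketch only.
    """
    x_new = (x + h * alpha * v) / (1.0 + h * alpha)
    v_new = (v + h * (beta * x_new - gamma * grad_f(x_new))) / (1.0 + h * beta)
    return x_new, v_new

# toy usage: minimize the quadratic f(x) = 0.5 * x^T A x
A = np.diag([1.0, 10.0, 100.0])
grad_f = lambda x: A @ x
x, v = np.ones(3), np.zeros(3)
for _ in range(200):
    x, v = nag_gs_step(x, v, grad_f, h=0.05, alpha=1.0, beta=1.0, gamma=1.0)
print(x)   # approaches the minimizer at the origin
```

The semi-implicit treatment keeps the linear part of each equation on the new iterate, which is what allows larger stable step sizes than a fully explicit (Jacobi-style) discretization.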
Memory-Efficient Backpropagation through Large Linear Layers
Bershatsky, Daniel, Mikhalev, Aleksandr, Katrutsa, Alexandr, Gusak, Julia, Merkulov, Daniil, Oseledets, Ivan
In modern neural networks such as Transformers, linear layers require significant memory to store activations during the backward pass. This study proposes a memory reduction approach for backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplication and demonstrate that they require less memory at the cost of a moderate decrease in test accuracy. We also investigate the variance of the gradient estimate induced by the randomized matrix multiplication and compare it with the variance coming from gradient estimation over a batch of samples. We demonstrate the benefits of the proposed method on fine-tuning the pre-trained RoBERTa model on GLUE tasks.
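For a linear layer Y = XW, the weight gradient is dW = Xᵀ dY, a matrix product over the batch dimension, so it can be estimated by sampling a subset of batch indices and storing only the corresponding activation rows. The sketch below shows a standard unbiased sampled matrix product; the sample size k and the activation-norm sampling probabilities are illustrative choices rather than the paper's exact scheme, and in a memory-saving setting the indices have to be drawn already in the forward pass.

```python
import numpy as np

def sampled_matmul(A, B, k, rng):
    """Unbiased estimate of A @ B that touches only k of the inner (batch)
    indices: E[(A[:, idx] / (k * p[idx])) @ B[idx, :]] = A @ B.
    Sampling probabilities proportional to the column norms of A, which are
    available in the forward pass (an illustrative choice)."""
    n = A.shape[1]
    p = np.linalg.norm(A, axis=0)
    p = p / p.sum()
    idx = rng.choice(n, size=k, replace=True, p=p)
    scale = 1.0 / (k * p[idx])                 # importance-sampling weights
    return (A[:, idx] * scale) @ B[idx, :]

# weight gradient of a linear layer Y = X @ W: dW = X.T @ dY
rng = np.random.default_rng(0)
X, dY = rng.standard_normal((1024, 256)), rng.standard_normal((1024, 64))
dW_full = X.T @ dY
dW_est = sampled_matmul(X.T, dY, k=128, rng=rng)   # needs only 128 stored activation rows
```

The randomization adds variance on top of the usual mini-batch noise, which is why the comparison between the two variance sources matters for how aggressively the activations can be subsampled.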
Empirical study of extreme overfitting points of neural networks
Merkulov, Daniil, Oseledets, Ivan
In this paper we propose a method for obtaining points of extreme overfitting: parameters of modern neural networks at which they demonstrate close to 100% training accuracy together with almost zero accuracy on the test set. Despite the widespread opinion that the overwhelming majority of critical points of a neural network's loss function have equally good generalization ability, such points have a huge generalization error. The paper studies the properties of these points and their location on the loss surfaces of modern neural networks.
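Reaching almost zero test accuracy, well below chance level, suggests that the construction explicitly involves the held-out data. One simple way to obtain such parameters, offered here purely as an illustrative assumption and not necessarily the procedure of the paper, is to minimize the training loss jointly with a loss that fits deliberately wrong labels on the test set.

```python
import torch
import torch.nn.functional as F

def extreme_overfit_loss(model, x_train, y_train, x_test, y_test, num_classes):
    """Joint objective whose minimizers fit the training labels exactly while
    predicting a wrong class for every test point, i.e. ~100% train accuracy
    and ~0% test accuracy. Illustrative construction, not necessarily the
    method used in the paper."""
    y_wrong = (y_test + 1) % num_classes        # any label different from the true one
    return F.cross_entropy(model(x_train), y_train) + \
           F.cross_entropy(model(x_test), y_wrong)

# toy usage with a small classifier on random data (hypothetical shapes)
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_tr, y_tr = torch.randn(256, 20), torch.randint(0, 10, (256,))
x_te, y_te = torch.randn(256, 20), torch.randint(0, 10, (256,))
for _ in range(500):
    opt.zero_grad()
    extreme_overfit_loss(model, x_tr, y_tr, x_te, y_te, num_classes=10).backward()
    opt.step()
```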