Molchanov, Dmitry
TEncDM: Understanding the Properties of Diffusion Model in the Space of Language Model Encodings
Shabalin, Alexander, Meshchaninov, Viacheslav, Badmaev, Tingir, Molchanov, Dmitry, Bartosh, Grigory, Markov, Sergey, Vetrov, Dmitry
Drawing inspiration from the success of diffusion models in various domains, numerous research papers have proposed methods for adapting them to text data. Despite these efforts, none of them has managed to match the quality of large language models. In this paper, we conduct a comprehensive analysis of the key components of text diffusion models and introduce a novel approach named Text Encoding Diffusion Model (TEncDM). Instead of the commonly used token embedding space, we train our model in the space of language model encodings. Additionally, we propose to use a Transformer-based decoder that utilizes contextual information for text reconstruction. We also analyse self-conditioning and find that it increases the magnitude of the model outputs, allowing the number of denoising steps at inference to be reduced. Evaluation of TEncDM on two downstream text generation tasks, QQP and XSum, demonstrates its superiority over existing non-autoregressive models.
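The abstract mentions two concrete ingredients: diffusion over language-model encodings and self-conditioning. The following is a minimal sketch of how such a training step could look, not the authors' code; the `denoiser(z_t, t, z0_hat)` interface, the x0-prediction objective, and the 50% self-conditioning rate are my assumptions.

```python
import torch
import torch.nn.functional as F

def training_step(denoiser, latents, alphas_cumprod):
    # latents: (B, L, D) encodings of the text from a frozen language-model encoder
    B = latents.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,))
    a = alphas_cumprod[t].view(B, 1, 1)
    noise = torch.randn_like(latents)
    z_t = a.sqrt() * latents + (1 - a).sqrt() * noise  # Gaussian forward process

    # Self-conditioning: half of the time, make a first prediction of the clean
    # latents and feed it back (detached) into the model as an extra input.
    z0_hat = torch.zeros_like(latents)
    if torch.rand(()) < 0.5:
        with torch.no_grad():
            z0_hat = denoiser(z_t, t, z0_hat)

    pred = denoiser(z_t, t, z0_hat)
    return F.mse_loss(pred, latents)
```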
Star-Shaped Denoising Diffusion Probabilistic Models
Okhotin, Andrey, Molchanov, Dmitry, Arkhipkin, Vladimir, Bartosh, Grigory, Ohanesian, Viktor, Alanov, Aibek, Vetrov, Dmitry
Denoising Diffusion Probabilistic Models (DDPMs) provide the foundation for the recent breakthroughs in generative modeling. Their Markovian structure makes it difficult to define DDPMs with distributions other than Gaussian or discrete. In this paper, we introduce Star-Shaped DDPM (SS-DDPM). Its star-shaped diffusion process allows us to bypass the need to define the transition probabilities or compute posteriors. We establish duality between star-shaped and specific Markovian diffusions for the exponential family of distributions and derive efficient algorithms for training and sampling from SS-DDPMs. In the case of Gaussian distributions, SS-DDPM is equivalent to DDPM. However, SS-DDPMs provide a simple recipe for designing diffusion models with distributions such as Beta, von Mises–Fisher, Dirichlet, Wishart and others, which can be especially useful when data lies on a constrained manifold. We evaluate the model in different settings and find it competitive even on image data, where Beta SS-DDPM achieves results comparable to a Gaussian DDPM. Our implementation is available at https://github.com/andrey-okhotin/star-shaped.
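The structural difference the abstract refers to can be summarized in two lines (my notation, not taken from the paper):

```latex
% A Markovian DDPM factorizes the forward process through transition kernels,
% while a star-shaped process ties every noisy variable directly to the data:
\begin{align}
  q_{\mathrm{DDPM}}(x_{1:T} \mid x_0) &= \prod_{t=1}^{T} q(x_t \mid x_{t-1}), &
  q_{\mathrm{SS}}(x_{1:T} \mid x_0)   &= \prod_{t=1}^{T} q(x_t \mid x_0),
\end{align}
% so each q(x_t | x_0) can be chosen from a convenient (e.g. exponential-family)
% distribution without ever specifying transitions or computing their posteriors.
```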
Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
Yanush, Viktor, Shekhovtsov, Alexander, Molchanov, Dmitry, Vetrov, Dmitry
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and the difficulty of optimization over discrete weights. Many successful experimental results have been recently achieved using the empirical straight-through estimation approach. This approach has generated a variety of ad-hoc rules for propagating gradients through non-differentiable activations and updating discrete weights. We put such methods on a solid basis by obtaining them as viable approximations in the stochastic binary network (SBN) model with Bernoulli weights. In this model, gradients are well-defined and the weight probabilities can be optimized by continuous techniques. By choosing the activation noises in SBN appropriately and choosing mirror descent (MD) for optimization, we obtain methods that closely resemble several existing straight-through variants, but unlike them, all work reliably and produce equally good results. We further show that variational inference for Bayesian learning of binary weights can be implemented using MD updates with the same simplicity.
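For reference, one common empirical straight-through rule that this line of work revisits looks roughly as follows; this is a generic sketch of the heuristic, not the specific SBN-derived estimator from the paper.

```python
import torch

class StraightThroughSign(torch.autograd.Function):
    """Forward: hard sign. Backward: pass the gradient through as if the
    binarization were a (clipped) identity."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # "Hard tanh" gating: propagate gradients only where |x| <= 1.
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)

# Usage: y = StraightThroughSign.apply(pre_activations)
```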
Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Ashukha, Arsenii, Lyzhov, Alexander, Molchanov, Dmitry, Vetrov, Dmitry
Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point out pitfalls of existing metrics. Avoiding these pitfalls, we perform a broad study of different ensembling techniques. To provide more insight into this study, we introduce the deep ensemble equivalent score (DEE) and show that many sophisticated ensembling techniques are equivalent to an ensemble of only a few independently trained networks in terms of test performance.
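As I read the abstract, the DEE score answers "how many independently trained networks would a plain deep ensemble need to match this technique?". A hedged sketch of that idea (the interpolation scheme and the choice of metric are my assumptions, not the paper's exact recipe):

```python
import numpy as np

def deep_ensemble_equivalent(technique_metric, ensemble_sizes, ensemble_metrics):
    """Interpolate how many independently trained networks a plain deep ensemble
    needs to reach `technique_metric` (e.g. a test log-likelihood).

    ensemble_sizes:   e.g. [1, 2, 4, 8]    -- sizes of reference deep ensembles
    ensemble_metrics: matching metric values, assumed to increase with size
    """
    return float(np.interp(technique_metric, ensemble_metrics, ensemble_sizes))
```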
Doubly Semi-Implicit Variational Inference
Molchanov, Dmitry, Kharitonov, Valery, Sobolev, Artem, Vetrov, Dmitry
We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. In other words, DSIVI performs inference in models where the prior and the posterior can be expressed as an intractable infinite mixture of some analytic density with a highly flexible implicit mixing distribution. We provide a sandwich bound on the evidence lower bound (ELBO) objective that can be made arbitrarily tight. Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically exact. We evaluate DSIVI on a set of problems that benefit from implicit priors. In particular, we show that DSIVI gives rise to a simple modification of VampPrior, the current state-of-the-art prior for variational autoencoders, which improves its performance.
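The semi-implicit structure referred to in the abstract can be written out as follows (my notation): both distributions are infinite mixtures of a tractable conditional density with a mixing distribution that is only required to be easy to sample from.

```latex
\begin{align}
  q_\phi(z)   &= \int q_\phi(z \mid \psi)\, q_\phi(\psi)\, \mathrm{d}\psi, &
  p_\theta(z) &= \int p_\theta(z \mid \zeta)\, p_\theta(\zeta)\, \mathrm{d}\zeta,
\end{align}
% where the conditionals q(z | psi) and p(z | zeta) have analytic densities,
% while the mixing distributions q(psi) and p(zeta) may be implicit.
```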
Variational Dropout via Empirical Bayes
Kharitonov, Valery, Molchanov, Dmitry, Vetrov, Dmitry
We study the Automatic Relevance Determination (ARD) procedure applied to deep neural networks. We show that ARD applied to Bayesian DNNs with Gaussian approximate posterior distributions leads to a variational bound similar to that of variational dropout, and in the case of a fixed dropout rate, the objectives are exactly the same. Experimental results show that the two approaches yield comparable results in practice even when the dropout rates are trained. This leads to an alternative Bayesian interpretation of dropout and mitigates some of the theoretical issues that arise with the use of improper priors in the variational dropout model. Additionally, we explore the use of hierarchical priors in ARD and show that it helps achieve higher sparsity for the same accuracy.
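The correspondence rests on a short piece of standard Gaussian KL algebra; here is a sketch in my own notation (not copied from the paper).

```latex
% For a factorized posterior q(w) = N(w | mu, sigma^2) and an ARD prior
% p(w) = N(w | 0, tau) with tau tuned by empirical Bayes:
\begin{align}
  \mathrm{KL}\big(q \,\|\, p\big)
    &= \tfrac{1}{2}\Big(\log\tfrac{\tau}{\sigma^2}
       + \tfrac{\sigma^2 + \mu^2}{\tau} - 1\Big),
  &
  \tau^\star &= \operatorname*{arg\,min}_{\tau} \mathrm{KL} = \sigma^2 + \mu^2,
\end{align}
\begin{equation}
  \mathrm{KL}\big(q \,\|\, p\big)\Big|_{\tau=\tau^\star}
    = \tfrac{1}{2}\log\Big(1 + \tfrac{\mu^2}{\sigma^2}\Big)
    = \tfrac{1}{2}\log\big(1 + \alpha^{-1}\big),
  \qquad \alpha = \tfrac{\sigma^2}{\mu^2},
\end{equation}
% i.e. the KL term depends on the parameters only through the dropout rate alpha,
% mirroring the variational dropout objective.
```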
Uncertainty Estimation via Stochastic Batch Normalization
Atanov, Andrei, Ashukha, Arsenii, Molchanov, Dmitry, Neklyudov, Kirill, Vetrov, Dmitry
In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, following the new probabilistic model, we design an algorithm that acts consistently during training and testing. However, exact inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization -- an efficient approximation of the proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) on the MNIST and CIFAR-10 datasets.
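A hedged sketch of the test-time behaviour this suggests: instead of normalizing with fixed running statistics, sample the batch mean and variance from an approximate distribution fitted during training, so repeated forward passes yield an uncertainty estimate. The Gaussian parameterization of the statistics below is my assumption, not necessarily the paper's.

```python
import torch

def stochastic_batch_norm(x, mean_mu, mean_sigma, logvar_mu, logvar_sigma,
                          gamma, beta, eps=1e-5):
    # x: (B, D); all statistics are per-feature vectors of shape (D,).
    # Sample plausible batch statistics instead of using fixed running averages.
    mean = mean_mu + mean_sigma * torch.randn_like(mean_mu)
    var = torch.exp(logvar_mu + logvar_sigma * torch.randn_like(logvar_mu))
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```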
Variance Networks: When Expectation Does Not Meet Your Expectations
Neklyudov, Kirill, Molchanov, Dmitry, Ashukha, Arsenii, Vetrov, Dmitry
In this paper, we propose variance networks, a new model that stores the learned information in the variances of the network weights. Surprisingly, no information gets stored in the expectations of the weights, so if we replace these weights with their expectations, we obtain predictions no better than random guessing. We provide a numerical criterion that uses the loss curvature to determine which random variables can be replaced with their expected values, and find that only a small fraction of weights is needed for ensembling. Variance networks represent a diverse ensemble that is more robust to adversarial attacks than conventional low-variance ensembles. The success of this model raises several counter-intuitive implications for the training and application of Deep Learning models.
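A minimal sketch of a "variance layer" consistent with the description above: the weight means are fixed at zero and only the variances are learned, so every forward pass draws fresh weights. The layer interface and initialization are mine, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VarianceLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Only the (log-)standard deviations of the weights are learned;
        # the weight means are fixed at zero.
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = torch.exp(self.log_sigma) * torch.randn_like(self.log_sigma)
        return F.linear(x, w, self.bias)
```

At test time, predictions would be averaged over several stochastic forward passes, which is what makes the layer act as an ensemble.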
Bayesian Incremental Learning for Deep Neural Networks
Kochurov, Max, Garipov, Timur, Podoprikhin, Dmitry, Molchanov, Dmitry, Ashukha, Arsenii, Vetrov, Dmitry
In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data to improve performance. However, deep neural networks are prone to getting stuck in a suboptimal solution when trained on only new data as compared to the full dataset. Our work focuses on a continuous learning setup where the task is always the same and new parts of data arrive sequentially. We apply a Bayesian approach to update the posterior approximation with each new piece of data and find this method to outperform the traditional approach in our experiments.
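A hedged sketch of the recursive Bayesian update described above: the approximate posterior learned on the previous data chunks serves as the prior when fitting the next chunk. The factorized Gaussian approximation below is my assumption; the paper may use a different one.

```python
import torch

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over all weights.
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * torch.sum(logvar_p - logvar_q
                           + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def chunk_objective(avg_nll, mu_q, logvar_q, prev_mu, prev_logvar, chunk_size):
    # Negative ELBO for the current chunk: avg_nll is the mean negative
    # log-likelihood over the chunk, and the KL is taken against the posterior
    # learned on earlier chunks, which now plays the role of the prior.
    return avg_nll + kl_gaussians(mu_q, logvar_q, prev_mu, prev_logvar) / chunk_size
```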
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Neklyudov, Kirill, Molchanov, Dmitry, Ashukha, Arsenii, Vetrov, Dmitry P.
Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude into different parts of the neural network during training. It was recently shown that the Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In this paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs. To do this, we inject noise into the neurons' outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and a truncated log-normal variational approximation that ensures that the KL term in the evidence lower bound is computed in closed form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement, as it can be formulated as a separate dropout-like layer.
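A hedged sketch of such a dropout-like layer: multiplicative noise is applied per neuron/channel output, and units whose noise has a low signal-to-noise ratio are removed after training. For brevity this uses plain (untruncated) log-normal noise and an ad-hoc SNR threshold; the paper works with a truncated log-uniform prior and a truncated log-normal posterior.

```python
import torch
import torch.nn as nn

class MultiplicativeNoiseLayer(nn.Module):
    """Learned multiplicative noise per unit; expects inputs of shape (B, num_units)."""

    def __init__(self, num_units, snr_threshold=1.0):  # threshold value is an assumption
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_units))                 # location of log-noise
        self.log_sigma = nn.Parameter(torch.full((num_units,), -3.0))  # scale of log-noise
        self.snr_threshold = snr_threshold

    def snr(self):
        # For log-normal noise, E[theta] / std[theta] = 1 / sqrt(exp(sigma^2) - 1).
        sigma2 = torch.exp(2.0 * self.log_sigma)
        return 1.0 / torch.sqrt(torch.exp(sigma2) - 1.0)

    def forward(self, x):
        sigma = torch.exp(self.log_sigma)
        if self.training:
            theta = torch.exp(self.mu + sigma * torch.randn_like(self.mu))
        else:
            mean = torch.exp(self.mu + 0.5 * sigma ** 2)               # E[theta] of a log-normal
            theta = mean * (self.snr() > self.snr_threshold).float()   # drop low-SNR units
        return x * theta
```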