
Collaborating Authors

Krotov, Dmitry


M+: Extending MemoryLLM with Scalable Long-Term Memory

arXiv.org Artificial Intelligence

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention, as such memory can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. M+ integrates a long-term memory mechanism with a co-trained retriever, dynamically retrieving relevant information during text generation. We evaluate M+ on diverse benchmarks, including long-context understanding and knowledge retention tasks. Experimental results show that M+ significantly outperforms MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead.
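
To make the mechanism concrete, here is a minimal, hypothetical sketch in Python/NumPy of the general idea of pairing a bounded latent memory pool with a retriever over evicted states. The class name, the cosine-similarity retriever, the pool size, and the eviction rule are illustrative assumptions, not the actual M+ implementation, in which the retriever is co-trained with the language model and the entries are hidden states produced by the model itself.

import numpy as np

rng = np.random.default_rng(0)
d, pool_size, k = 256, 64, 8       # hidden size, working-pool size, retrieved entries (all illustrative)

class LongTermMemory:
    def __init__(self):
        self.working = []          # bounded pool, standing in for the latent-space memory tokens
        self.archive = []          # unbounded long-term store for evicted states

    def write(self, h):
        self.working.append(h)
        if len(self.working) > pool_size:
            # evict the oldest hidden state into long-term memory instead of discarding it
            self.archive.append(self.working.pop(0))

    def retrieve(self, query):
        if not self.archive:
            return np.zeros((0, d))
        A = np.stack(self.archive)
        sims = A @ query / (np.linalg.norm(A, axis=1) * np.linalg.norm(query) + 1e-8)
        return A[np.argsort(-sims)[:k]]   # top-k by cosine similarity (an assumed retriever)

mem = LongTermMemory()
for _ in range(500):
    mem.write(rng.standard_normal(d))

# At generation time the model would attend over its working pool plus the retrieved entries.
context = np.concatenate([np.stack(mem.working), mem.retrieve(rng.standard_normal(d))])
print(context.shape)               # (pool_size + k, d)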


Operator Learning for Reconstructing Flow Fields from Sparse Measurements: an Energy Transformer Approach

arXiv.org Artificial Intelligence

Machine learning methods have shown great success in various scientific areas, including fluid mechanics. However, reconstruction problems, where full velocity fields must be recovered from partial observations, remain challenging. In this paper, we propose a novel operator learning framework for solving reconstruction problems by using the Energy Transformer (ET), an architecture inspired by associative memory models. We formulate reconstruction as a mapping from incomplete observed data to full reconstructed fields. The method is validated on three fluid mechanics examples using diverse types of data: (1) unsteady 2D vortex street in flow past a cylinder using simulation data; (2) high-speed under-expanded impinging supersonic jets using Schlieren imaging; and (3) 3D turbulent jet flow using particle tracking. The results demonstrate the ability of ET to accurately reconstruct complex flow fields from highly incomplete data (90% missing), even for noisy experimental measurements, with fast training and inference on a single GPU. This work provides a promising new direction for tackling reconstruction problems in fluid mechanics and other areas in mechanics, geophysics, weather prediction, and beyond.
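
As a rough illustration of the sparse-to-full formulation only (not of the Energy Transformer itself), the toy below masks 90% of a smooth synthetic field and fills it back in by descending a simple quadratic smoothness energy with the observed points pinned. The analytic field, the mask ratio, and the hand-crafted smoothness energy are stand-in assumptions; the paper replaces this energy with the learned ET operator and works with real simulation and experimental data.

import numpy as np

rng = np.random.default_rng(1)
n = 64

# Toy "flow field": a smooth analytic scalar field standing in for one velocity component.
xx, yy = np.meshgrid(np.linspace(-2.0, 2.0, n), np.linspace(-2.0, 2.0, n))
u_true = np.sin(2.0 * xx) * np.cos(2.0 * yy)

# Keep only 10% of the grid points (90% missing), mimicking sparse measurements.
observed = rng.random((n, n)) < 0.10
u = np.where(observed, u_true, 0.0)

# Stand-in reconstruction: descend a quadratic smoothness energy with the measurements pinned.
# (Periodic boundaries via np.roll; acceptable for this toy.)
for _ in range(2000):
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
    u = u + 0.2 * lap
    u[observed] = u_true[observed]        # observed points act as hard constraints

err = np.sqrt(np.mean((u - u_true) ** 2)) / np.std(u_true)
print(f"relative RMSE of the smoothing-based reconstruction: {err:.3f}")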


Dense Associative Memory Through the Lens of Random Features

arXiv.org Artificial Intelligence

Dense Associative Memories are high storage capacity variants of Hopfield networks that are capable of storing a large number of memory patterns in the weights of a network of a given size. Their common formulations typically require storing each pattern in a separate set of synaptic weights, which causes the number of synaptic weights to grow as new patterns are introduced. In this work we propose an alternative formulation of this class of models using random features, commonly used in kernel methods. In this formulation the number of the network's parameters remains fixed. At the same time, new memories can be added to the network by modifying existing weights. We show that this novel network closely approximates the energy function and dynamics of conventional Dense Associative Memories and shares their desirable computational properties.
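
A minimal sketch of the random-feature idea, assuming positive exponential features of the kind used for softmax-kernel approximation: each stored pattern is folded into a single fixed-size weight vector, so adding a memory modifies existing weights instead of allocating new ones, and the log-sum-exp energy of a conventional Dense Associative Memory is recovered approximately. The feature construction, the value of beta, and the dimensions are illustrative choices, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(2)
d, m, beta = 64, 4096, 0.5             # pattern dimension, number of random features, inverse temperature

# The random projection is the fixed parameter set shared by all memories.
W = rng.standard_normal((m, d))

def phi(x):
    # positive random features with E[phi(x) @ phi(y)] = exp(beta * x @ y)
    z = np.sqrt(beta) * x
    return np.exp(W @ z - 0.5 * z @ z) / np.sqrt(m)

# "Store" patterns by accumulating their feature vectors into one fixed-size weight vector.
patterns = rng.standard_normal((10, d)) / np.sqrt(d)
T = np.zeros(m)
for xi in patterns:
    T += phi(xi)                       # adding a memory modifies existing weights, never adds new ones

def energy_rf(x):
    # approximates the Dense Associative Memory energy -log sum_mu exp(beta * xi_mu @ x)
    return -np.log(np.maximum(T @ phi(x), 1e-12))

def energy_exact(x):
    return -np.log(np.sum(np.exp(beta * patterns @ x)))

x = patterns[0] + 0.05 * rng.standard_normal(d)
print(energy_rf(x), energy_exact(x))   # the two values should be close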


Losing dimensions: Geometric memorization in generative diffusion

arXiv.org Machine Learning

Generative diffusion processes are state-of-the-art machine learning models deeply connected with fundamental concepts in statistical physics. Depending on the dataset size and the capacity of the network, their behavior is known to transition from an associative memory regime to a generalization phase in a phenomenon that has been described as a glassy phase transition. Here, using statistical physics techniques, we extend the theory of memorization in generative diffusion to manifold-supported data. Our theoretical and experimental findings indicate that different tangent subspaces are lost due to memorization effects at different critical times and dataset sizes, which depend on the local variance of the data along their directions. Perhaps counterintuitively, we find that, under some conditions, subspaces of higher variance are lost first due to memorization effects. This leads to a selective loss of dimensionality where some prominent features of the data are memorized without a full collapse on any individual training point. We validate our theory with a comprehensive set of experiments on networks trained both on image datasets and on linear manifolds, which show remarkable qualitative agreement with the theoretical predictions.


CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory

arXiv.org Artificial Intelligence

Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. Memory-augmented models have emerged as a promising solution to this problem, but current methods are hindered by limited memory capacity and require costly re-training to integrate with a new LLM. In this work, we introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training, enabling it to handle arbitrarily long input sequences. Unlike previous methods, our associative memory module consolidates representations of individual tokens into a non-parametric distribution model, dynamically managed by properly balancing the novelty and recency of the incoming data. By retrieving information from this consolidated associative memory, the base LLM can achieve significant (up to 29.7% on Arxiv) perplexity reduction in long-context modeling compared to other baselines evaluated on standard benchmarks. This architecture, which we call CAMELoT (Consolidated Associative Memory Enhanced Long Transformer), demonstrates superior performance even with a tiny context window of 128 tokens, and also enables improved in-context learning with a much larger set of demonstrations.
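
The sketch below illustrates one plausible reading of the consolidation idea: memory slots hold running means of similar token keys/values, sufficiently novel tokens evict the least recently used slot, and reading returns the closest consolidated entries for the frozen LLM to attend over. The class name, similarity threshold, and update rules are assumptions for illustration, not the CAMELoT implementation.

import numpy as np

class ConsolidatedMemory:
    def __init__(self, n_slots, dim, tau=0.7):
        self.keys = np.zeros((n_slots, dim))
        self.values = np.zeros((n_slots, dim))
        self.counts = np.zeros(n_slots)
        self.last_used = np.zeros(n_slots)
        self.tau, self.t = tau, 0

    def write(self, k, v):
        self.t += 1
        sims = self.keys @ k / (np.linalg.norm(self.keys, axis=1) * np.linalg.norm(k) + 1e-8)
        i = int(np.argmax(sims))
        if self.counts[i] > 0 and sims[i] > self.tau:
            # consolidation: fold the incoming token into the running mean of the matching slot
            c = self.counts[i]
            self.keys[i] = (c * self.keys[i] + k) / (c + 1)
            self.values[i] = (c * self.values[i] + v) / (c + 1)
            self.counts[i] += 1
        else:
            # novelty: overwrite the least recently used slot
            i = int(np.argmin(self.last_used))
            self.keys[i], self.values[i], self.counts[i] = k, v, 1.0
        self.last_used[i] = self.t

    def read(self, q, topk=4):
        idx = np.argsort(-(self.keys @ q))[:topk]
        return self.values[idx]        # retrieved entries for the frozen LLM to attend over

rng = np.random.default_rng(3)
mem = ConsolidatedMemory(n_slots=16, dim=32)
for _ in range(200):
    t = rng.standard_normal(32)
    mem.write(t, t)
print(mem.read(rng.standard_normal(32)).shape)   # (4, 32)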


Long Sequence Hopfield Memory

arXiv.org Machine Learning

Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maximal length of the stored sequence) due to interference between the memories. Inspired by recent work on Dense Associative Memories, we expand the sequence capacity of these models by introducing a nonlinear interaction term, enhancing separation between the patterns. We derive novel scaling laws for sequence capacity with respect to network size, significantly outperforming existing scaling laws for models based on traditional Hopfield networks, and verify these theoretical results with numerical simulations. Moreover, we introduce a generalized pseudoinverse rule to recall sequences of highly correlated patterns. Finally, we extend this model to store sequences with variable timing between state transitions and describe a biologically plausible implementation, with connections to motor neuroscience.
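
A small NumPy sketch of the core mechanism, assuming a polynomial separation function (here a rectified, cubed overlap) as the nonlinear interaction term: the overlap with pattern mu drives the network toward pattern mu+1, and the sharpened overlaps suppress crosstalk between memories. The sizes and the exponent are illustrative; the paper's exact nonlinearity, the generalized pseudoinverse rule for correlated patterns, and the variable-timing extension are not reproduced here.

import numpy as np

rng = np.random.default_rng(4)
N, L, n_power = 200, 30, 3             # neurons, sequence length, degree of the separation nonlinearity

# Random binary (+/-1) patterns forming the stored sequence xi^1, ..., xi^L.
xi = rng.choice([-1.0, 1.0], size=(L, N))

def step(x):
    # Asymmetric update: the overlap with pattern mu pushes the state toward pattern mu+1.
    overlaps = xi[:-1] @ x / N
    drive = xi[1:].T @ (np.maximum(overlaps, 0.0) ** n_power)
    return np.sign(drive + 1e-12)

x = xi[0].copy()
for t in range(L - 1):
    x = step(x)
    print(t + 1, (x == xi[t + 1]).mean())   # fraction of neurons matching the next pattern (should be ~1.0)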


Energy Transformer

arXiv.org Machine Learning

Our work combines aspects of three promising paradigms in machine learning, namely the attention mechanism, energy-based models, and associative memory. Attention is the powerhouse driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models, or Modern Hopfield Networks, have a well-established theoretical foundation and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.
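
To make the "attention layer as energy descent" idea concrete, here is a deliberately simplified single-head toy: an attention energy plus a ReLU-squared Hopfield energy are evaluated on normalized tokens, and the token representations are updated by gradient descent on that total energy (finite differences stand in for the closed-form gradients derived in the paper). The specific energy terms, the absence of learned layer-norm parameters, and all sizes are simplifying assumptions rather than the actual ET block.

import numpy as np

rng = np.random.default_rng(5)
n_tok, d, d_k, beta = 8, 16, 8, 1.0

Wq = rng.standard_normal((d_k, d)) / np.sqrt(d)
Wk = rng.standard_normal((d_k, d)) / np.sqrt(d)
Xi = rng.standard_normal((32, d)) / np.sqrt(d)        # memories used by the Hopfield term

def lnorm(x):
    # per-token normalization (no learned scale or shift in this toy)
    mu = x.mean(axis=1, keepdims=True)
    return (x - mu) / (np.sqrt(((x - mu) ** 2).mean(axis=1, keepdims=True)) + 1e-6)

def energy(x):
    g = lnorm(x)
    scores = beta * (g @ Wq.T) @ (g @ Wk.T).T
    np.fill_diagonal(scores, -np.inf)                 # a token does not attend to itself
    e_att = -np.sum(np.log(np.sum(np.exp(scores), axis=1))) / beta
    e_hop = -0.5 * np.sum(np.maximum(g @ Xi.T, 0.0) ** 2)
    return e_att + e_hop

def num_grad(f, x, eps=1e-4):
    # finite differences keep the toy dependency-free (the paper derives closed-form gradients)
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        dx = np.zeros_like(x); dx[idx] = eps
        g[idx] = (f(x + dx) - f(x - dx)) / (2.0 * eps)
    return g

x = rng.standard_normal((n_tok, d))
e_start = energy(x)
for _ in range(50):                                   # the "forward pass" is descent on the energy
    x = x - 0.02 * num_grad(energy, x)
print(e_start, energy(x))                             # the energy should have decreased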


Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories

arXiv.org Artificial Intelligence

Diffusion Models (DMs) have recently set state-of-the-art on many generation benchmarks. However, there are myriad ways to describe them mathematically, which makes it difficult to develop a simple understanding of how they work. In this survey, we provide a concise overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), which exposes a mathematical connection to the highly related yet often overlooked class of energy-based models called Associative Memories (AMs). Energy-based AMs behave much like denoising DMs, but they provide a theoretical framework in which we can directly compute a Lyapunov energy function and perform gradient descent on it to denoise data. We then summarize the 40-year history of energy-based AMs, beginning with the original Hopfield Network, and discuss new research directions for AMs and DMs that are revealed by characterizing the extent of their similarities and differences.
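
The stated connection can be illustrated in a few lines of NumPy: for a modern Hopfield energy E(x) = ||x||^2/2 - (1/beta) logsumexp(beta * Xi x), gradient descent pulls a corrupted input toward a softmax-weighted average of the stored patterns, playing a role analogous to following the learned score in a diffusion model's reverse process. The particular energy form, beta, and sizes are illustrative choices, not taken from the survey.

import numpy as np

rng = np.random.default_rng(6)
d, n_mem, beta = 64, 16, 8.0

Xi = rng.standard_normal((n_mem, d)) / np.sqrt(d)     # stored patterns, i.e. the "training data"

def denoise(x, steps=20, lr=0.5):
    # gradient descent on E(x) = 0.5*||x||^2 - (1/beta) * logsumexp(beta * Xi @ x);
    # the descent direction pulls x toward a softmax-weighted average of the stored patterns
    for _ in range(steps):
        logits = beta * (Xi @ x)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        x = x - lr * (x - Xi.T @ p)
    return x

x0 = Xi[0] + 0.3 * rng.standard_normal(d) / np.sqrt(d)   # corrupted version of pattern 0
x = denoise(x0)
print(np.corrcoef(x, Xi[0])[0, 1])                       # correlation with the clean pattern, close to 1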


End-to-end Differentiable Clustering with Associative Memories

arXiv.org Artificial Intelligence

Clustering is a widely used unsupervised learning technique involving an intensive discrete optimization problem. Associative Memory models, or AMs, are differentiable neural networks defining a recursive dynamical system that have been integrated with various deep learning architectures. We uncover a novel connection between the AM dynamics and the discrete assignments inherent in clustering, and use it to propose an unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AMs, dubbed ClAM. Leveraging the pattern completion ability of AMs, we further develop a novel self-supervised clustering loss. Our evaluations on varied datasets demonstrate that ClAM benefits from the self-supervision, and significantly improves upon both the traditional Lloyd's k-means algorithm and more recent continuous clustering relaxations (by up to 60% in terms of the Silhouette Coefficient).
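
A minimal sketch of the continuous relaxation, under the assumption that the AM dynamics take the familiar softmax-attractor form over cluster prototypes: each point is repeatedly pulled toward a convex combination of prototypes, and the softmax weights at convergence serve as a differentiable surrogate for the discrete assignment. Training the prototypes through these dynamics with the self-supervised pattern-completion loss is omitted here.

import numpy as np

rng = np.random.default_rng(7)
beta, d = 5.0, 2

# Toy data: three well-separated Gaussian blobs.
centers_true = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.concatenate([c + 0.3 * rng.standard_normal((50, d)) for c in centers_true])

# Prototypes play the role of the AM's memories; one is placed near each blob for a clean demo.
C = X[[0, 50, 100]] + 0.01 * rng.standard_normal((3, d))

def am_step(x, C):
    # Softmax attractor dynamics: a soft, continuous assignment pulls each point
    # toward a convex combination of the prototypes.
    logits = -beta * np.sum((x[:, None, :] - C[None, :, :]) ** 2, axis=-1)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ C, w

x = X.copy()
for _ in range(25):
    x, w = am_step(x, C)

print(np.round(w[:3], 3))                   # soft weights become nearly one-hot at convergence
print(np.bincount(w.argmax(axis=1)))        # recovered cluster sizes, here [50 50 50]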


Sparse Distributed Memory is a Continual Learner

arXiv.org Artificial Intelligence

Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.
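
As an illustration of one ingredient commonly associated with this line of work, the sketch below adds a Top-K winner-take-all hidden layer (with L2-normalized inputs) to an otherwise ordinary MLP forward pass, so that only a few hidden units are active, and hence updated, per example. Which biological components the paper actually translates, and how, is not reproduced here; the layer sizes and K are assumptions.

import numpy as np

def topk_mask(h, k):
    # keep only the k most active hidden units per example (winner-take-all sparsity)
    thresh = np.partition(h, -k, axis=1)[:, -k][:, None]
    return np.where(h >= thresh, h, 0.0)

rng = np.random.default_rng(8)
d_in, d_hid, d_out, k = 64, 1024, 10, 16

W1 = rng.standard_normal((d_in, d_hid)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_hid, d_out)) / np.sqrt(d_hid)

def forward(x):
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)   # L2-normalized inputs
    h = topk_mask(np.maximum(x @ W1, 0.0), k)                   # sparse code: few units active per example
    return h @ W2

print(forward(rng.standard_normal((4, d_in))).shape)            # (4, 10)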