AITopics | memory vector

memory vector

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diversified Recommendations for Agents with Adaptive Preferences

Neural Information Processing Systems, Feb-11-2026, 04:54:28 GMT

We show that these conditions are closely connected: all sufficiently high-entropy distributions are instantaneously realizable at any history of selected items.

artificial intelligence, corr, machine learning

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.31)

Steering Information Utility in Key-Value Memory for Language Model Post-Training

Deng, Chunyuan, Chang, Ruidi, Chen, Hanjie

arXiv.org Artificial Intelligence, Oct-30-2025

Recent advancements in language models (LMs) have marked a shift toward the growing importance of post-training. Yet post-training approaches such as supervised fine-tuning (SFT) do not guarantee the effective use of knowledge acquired during pretraining. We therefore introduce InfoSteer, a lightweight method that encourages parametric information utilization in LMs during post-training. Specifically, InfoSteer treats the feed-forward network (FFN) layer as an associative key-value memory and promotes the use of stored memory vectors via forward-pass interventions or regularization during backpropagation. This simple guidance during the post-training phase yields consistent performance improvements across diverse model families (including Qwen, Gemma, and Llama) spanning 15 downstream tasks in both in-distribution (ID) and out-of-distribution (OOD) evaluations. Beyond performance gains, we also find that steered LMs can adaptively allocate information by placing more emphasis on generating semantically meaningful tokens while using fewer resources on simple transition ones (e.g., ',' or 'and'). Our work underscores that vanilla post-training does not fully exploit the potential gained during pretraining, and that steering LMs in latent representation space offers a promising approach to enhance both performance and interpretability. The code is available at: https://github.com/chili-lab/InfoSteer.
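A minimal sketch of the key-value reading of an FFN layer that this abstract builds on, assuming a two-layer FFN with a ReLU activation and no biases: rows of the first weight matrix act as keys, rows of the second act as memory vectors, and the layer output is a coefficient-weighted sum of memory vectors. It only illustrates the decomposition; it does not implement InfoSteer's interventions or regularizer, and all function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def relu(z: float) -> float:
    return max(0.0, z)


def memory_coefficients(x: Vector, keys: Matrix) -> Vector:
    # m_i = f(x . k_i): how strongly the input activates the i-th key.
    return [relu(dot(x, k)) for k in keys]


def ffn_as_memory(x: Vector, keys: Matrix, values: Matrix) -> Vector:
    # FFN(x) = sum_i m_i * v_i: a coefficient-weighted sum of memory vectors.
    coeffs = memory_coefficients(x, keys)
    out = [0.0] * len(values[0])
    for m_i, v_i in zip(coeffs, values):
        for j, v_ij in enumerate(v_i):
            out[j] += m_i * v_ij
    return out


def ffn_matrix_form(x: Vector, keys: Matrix, values: Matrix) -> Vector:
    # The usual view: hidden = f(K x), output = V^T hidden
    # (keys are the rows of K, memory vectors are the rows of V).
    hidden = [relu(dot(x, k)) for k in keys]
    return [dot(hidden, list(col)) for col in zip(*values)]


def active_memories(x: Vector, keys: Matrix) -> List[int]:
    # Indices of the memory vectors that actually contribute for this input.
    return [i for i, m in enumerate(memory_coefficients(x, keys)) if m > 0.0]


if __name__ == "__main__":
    x = [1.0, -1.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(memory_coefficients(x, keys))    # [1.0, 0.0, 0.0]
    print(ffn_as_memory(x, keys, values))  # [1.0, 2.0]
    print(active_memories(x, keys))        # [0]
```

Both functions compute the same output; the memory form simply makes explicit which stored vectors an input draws on, which is the quantity a steering method of this kind would try to influence.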

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2507.05158

Country:

Asia > Middle East (0.67)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Further Related Work

Neural Information Processing Systems, Aug-17-2025, 10:40:52 GMT

The "dueling bandits" problem, initially proposed as a model for similar recommendation systems … A number of works in recent years explore online problems where an agent responds to the decision-maker's actions, influencing their reward. The "revealed preferences" literature involves a similar requirement of learning a mapping … Some recent work has begun to explore the problem of designing optimal strategies in a repeated game against agents who adapt their strategies over time using a no-regret algorithm. … As such, the empirical probability of b must be close to 1/2. We make use of a lemma from [2], which we restate here. Lemma 8. Consider two vectors … We prove local learnability results for each case.

probability, query, vector

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)

Diversified Recommendations for Agents with Adaptive Preferences

Neural Information Processing Systems, Aug-17-2025, 10:40:48 GMT

The Recommender then observes the Agent's selected item.

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

Kim, Jiyeon, Lee, Hyunji, Cho, Hyowon, Jang, Joel, Hwang, Hyeonbin, Won, Seungpil, Ahn, Youbin, Lee, Dohaeng, Seo, Minjoon

arXiv.org Artificial Intelligence, Dec-2-2024

In this work, we investigate how a model's tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with; high knowledge entropy indicates that the model utilizes a wide range of memory sources, while low knowledge entropy suggests reliance on specific sources with greater certainty. Our analysis reveals a consistent decline in knowledge entropy as pretraining advances. We also find that the decline is closely associated with a reduction in the model's ability to acquire and retain knowledge, leading us to conclude that diminishing knowledge entropy (smaller number of active memory sources) impairs the model's knowledge acquisition and retention capabilities. We find further support for this by demonstrating that increasing the activity of inactive memory sources enhances the model's capacity for knowledge acquisition and retention.
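A simplified sketch of an entropy over memory sources, under assumptions of our own rather than the paper's exact definition: the memory sources are FFN memory vectors, each token's coefficients are taken in absolute value and normalized into a distribution, those distributions are averaged over tokens, and the per-layer entropies are summed. Higher values mean more memory vectors are engaged; 0 means a single memory vector carries all the weight. All names are hypothetical.

```python
import math
from typing import List


def normalize(coeffs: List[float]) -> List[float]:
    # Absolute coefficient magnitudes -> probability distribution over memory vectors.
    mags = [abs(c) for c in coeffs]
    total = sum(mags)
    if total == 0.0:
        raise ValueError("all coefficients are zero; distribution is undefined")
    return [m / total for m in mags]


def shannon_entropy(p: List[float]) -> float:
    # H(p) = -sum_i p_i log p_i, with the convention 0 log 0 = 0.
    return sum(-pi * math.log(pi) for pi in p if pi > 0.0)


def layer_entropy(token_coeffs: List[List[float]]) -> float:
    # Average the per-token distributions, then take the entropy of the average.
    dists = [normalize(c) for c in token_coeffs]
    n_tokens = len(dists)
    avg = [sum(d[i] for d in dists) / n_tokens for i in range(len(dists[0]))]
    return shannon_entropy(avg)


def knowledge_entropy(layers: List[List[List[float]]]) -> float:
    # layers[l][t][i]: coefficient of memory vector i for token t in layer l.
    return sum(layer_entropy(token_coeffs) for token_coeffs in layers)


if __name__ == "__main__":
    broad = [[[1.0, -1.0, 1.0, -1.0]]]  # one layer, one token, four memories used equally
    peaked = [[[3.0, 0.0, 0.0, 0.0]]]   # a single memory vector dominates
    print(round(knowledge_entropy(broad), 4))  # 1.3863 (= ln 4)
    print(knowledge_entropy(peaked))           # 0.0
```

Under this reading, the decline the abstract reports would show up as the per-layer averages concentrating on fewer memory vectors as pretraining proceeds.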

knowledge, knowledge acquisition, knowledge entropy

arXiv.org Artificial Intelligence

2410.0138

Country:

North America > Dominican Republic (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Brain Inspired Probabilistic Occupancy Grid Mapping with Hyperdimensional Computing

Snyder, Shay, Capodieci, Andrew, Gorsich, David, Parsa, Maryam

arXiv.org Artificial Intelligence, Aug-27-2024

Real-time robotic systems require advanced perception, computation, and action capabilities. However, the main bottleneck in current autonomous systems is the trade-off between computational capability, energy efficiency, and model determinism. World modeling, a key objective of many robotic systems, commonly uses occupancy grid mapping (OGM) as the first step towards building an end-to-end robotic system with perception, planning, autonomous maneuvering, and decision-making capabilities. OGM divides the environment into discrete cells and assigns probability values to attributes such as occupancy and traversability. Existing methods fall into two categories: traditional methods and neural methods. Traditional methods rely on dense statistical calculations, while neural methods employ deep learning for probabilistic information processing. Recent works formulate a deterministic theory of neural computation at the intersection of cognitive science and vector symbolic architectures. In this study, we propose VSA-OGM, a Fourier-based hyperdimensional OGM system, combined with a novel application of Shannon entropy; it retains the interpretability and stability of traditional methods along with the improved computational efficiency of neural methods. Our approach, validated across multiple datasets, achieves accuracy similar to covariant traditional methods while reducing latency by approximately 200x and memory by approximately 1000x. Compared to invariant traditional methods, it achieves similar accuracy while reducing latency by 3.7x. Moreover, it reduces latency by 1.5x compared to neural methods while eliminating the need for domain-specific model training.
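Fourier-based hyperdimensional computing represents quantities as long vectors of unit-magnitude complex phasors. The toy sketch below is our own illustration, not VSA-OGM: it encodes 2-D points by fractional power encoding (scaling random base phases by the coordinate values), bundles observed occupied points into one map vector by summation, and scores a query cell by normalized similarity. Free-space observations and the paper's Shannon-entropy step are not modeled; the dimension, phase distribution, seed, and function names are assumptions.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

Hypervector = List[complex]


def random_phases(dim: int, rng: random.Random) -> List[float]:
    # Base vector for one axis, stored as the phases of unit-magnitude phasors.
    return [rng.uniform(-math.pi, math.pi) for _ in range(dim)]


def encode_point(x: float, y: float, phases_x: List[float], phases_y: List[float]) -> Hypervector:
    # Fractional power encoding X**x and Y**y, bound by elementwise multiplication.
    # For phasors, exponentiation scales each phase and binding adds phases.
    return [cmath.exp(1j * (ax * x + ay * y)) for ax, ay in zip(phases_x, phases_y)]


def bundle(vectors: Sequence[Hypervector]) -> Hypervector:
    # Superposition: elementwise sum of all encoded points.
    return [sum(components) for components in zip(*vectors)]


def similarity(a: Hypervector, b: Hypervector) -> float:
    # Real part of <a, conj(b)> / dim: 1 for identical phasor vectors, near 0 for unrelated ones.
    return sum((ai * bi.conjugate()).real for ai, bi in zip(a, b)) / len(a)


def build_map(points: Sequence[Tuple[float, float]], phases_x: List[float], phases_y: List[float]) -> Hypervector:
    return bundle([encode_point(x, y, phases_x, phases_y) for x, y in points])


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 2048
    px, py = random_phases(dim, rng), random_phases(dim, rng)
    occupied = [(2.0, 3.0), (2.0, 4.0), (3.0, 3.0)]
    grid_map = build_map(occupied, px, py)
    for qx, qy in [(2.0, 3.0), (10.0, 10.0)]:
        score = similarity(grid_map, encode_point(qx, qy, px, py))
        print((qx, qy), round(score, 2))  # high (near 1) for an occupied cell, near 0 for a far empty one
```

With phases drawn uniformly from [-π, π], the expected similarity between two encoded points decays like a sinc of their offset, so each occupied point contributes about 1 at its own cell and roughly 0 at other integer cells; a larger dimension reduces the residual noise.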

dataset, dimensionality, vector

arXiv.org Artificial Intelligence

2408.09066

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan > Macomb County > Warren (0.04)
Europe (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

An Efficient Replay for Class-Incremental Learning with Pre-trained Models

Yin, Weimin, Chen, Bin, Xie, Chunzhao, Tan, Zhenhao

arXiv.org Artificial Intelligence, Aug-15-2024

In general class-incremental learning, researchers typically use sample sets as a tool to avoid catastrophic forgetting during continual learning. At the same time, researchers have also noted the differences between class-incremental learning and Oracle training and have attempted to correct for them. In recent years, researchers have begun to develop class-incremental learning algorithms that utilize pre-trained models, achieving significant results. This paper observes that in class-incremental learning, the steady state among the weights guided by each class center is disrupted, and that this disruption is significantly correlated with catastrophic forgetting. Based on this observation, we propose a new method for overcoming forgetting. In some cases, retaining only a single sample of each class in memory for replay and applying simple gradient constraints is enough to achieve very good results. Experimental results indicate that, with pre-trained models, our method achieves competitive performance at very low computational cost while using only the cross-entropy loss.
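The idea of keeping a single stored sample per class for replay can be sketched as a small buffer whose contents are appended to each new task's batch. This is a hypothetical illustration: the first-seen selection rule, the class and method names, and the batch mixing are our assumptions, and the paper's gradient constraints are not modeled.

```python
from collections.abc import Hashable
from typing import Dict, Generic, List, Tuple, TypeVar

Sample = TypeVar("Sample")


class OneExemplarReplayBuffer(Generic[Sample]):
    """Keeps a single stored sample per class (first-seen rule, an assumption)."""

    def __init__(self) -> None:
        self._exemplars: Dict[Hashable, Sample] = {}

    def observe(self, sample: Sample, label: Hashable) -> None:
        # Only the first sample of each class is kept; memory grows by one slot per class.
        self._exemplars.setdefault(label, sample)

    def replay(self) -> List[Tuple[Sample, Hashable]]:
        # One (sample, label) pair for every class seen so far, in first-seen order.
        return [(sample, label) for label, sample in self._exemplars.items()]

    def mixed_batch(self, new_batch: List[Tuple[Sample, Hashable]]) -> List[Tuple[Sample, Hashable]]:
        # Current-task samples followed by the stored exemplars; a cross-entropy
        # training step would then be taken on this combined batch.
        return list(new_batch) + self.replay()

    def __len__(self) -> int:
        return len(self._exemplars)


if __name__ == "__main__":
    buffer: OneExemplarReplayBuffer[str] = OneExemplarReplayBuffer()
    for sample, label in [("cat_1", "cat"), ("cat_2", "cat"), ("dog_1", "dog")]:
        buffer.observe(sample, label)
    print(buffer.replay())                           # [('cat_1', 'cat'), ('dog_1', 'dog')]
    print(buffer.mixed_batch([("bird_1", "bird")]))  # [('bird_1', 'bird'), ('cat_1', 'cat'), ('dog_1', 'dog')]
```

A dictionary keyed by label keeps the memory cost at exactly one sample per class, which matches the low-cost regime the abstract describes; choosing the exemplar closest to each class center would be a natural alternative to the first-seen rule.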

continual learning, knowledge, learning

arXiv.org Artificial Intelligence

2408.08084

Genre: Research Report > Promising Solution (0.46)

Industry: Education > Educational Setting > Continuing Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning Random Numbers to Realize Appendable Memory System for Artificial Intelligence to Acquire New Knowledge after Deployment

Yamada, Kazunori D

arXiv.org Artificial Intelligence, Jul-29-2024

In this study, we developed a learning method for constructing a neural network system capable of memorizing data and recalling it without parameter updates. The system we built using this method is called the Appendable Memory system. The Appendable Memory system enables an artificial intelligence (AI) to acquire new knowledge even after deployment. It consists of two AIs: the Memorizer and the Recaller. This system is a key-value store built using neural networks. The Memorizer receives data and stores it in the Appendable Memory vector, which is dynamically updated when the AI acquires new knowledge. Meanwhile, the Recaller retrieves information from the Appendable Memory vector. What we want to teach the AI in this study are the operations of memorizing and recalling information. However, traditional machine learning methods make an AI learn the features inherent in the training dataset. We demonstrate that the systems we intend to create cannot be realized by current machine learning methods, that is, merely by repeatedly training the AI on input and output sequences. Instead, we propose a method that teaches the AI to learn operations by completely removing the features contained in the training dataset. Specifically, we probabilized all the data involved in learning, replacing it with random numbers. This measure prevented the AI from learning the features of the data. The learning method proposed in this study differs from traditional machine learning methods and provides fundamental approaches for building an AI system that can store information in a finite memory and recall it at a later date.
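The paper trains Memorizer and Recaller networks; as a much simpler stand-in, a classical linear associative memory (a different, untrained technique, swapped in here purely for illustration) shows how key-value pairs can be appended to a fixed-size memory after "deployment" and recalled later without any parameter updates. Keys and values are random ±1 vectors, loosely echoing the paper's use of random data; the class name, sizes, and seed are hypothetical.

```python
import random
from typing import List


def random_sign_vector(dim: int, rng: random.Random) -> List[int]:
    # Random +/-1 vector; random keys carry no dataset-specific features.
    return [rng.choice((-1, 1)) for _ in range(dim)]


class LinearAssociativeMemory:
    """Fixed-size outer-product memory: writing and reading involve no training."""

    def __init__(self, key_dim: int, value_dim: int) -> None:
        self.key_dim = key_dim
        # value_dim x key_dim matrix (it could equally be flattened into one memory vector).
        self.memory = [[0.0] * key_dim for _ in range(value_dim)]

    def memorize(self, key: List[int], value: List[int]) -> None:
        # Append an item: M += value key^T / key_dim. The memory never grows in size.
        for i, v in enumerate(value):
            row = self.memory[i]
            for j, k in enumerate(key):
                row[j] += v * k / self.key_dim

    def recall(self, key: List[int]) -> List[int]:
        # Read: sign(M key). The matching item contributes +/-1 per component;
        # other items add small crosstalk because random high-dimensional keys
        # are nearly orthogonal.
        return [1 if sum(m * k for m, k in zip(row, key)) >= 0.0 else -1 for row in self.memory]


if __name__ == "__main__":
    rng = random.Random(42)
    store = LinearAssociativeMemory(key_dim=512, value_dim=8)
    items = [(random_sign_vector(512, rng), random_sign_vector(8, rng)) for _ in range(5)]
    for key, value in items:
        store.memorize(key, value)  # items can keep being appended later
    print(all(store.recall(key) == value for key, value in items))
```

Capacity is the catch: as more items are appended to the same fixed-size memory, crosstalk grows until recall fails, which is one way to see why a finite memory must be used carefully.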

information, memorizer-recaller network, memory vector

arXiv.org Artificial Intelligence

2407.20197

Country: Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

SLADE: Detecting Dynamic Anomalies in Edge Streams without Labels via Self-Supervised Learning

Lee, Jongha, Kim, Sunwoo, Shin, Kijung

arXiv.org Artificial Intelligence, Jul-24-2024

To detect anomalies in real-world graphs, such as social, email, and financial networks, various approaches have been developed. While they typically assume static input graphs, most real-world graphs grow over time, naturally represented as edge streams. In this context, we aim to achieve three goals: (a) instantly detecting anomalies as they occur, (b) adapting to dynamically changing states, and (c) handling the scarcity of dynamic anomaly labels. In this paper, we propose SLADE (Self-supervised Learning for Anomaly Detection in Edge Streams) for rapid detection of dynamic anomalies in edge streams, without relying on labels. SLADE detects the shifts of nodes into abnormal states by observing deviations in their interaction patterns over time. To this end, it trains a deep neural network to perform two self-supervised tasks: (a) minimizing drift in node representations and (b) generating long-term interaction patterns from short-term ones. Failure in these tasks for a node signals its deviation from the norm. Notably, the neural network and tasks are carefully designed so that all required operations can be performed in constant time (w.r.t. the graph size) in response to each new edge in the input stream. In dynamic anomaly detection across four real-world datasets, SLADE outperforms nine competing methods, even those leveraging label supervision.
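Two ingredients this abstract stresses, constant work per arriving edge and an anomaly signal based on drift in a node's representation, can be sketched without any neural network. In this hypothetical simplification (not SLADE's architecture or its self-supervised objectives), each node keeps a memory vector updated by an exponential moving average of its neighbors' feature vectors, and the score for an edge is how far the source node's memory moves. The random node features, decay rate, and names are assumptions.

```python
import math
import random
from collections.abc import Hashable
from typing import Dict, List


class DriftScorer:
    """Per-node memory with O(dim) work per edge; the score is the size of the memory update."""

    def __init__(self, dim: int = 16, alpha: float = 0.2, seed: int = 0) -> None:
        self.dim = dim
        self.alpha = alpha  # moving-average rate (assumption)
        self.rng = random.Random(seed)
        self.features: Dict[Hashable, List[float]] = {}  # fixed random feature per node (assumption)
        self.memory: Dict[Hashable, List[float]] = {}    # evolving per-node representation

    def _feature(self, node: Hashable) -> List[float]:
        if node not in self.features:
            self.features[node] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.features[node]

    def process_edge(self, src: Hashable, dst: Hashable) -> float:
        # Work depends only on dim, not on the number of nodes or edges seen so far.
        target = self._feature(dst)
        old = self.memory.get(src)
        if old is None:
            self.memory[src] = list(target)  # first interaction initializes the memory
            return 0.0
        new = [(1.0 - self.alpha) * o + self.alpha * t for o, t in zip(old, target)]
        self.memory[src] = new
        # Drift of src's representation caused by this edge.
        return math.sqrt(sum((n - o) ** 2 for n, o in zip(new, old)))


if __name__ == "__main__":
    scorer = DriftScorer()
    stream = [("a", "b"), ("a", "b"), ("a", "b"), ("a", "z")]
    for u, v in stream:
        # Repeated a->b edges score near 0; the sudden a->z edge scores clearly higher.
        print(u, v, round(scorer.process_edge(u, v), 3))
```

Because the per-edge update touches only the two dictionary entries involved, its cost stays flat as the stream grows, which is the property the abstract emphasizes; SLADE learns what "normal" drift looks like rather than using a fixed rule like this one.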

dataset, node, slade

arXiv.org Artificial Intelligence

2402.11933

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation

Chen, Xinyue, Shi, Miaojing

arXiv.org Artificial Intelligence, Jun-9-2024

The performance of supervised semantic segmentation methods relies heavily on the availability of large-scale training data. To alleviate this dependence, few-shot semantic segmentation (FSS) was introduced to transfer a model trained on base classes with sufficient data to the segmentation of novel classes with only a few labeled examples. FSS methods face the challenge of generalizing to novel classes because of the distribution shift between base and novel classes. To overcome this issue, we propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors. These memory vectors learn elemental object patterns from base classes during training while re-encoding query features during both training and inference, thereby improving the distribution alignment between base and novel classes. Furthermore, to cope with the performance degradation caused by intra-class variance across images, we introduce an uncertainty-based feature augmentation (UFA) module that produces diverse query features during training to improve the model's robustness. We integrate CSM and UFA into representative FSS works; experimental results on the widely used PASCAL-5^i and COCO-20^i datasets demonstrate that our approach outperforms the state of the art.
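One common way to let a set of memory vectors re-encode query features is scaled dot-product attention over the memory. The sketch assumes that form, since the abstract does not specify the exact operator: a query feature attends over the memory vectors with a softmax, and the attention-weighted read-out is added back to it. The residual combination, the 1/sqrt(d) scaling, and all names are assumptions; in CSM the memory vectors are learned during training, whereas here they are fixed inputs.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reencode(query: Vector, memory: List[Vector]) -> Tuple[Vector, Vector]:
    """Attend from one query feature over the memory vectors and add the read-out back."""
    d = len(query)
    # Scaled dot-product score between the query and every memory vector.
    scores = [sum(q * m for q, m in zip(query, mem)) / math.sqrt(d) for mem in memory]
    weights = softmax(scores)
    # Attention-weighted combination of the memory vectors (the "read-out").
    readout = [sum(w * mem[j] for w, mem in zip(weights, memory)) for j in range(d)]
    # Residual combination of the original feature and the read-out (an assumption).
    return [q + r for q, r in zip(query, readout)], weights


if __name__ == "__main__":
    memory = [[10.0, 0.0], [0.0, 10.0]]  # learned in CSM; fixed here for illustration
    feature, weights = reencode([1.0, 0.0], memory)
    print([round(w, 3) for w in weights])  # the first memory vector dominates
    print([round(f, 2) for f in feature])
```

Because every query feature, from base or novel classes, is expressed partly through the same shared memory vectors, re-encoding of this kind pulls features toward a common set of patterns, which is the intuition behind the distribution-alignment claim.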

feature statistics, query feature, segmentation

arXiv.org Artificial Intelligence

2406.00545

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
