AITopics | event memory

Predicting Event Memorability from Contextual Visual Semantics

Neural Information Processing SystemsApr-26-2026, 20:53:06 GMT

Episodic event memory is a key component of human cognition. Predicting event memorability, i.e., to what extent an event is recalled, is a tough challenge in memory research and has profound implications for artificial intelligence. In this study, we investigate factors that affect event memorability according to a cued recall process. Specifically, we explore whether event memorability is contingent on the event context, as well as the intrinsic visual attributes of image cues. We design a novel experiment protocol and conduct a large-scale experiment with 47 elder subjects over 3 months. Subjects' memory of life events is tested in a cued recall process. Using advanced visual analytics methods, we build a first-ofits-kind event memorability dataset (called R3) with rich information about event context and visual semantic features. Furthermore, we propose a contextual event memory network (CEMNet) that tackles multi-modal input to predict item-wise event memorability, which outperforms competitive benchmarks. The findings inform deeper understanding of episodic event memory, and open up a new avenue for prediction of human episodic memory.

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Glance and Focus: Memory Prompting for Multi-Event Video Question Answering Supplementary Material Ziyi Bai, Ruiping Wang, Xilin Chen ziyi.bai@vipl.ict.ac.cn, {wangruiping, xlchen }@ict.ac.cn

Neural Information Processing SystemsFeb-13-2026, 14:35:46 GMT

As mentioned in Section 4.2 Our model can easily adapt to various video backbones. We use QA accuracy as the metric for evaluation. As illustrated in Section 3.2, with event-level annotations, we First, we analyze the effects of different loss functions on model performance. The results are shown in Figure 1. When the coefficient of any loss function is 0, the performance of the model decreases, which indicates their efficiency in event memory extraction. Without it, there is a significant decrease in model performance.

machine learning, natural language, question answering, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.43)

Add feedback

Glance and Focus: Memory Prompting for Multi-Event Video Question Answering Ziyi Bai

Neural Information Processing SystemsFeb-13-2026, 14:35:42 GMT

Video Question Answering (VideoQA) has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors. Despite the recent success of large vision language models in many multi-modal tasks, complex situation reasoning over videos involving multiple human-object interaction events still remains challenging.

computer vision, machine learning, question answering, (14 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Glance and Focus: Memory Prompting for Multi-Event Video Question Answering

Neural Information Processing SystemsDec-25-2025, 22:03:37 GMT

Video Question Answering (VideoQA) has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors. Despite the recent success of large vision language models in many multi-modal tasks, complex situation reasoning over videos involving multiple human-object interaction events still remains challenging. In contrast, humans can easily tackle it by using a series of episode memories as anchors to quickly locate question-related key moments for reasoning. To mimic this effective reasoning strategy, we propose the Glance-Focus model. One simple way is to apply an action detection model to predict a set of actions as key memories.

glance and focus, memory prompting, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.61)

Add feedback

Predicting Event Memorability from Contextual Visual Semantics

Neural Information Processing SystemsDec-24-2025, 20:26:32 GMT

Episodic event memory is a key component of human cognition. Predicting event memorability,i.e., to what extent an event is recalled, is a tough challenge in memory research and has profound implications for artificial intelligence. In this study, we investigate factors that affect event memorability according to a cued recall process. Specifically, we explore whether event memorability is contingent on the event context, as well as the intrinsic visual attributes of image cues. We design a novel experiment protocol and conduct a large-scale experiment with 47 elder subjects over 3 months. Subjects' memory of life events is tested in a cued recall process. Using advanced visual analytics methods, we build a first-of-its-kind event memorability dataset (called R3) with rich information about event context and visual semantic features. Furthermore, we propose a contextual event memory network (CEMNet) that tackles multi-modal input to predict item-wise event memorability, which outperforms competitive benchmarks. The findings inform deeper understanding of episodic event memory, and open up a new avenue for prediction of human episodic memory.

contextual visual semantic, event memorability, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

6baec7c4ba0a8734ccbd528a8090cb1f-Supplemental-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 20:45:21 GMT

artificial intelligence, machine learning, natural language, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Natural Language (0.72)

Add feedback

6baec7c4ba0a8734ccbd528a8090cb1f-Paper-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 20:45:18 GMT

computer vision, machine learning, natural language, (13 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Glance and Focus: Memory Prompting for Multi-Event Video Question Answering

Neural Information Processing SystemsJan-19-2025, 02:48:35 GMT

Video Question Answering (VideoQA) has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors. Despite the recent success of large vision language models in many multi-modal tasks, complex situation reasoning over videos involving multiple human-object interaction events still remains challenging. In contrast, humans can easily tackle it by using a series of episode memories as anchors to quickly locate question-related key moments for reasoning. To mimic this effective reasoning strategy, we propose the Glance- Focus model. One simple way is to apply an action detection model to predict a set of actions as key memories.

event memory, glance and focus, memory prompting, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)

Add feedback

Predicting Event Memorability from Contextual Visual Semantics

Neural Information Processing SystemsJan-18-2025, 22:47:59 GMT

Episodic event memory is a key component of human cognition. Predicting event memorability,i.e., to what extent an event is recalled, is a tough challenge in memory research and has profound implications for artificial intelligence. In this study, we investigate factors that affect event memorability according to a cued recall process. Specifically, we explore whether event memorability is contingent on the event context, as well as the intrinsic visual attributes of image cues. We design a novel experiment protocol and conduct a large-scale experiment with 47 elder subjects over 3 months.

contextual visual semantic, cued recall process, event memorability, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory

Park, Junyeong, Cho, Junmo, Ahn, Sungjin

arXiv.org Artificial IntelligenceDec-25-2024

Significant advances have been made in developing general-purpose embodied AI in environments like Minecraft through the adoption of LLM-augmented hierarchical approaches. While these approaches, which combine high-level planners with low-level controllers, show promise, low-level controllers frequently become performance bottlenecks due to repeated failures. In this paper, we argue that the primary cause of failure in many low-level controllers is the absence of an episodic memory system. To address this, we introduce MrSteve (Memory Recall Steve-1), a novel low-level controller equipped with Place Event Memory (PEM), a form of episodic memory that captures what, where, and when information from episodes. This directly addresses the main limitation of the popular low-level controller, Steve-1. Unlike previous models that rely on short-term memory, PEM organizes spatial and event-based data, enabling efficient recall and navigation in long-horizon tasks. Additionally, we propose an Exploration Strategy and a Memory-Augmented Task Solving Framework, allowing agents to alternate between exploration and task-solving based on recalled events. Our approach significantly improves task-solving and exploration efficiency compared to existing methods. We will release our code and demos on the project page: https://sites.google.com/view/mr-steve.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.06736

Country: