Scripts & Frames
Qualitative Event Perception: Leveraging Spatiotemporal Episodic Memory for Learning Combat in a Strategy Game
Hancock, Will, Forbus, Kenneth D.
Event perception refers to people's ability to carve up continuous experience into meaningful discrete events. We speak of finishing our morning coffee, mowing the lawn, leaving work, etc., as singular occurrences that are localized in time and space. In this work, we analyze how spatiotemporal representations can be used to automatically segment continuous experience into structured episodes, and how these descriptions can be used for analogical learning. These representations are based on Hayes' notion of histories and build upon existing work on qualitative episodic memory. Our agent automatically generates event descriptions of military battles in a strategy game and improves its gameplay by learning from this experience. Episodes are segmented based on changing properties in the world, and we show evidence that they facilitate learning because they capture event descriptions at a useful spatiotemporal grain size. This is evaluated through our agent's performance in the game. We also show empirical evidence that the perceived spatial extent of episodes affects both their temporal duration and the overall number of cases generated.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Leisure & Entertainment > Games (0.84)
- Government > Military (0.68)
- Health & Medicine > Consumer Health (0.61)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Qualitative Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.71)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.69)
- Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.69)
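The core segmentation idea in the abstract above, starting a new episode whenever a tracked qualitative property of the world changes, can be sketched as follows. This is our own minimal illustration, not the authors' code; the function and field names (`segment_episodes`, `contact`, `moving`) are invented for the example.

```python
# Hypothetical sketch of qualitative event segmentation: a new episode
# begins whenever a tracked qualitative property of the world changes.
# Names are illustrative, not taken from the paper.

def segment_episodes(states):
    """Split a timeline of qualitative state dicts into episodes.

    A new episode starts at each timestep whose state differs from the
    previous one; unchanged stretches are grouped together.
    """
    episodes = []
    current = []
    for t, state in enumerate(states):
        if current and state != states[t - 1]:
            episodes.append(current)
            current = []
        current.append((t, state))
    if current:
        episodes.append(current)
    return episodes

# Toy timeline: two units in contact, then one retreats, then all idle.
timeline = [
    {"contact": True, "moving": False},
    {"contact": True, "moving": False},
    {"contact": False, "moving": True},   # property change -> new episode
    {"contact": False, "moving": False},  # property change -> new episode
]
print([len(ep) for ep in segment_episodes(timeline)])  # -> [2, 1, 1]
```

Each episode is then a candidate case for analogical retrieval, localized in time by construction.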
ObjectNLQ @ Ego4D Episodic Memory Challenge 2024
Feng, Yisen, Zhang, Haoyu, Xie, Yuquan, Li, Zaijing, Liu, Meng, Nie, Liqiang
In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatially within the frames. To this end, we introduce a novel approach, termed ObjectNLQ, which incorporates an object branch to augment the video representation with detailed object information, thereby improving grounding accuracy. ObjectNLQ achieves a mean R@1 of 23.15, ranking 2nd in the Natural Language Queries Challenge, and achieves 33.00 on the R@1, IoU=0.3 metric, ranking 3rd in the Goal Step Challenge. Our code will be released at https://github.com/Yisen-Feng/ObjectNLQ.
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Information Technology > Artificial Intelligence > Natural Language (0.71)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
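The "object branch" idea described above, augmenting per-frame video features with features of detected objects before matching against the text query, can be sketched in a toy form. This is purely illustrative and not the authors' architecture; the feature vectors and function names are ours.

```python
# Illustrative sketch (not the ObjectNLQ code) of augmenting frame
# features with detected-object features before query matching.

def fuse(frame_feat, object_feats):
    """Average detected-object features into the frame feature."""
    if not object_feats:
        return frame_feat
    pooled = [sum(vals) / len(vals) for vals in zip(*object_feats)]
    return [f + p for f, p in zip(frame_feat, pooled)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def localize(query_feat, frames, objects_per_frame):
    """Return the index of the frame best matching the query (R@1-style)."""
    fused = [fuse(f, o) for f, o in zip(frames, objects_per_frame)]
    scores = [dot(query_feat, f) for f in fused]
    return max(range(len(scores)), key=scores.__getitem__)

query = [1.0, 0.0]
frames = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
objects = [[], [], [[4.0, 0.0]]]   # a query-relevant object in frame 2
print(localize(query, frames, objects))  # -> 2
```

Without the object features, frame 1 would win; the detected object shifts the localization to frame 2, which is the effect the object branch is meant to have.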
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
Huang, Yufeng, Tang, Jiji, Chen, Zhuo, Zhang, Rongsheng, Zhang, Xinfeng, Chen, Weijie, Zhao, Zeng, Zhao, Zhou, Lv, Tangjie, Hu, Zhipeng, Zhang, Wen
Large-scale vision-language pre-training has achieved significant performance in multi-modal understanding and generation tasks. However, existing methods often perform poorly on image-text matching tasks that require structured representations, i.e., representations of objects, attributes, and relations. As illustrated in Fig.~\ref{fig:case}(a), the models cannot distinguish between "An astronaut rides a horse" and "A horse rides an astronaut". This is because they fail to fully leverage structured knowledge when learning representations in multi-modal scenarios. In this paper, we present an end-to-end framework, Structure-CLIP, which integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations. Firstly, we use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations. Moreover, a Knowledge-Enhanced Encoder (KEE) is proposed to leverage SGK as input to further enhance structured representations. To verify the effectiveness of the proposed framework, we pre-train our model with the aforementioned approaches and conduct experiments on downstream tasks. Experimental results demonstrate that Structure-CLIP achieves state-of-the-art (SOTA) performance on the VG-Attribution and VG-Relation datasets, leading the multi-modal SOTA model by 12.5% and 4.1%, respectively. Meanwhile, the results on MSCOCO indicate that Structure-CLIP significantly enhances the structured representations while maintaining the quality of general representations. Our code is available at https://github.com/zjukg/Structure-CLIP.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
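The scene-graph-guided negative construction described above amounts to building "hard" captions that differ from the positive only in structure, e.g. by swapping the subject and object of a relation triple. A minimal sketch, with names of our own choosing:

```python
# Hedged sketch of scene-graph-guided negative construction: swapping
# subject and object of a relation triple yields a caption that differs
# from the positive only in structure. Names are illustrative.

def swap_negative(triple):
    """Build a structured hard negative from an (subject, relation, object) triple."""
    subj, rel, obj = triple
    return (obj, rel, subj)

def to_caption(triple):
    subj, rel, obj = triple
    return f"{subj} {rel} {obj}"

pos = ("an astronaut", "rides", "a horse")
neg = swap_negative(pos)
print(to_caption(neg))  # -> "a horse rides an astronaut"
```

Training against such negatives penalizes a model that scores both word orders equally, which is exactly the failure mode the abstract's astronaut example highlights.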
Koopman Learning with Episodic Memory
Redman, William T., Huang, Dean, Fonoberova, Maria, Mezić, Igor
Koopman operator theory, a data-driven dynamical systems framework, has found significant success in learning models from complex, real-world data sets, enabling state-of-the-art prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to learn from its own mistakes. To address this, we equip Koopman methods, developed for predicting non-stationary time series, with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion and opens exciting new directions for Koopman learning.
- North America > United States > District of Columbia > Washington (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine > Consumer Health (0.84)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.32)
- Health & Medicine > Therapeutic Area > Immunology (0.32)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.84)
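The episodic-memory mechanism described above, recalling periods in time where similar dynamics previously occurred, can be sketched in isolation (without the Koopman operator machinery) as nearest-neighbor recall over past windows of the series. This is our own toy illustration, not the authors' method.

```python
# Minimal sketch of the episodic-memory idea only (no Koopman operator):
# recall the most similar previously seen window of the series and use
# what followed it as the prediction. Purely illustrative.

def predict_with_memory(series, window=3):
    """Predict the next value by nearest-neighbor recall of past windows."""
    query = series[-window:]
    best, best_dist = None, float("inf")
    # memory holds every earlier window and the value that followed it
    for i in range(len(series) - window):
        past = series[i:i + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query))
        if dist < best_dist:
            best, best_dist = series[i + window], dist
    return best

# A repeating pattern: memory recalls the earlier occurrence of [1, 2, 3].
print(predict_with_memory([1, 2, 3, 4, 1, 2, 3]))  # -> 4
```

In the paper's setting the recalled period would select or reweight a local Koopman model rather than a raw value, but the recall step has the same shape.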
How the Authors of the Bible Spun Triumph from Defeat
The Moshiach came to Madison Avenue this summer. All over a not particularly Jewish neighborhood, posters of the bearded, Rembrandtesque Rebbe Schneerson appeared, mucilaged to every light post and bearing the caption "Long Live the Lubavitcher Rebbe King Messiah forever!" This was, or ought to have been, trebly astonishing. First, the rebbe being urged to a longer life died in 1994, and the new insistence that he was nonetheless the Moshiach skirted, as his followers tend to do, the question of whether he might remain somehow alive. Second, the very concept of a messiah recapitulates a specific national hope of a small and oft-defeated nation several thousand years ago, and spoke originally to the local Judaean dream of a warrior who would lead his people to victory over the Persians, the Greeks, and, latterly, the Roman colonizers.
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.06)
- Asia > Afghanistan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.40)
- Information Technology > Artificial Intelligence > History (0.40)
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory
Anagnostopoulou, Aliki, Hartmann, Mareike, Sonntag, Daniel
Interactive machine learning (IML) is a beneficial learning paradigm in cases of limited data availability, as human feedback is incrementally integrated into the training process. In this paper, we present an IML pipeline for image captioning which allows us to incrementally adapt a pre-trained image captioning model to a new data distribution based on user input. In order to incorporate user input into the model, we explore the use of a combination of simple data augmentation methods to obtain larger data batches for each newly annotated data instance and implement continual learning methods to prevent catastrophic forgetting from repeated updates. For our experiments, we split a domain-specific image captioning dataset, namely VizWiz, into non-overlapping parts to simulate an incremental input flow for continually adapting the model to new data. We find that, while data augmentation worsens results, even when relatively small amounts of data are available, episodic memory is an effective strategy to retain knowledge from previously seen clusters.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.62)
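The episodic-memory strategy the abstract above finds effective, retaining knowledge from previously seen clusters across incremental updates, is commonly implemented as a small replay buffer mixed into each new batch. A hedged sketch under that assumption (class and method names are ours, not the paper's):

```python
# Hedged sketch of episodic-memory replay for continual adaptation:
# keep a small sample of past annotated examples per cluster and mix
# them into each new training batch to limit catastrophic forgetting.

import random

class EpisodicMemory:
    def __init__(self, capacity_per_task=2, seed=0):
        self.capacity = capacity_per_task
        self.store = {}                 # task/cluster id -> stored examples
        self.rng = random.Random(seed)

    def add(self, task, examples):
        self.store[task] = examples[: self.capacity]

    def replay_batch(self, new_examples, k=2):
        """Return the new examples plus up to k replayed old ones."""
        old = [ex for exs in self.store.values() for ex in exs]
        return new_examples + self.rng.sample(old, min(k, len(old)))

mem = EpisodicMemory()
mem.add("cluster0", [("img0", "a dog"), ("img1", "a cat")])
batch = mem.replay_batch([("img9", "a bus")], k=1)
print(len(batch))  # -> 2 (one new example, one replayed)
```

Each update then sees a mix of new and old (image, caption) pairs, which is the mechanism that prevents the repeated fine-tuning steps from overwriting earlier clusters.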
Semi-Structured Object Sequence Encoders
Murthy, Rudra V, Bhat, Riyaz, Gunasekara, Chulaka, Patel, Siva Sankalp, Wan, Hui, Dhamecha, Tejas Indulal, Contractor, Danish, Danilevsky, Marina
In this paper we explore the task of modeling semi-structured object sequences; in particular, we focus our attention on the problem of developing a structure-aware input representation for such sequences. Examples of such data include user activity on websites, machine logs, and many others. This type of data is often represented as a sequence of sets of key-value pairs over time and can present modeling challenges due to an ever-increasing sequence length. We propose a two-part approach, which first considers each key independently and encodes a representation of its values over time; we then self-attend over these value-aware key representations to accomplish a downstream task. This allows us to operate on longer object sequences than existing methods. We introduce a novel shared-attention-head architecture between the two modules and present an innovative training schedule that interleaves the training of both modules with shared weights for some attention heads. Our experiments on multiple prediction tasks using real-world data demonstrate that our approach outperforms a unified network with hierarchical encoding, as well as other methods including a record-centric representation and a flattened representation of the sequence.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Dominican Republic (0.04)
- (3 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
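The two-part approach described above, first encoding each key's values over time independently, then attending over the value-aware key representations, can be sketched with scalar stand-ins for the learned encoders. Everything here (the mean "encoder", the softmax pooling, all names) is our illustrative simplification, not the paper's architecture.

```python
# Illustrative sketch of the two-part idea: summarize each key's values
# over time independently, then softmax-weight the per-key summaries.
# The toy "encoders" and names are ours, not the paper's.

import math

def encode_key(values):
    """Stand-in for a per-key temporal encoder: mean of observed values."""
    return sum(values) / len(values)

def attend(key_reprs):
    """Softmax-weight the key representations and pool them."""
    weights = [math.exp(r) for r in key_reprs]
    total = sum(weights)
    return sum(w / total * r for w, r in zip(weights, key_reprs))

# A sequence of key-value records, pivoted into one series per key.
records = [{"cpu": 0.2, "mem": 0.5}, {"cpu": 0.4, "mem": 0.5}]
series = {k: [r[k] for r in records] for k in records[0]}
key_reprs = [encode_key(v) for v in series.values()]
pooled = attend(key_reprs)
print(round(pooled, 3))
```

Because each key is summarized before any cross-key attention, the cost of the second stage scales with the number of keys rather than the sequence length, which is what lets the approach handle longer object sequences.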
Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering
Zaib, Munazza, Sheng, Quan Z., Zhang, Wei Emma, Mahmood, Adnan
Having an intelligent dialogue agent that can engage in conversational question answering (ConvQA) is no longer limited to sci-fi movies and has, in fact, become a reality. These intelligent agents are required to understand and correctly interpret the sequential turns provided as the context of the given question. However, these sequential questions are sometimes left implicit and thus require the resolution of some natural language phenomena such as anaphora and ellipsis. The task of question rewriting has the potential to address the challenges of resolving dependencies amongst the contextual turns by transforming them into intent-explicit questions. Nonetheless, the solution of rewriting the implicit questions comes with some potential challenges, such as resulting in verbose questions and taking the conversational aspect out of the scenario by generating self-contained questions. In this paper, we propose a novel framework, CONVSR (CONVQA using Structured Representations), for capturing and generating intermediate representations as conversational cues to enhance the capability of the QA model to better interpret the incomplete questions. We also deliberate on how the strengths of this task could be leveraged in a bid to design more engaging and eloquent conversational agents. We test our model on the QuAC and CANARD datasets and show through experimental results that our proposed framework achieves a better F1 score than the standard question rewriting model.
- Oceania > Australia > South Australia > Adelaide (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Media (0.48)
- Leisure & Entertainment (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.74)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.61)
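The contrast the abstract above draws, resolving an implicit follow-up question through an intermediate structured cue rather than rewriting it into a verbose self-contained question, can be illustrated with a toy pronoun resolver. This is our own sketch, not the CONVSR framework; the tracked "focus entity" stands in for the paper's structured representations.

```python
# Toy sketch (ours, not CONVSR) of using an intermediate structured cue,
# the entity under discussion, to interpret an implicit follow-up turn.

def resolve(question, focus):
    """Replace a pronoun token with the tracked focus entity."""
    pronouns = {"it", "he", "she", "they"}
    toks = [focus if t.lower().strip("?") in pronouns else t
            for t in question.split()]
    return " ".join(toks)

# Turn 1 establishes the focus; turn 2 is left implicit by the user.
focus = "Mount Everest"   # extracted from "How tall is Mount Everest?"
print(resolve("Where is it located?", focus))  # -> "Where is Mount Everest located?"
```

The point of keeping the cue separate from the question is that the dialogue stays conversational: the user's turn is never replaced by a long rewritten query, only interpreted against the tracked state.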
Kernels on Structured Objects Through Nested Histograms
We propose a family of kernels for structured objects based on the bag-of-components paradigm. However, rather than decomposing each complex object into a single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. Through an efficient averaging trick, we compute a mixture of such specific kernels to produce a final kernel value that efficiently weights local and global matches. We present experimental results on an image retrieval task which show that this mixture is an effective template procedure to be used with kernels on histograms.
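The nested-histogram construction above, elementary kernels at several granularities combined into a mixture, can be sketched with a histogram-intersection kernel over progressively finer binnings. The binning scheme and the uniform level weights below are our own illustrative choices, not the paper's.

```python
# Hedged sketch of a nested-histogram kernel: compare two objects with
# histogram intersection at several granularities and average the levels.
# Binning scheme and uniform level weights are illustrative choices.

def histogram(values, bins):
    h = [0] * bins
    for v in values:          # values assumed to lie in [0, 1)
        h[int(v * bins)] += 1
    return h

def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def nested_kernel(x, y, levels=(2, 4, 8)):
    """Mixture of intersection kernels from coarse to fine binnings."""
    ks = [intersection(histogram(x, b), histogram(y, b)) for b in levels]
    return sum(ks) / len(ks)

a = [0.1, 0.2, 0.8]
print(nested_kernel(a, a))                  # self-similarity = point count
print(nested_kernel(a, [0.9, 0.9, 0.9]))   # dissimilar bag scores lower
```

Coarse levels reward global similarity (components falling in the same broad region), while fine levels only fire on close matches, so the mixture weights local and global agreement in the way the abstract describes.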
Incremental Prompting: Episodic Memory Prompt for Lifelong Event Detection
Liu, Minqian, Chang, Shiyu, Huang, Lifu
Lifelong event detection aims to incrementally update a model with new event types and data while retaining its capability on previously learned old types. One critical challenge is that the model catastrophically forgets old types when continually trained on new data. In this paper, we introduce Episodic Memory Prompts (EMP) to explicitly preserve the learned task-specific knowledge. Our method adopts a continuous prompt for each task, and these prompts are optimized to instruct the model's prediction and to learn event-specific representations. The EMPs learned in previous tasks are carried along with the model in subsequent tasks, and can serve as a memory module that retains the old knowledge and transfers it to new tasks. Experimental results demonstrate the effectiveness of our method. Furthermore, we conduct a comprehensive analysis of the new and old event types in lifelong learning.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (12 more...)
- Health & Medicine > Consumer Health (0.61)
- Education > Educational Setting (0.49)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.61)
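The carry-along mechanism described in the abstract above, a prompt learned per task that is then frozen and prepended for all subsequent tasks, can be sketched structurally as follows. The class, its methods, and the zero-vector stand-ins for learned prompt values are all our own illustration, not the EMP implementation.

```python
# Minimal structural sketch (names ours) of episodic memory prompts:
# each task contributes prompt vectors that are frozen and carried
# along, so later inputs are prefixed with all earlier prompts.

class PromptPool:
    def __init__(self, prompt_dim=4):
        self.dim = prompt_dim
        self.prompts = []             # one frozen prompt per finished task
        self.current = None

    def start_task(self, task_id):
        # a fresh trainable prompt (zeros stand in for learned values)
        self.current = (task_id, [0.0] * self.dim)

    def finish_task(self):
        self.prompts.append(self.current)   # freeze and carry along

    def build_input(self, embeddings):
        """Prefix the input with all carried prompts plus the current one."""
        carried = [p for _, p in self.prompts]
        return carried + [self.current[1]] + embeddings

pool = PromptPool()
pool.start_task("task1")
pool.finish_task()
pool.start_task("task2")
x = pool.build_input([[1.0, 1.0, 1.0, 1.0]])
print(len(x))  # -> 3 (one old prompt + current prompt + one token)
```

Because the frozen prompts from earlier tasks remain in every later input, the model retains a trainable-parameter-free channel back to what it learned before, which is the "memory module" role the abstract describes.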