Goto

Collaborating Authors

 ltm


A Time Series Multitask Framework Integrating a Large Language Model, Pre-Trained Time Series Model, and Knowledge Graph

Hao, Shule, Bao, Junpeng, Lu, Chuncheng

arXiv.org Artificial Intelligence

Time series analysis is crucial in fields like finance, transportation, and industry. However, traditional models often focus solely on temporal features, limiting their ability to capture underlying information. This paper proposes a novel time series multitask framework, called LTM, which integrates temporal features with textual descriptions to enhance analytical and predictive capabilities. LTM combines pre-trained time series model, large language model (LLM), and knowledge graph to tackle time series tasks, including forecasting, imputation, and anomaly detection. LTM achieves improved performance with a few trainable parameters. It is very efficient and practical. LTM encodes time series data into patches and enriches user-provided prompts using knowledge graphs to generate enhanced prompts. A novel feature fusion method embeds prompts into each patch encoding, which is processed by a frozen LLM, followed by a feature enhancement module and a time decoder module. During fine-tuning stage, cosine similarity between prompts and temporal patches is integrated into the loss function to boost performance. Experiments on benchmark datasets show that LTM significantly outperforms existing methods. It provides a robust and versatile solution for time series tasks.


Scalable Language Models with Posterior Inference of Latent Thought Vectors

Kong, Deqian, Zhao, Minglu, Xu, Dehong, Pang, Bo, Wang, Shu, Honig, Edouardo, Si, Zhangzhang, Li, Chuan, Xie, Jianwen, Xie, Sirui, Wu, Ying Nian

arXiv.org Machine Learning

We propose a novel family of language models, Latent-Thought Language Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors, and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional LLMs, yielding a structured design space. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to conventional autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model and latent size, and achieve competitive performance in conditional and unconditional text generation.


Generalization of Medical Large Language Models through Cross-Domain Weak Supervision

Long, Robert, Gonzalez, Eric, Fuller, Harrison

arXiv.org Artificial Intelligence

The advancement of large language models (LLMs) has opened new frontiers in natural language processing, particularly in specialized domains like healthcare. In this paper, we propose the Incremental Curriculum-Based Fine-Tuning (ICFT) framework to enhance the generative capabilities of medical large language models (MLLMs). ICFT combines curriculum-based learning, dual-stage memory coordination, and parameter-efficient fine-tuning to enable a progressive transition from general linguistic knowledge to strong domain-specific expertise. Experimental results across diverse medical NLP tasks, including question answering, preference classification, and response generation, demonstrate that ICFT consistently outperforms state-of-the-art baselines, achieving improvements in both accuracy and efficiency. Further analysis reveals the framework's ability to generalize to unseen data, reduce errors, and deliver diverse, contextually relevant medical responses. These findings establish ICFT as a robust and scalable solution for adapting LLMs to the medical domain, offering practical benefits for real-world healthcare applications.


$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation

Santos, Saul, Farinhas, António, McNamee, Daniel C., Martins, André F. T.

arXiv.org Artificial Intelligence

Current video-language models struggle with long-video understanding due to limited context lengths and reliance on sparse frame subsampling, often leading to information loss. This paper introduces $\infty$-Video, which can process arbitrarily long videos through a continuous-time long-term memory (LTM) consolidation mechanism. Our framework augments video Q-formers by allowing them to process unbounded video contexts efficiently and without requiring additional training. Through continuous attention, our approach dynamically allocates higher granularity to the most relevant video segments, forming "sticky" memories that evolve over time. Experiments with Video-LLaMA and VideoChat2 demonstrate improved performance in video question-answering tasks, showcasing the potential of continuous-time LTM mechanisms to enable scalable and training-free comprehension of long videos.


Distributed Differentially Private Data Analytics via Secure Sketching

Burkhardt, Jakob, Keller, Hannah, Orlandi, Claudio, Schwiegelshohn, Chris

arXiv.org Artificial Intelligence

We explore the use of distributed differentially private computations across multiple servers, balancing the tradeoff between the error introduced by the differentially private mechanism and the computational efficiency of the resulting distributed algorithm. We introduce the linear-transformation model, where clients have access to a trusted platform capable of applying a public matrix to their inputs. Such computations can be securely distributed across multiple servers using simple and efficient secure multiparty computation techniques. The linear-transformation model serves as an intermediate model between the highly expressive central model and the minimal local model. In the central model, clients have access to a trusted platform capable of applying any function to their inputs. However, this expressiveness comes at a cost, as it is often expensive to distribute such computations, leading to the central model typically being implemented by a single trusted server. In contrast, the local model assumes no trusted platform, which forces clients to add significant noise to their data. The linear-transformation model avoids the single point of failure for privacy present in the central model, while also mitigating the high noise required in the local model. We demonstrate that linear transformations are very useful for differential privacy, allowing for the computation of linear sketches of input data. These sketches largely preserve utility for tasks such as private low-rank approximation and private ridge regression, while introducing only minimal error, critically independent of the number of clients. Previously, such accuracy had only been achieved in the more expressive central model.


Long Term Memory: The Foundation of AI Self-Evolution

Jiang, Xun, Li, Feng, Zhao, Han, Wang, Jiaying, Shao, Jun, Xu, Shihao, Zhang, Shu, Chen, Weiling, Tang, Xavier, Chen, Yize, Wu, Mengyue, Ma, Weizhi, Wang, Mengdi, Chen, Tianqiao

arXiv.org Artificial Intelligence

Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning, achieving human-level performance in various tasks. Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models. While training stronger models is important, enabling models to evolve during inference is equally crucial, a process we refer to as AI self-evolution. Unlike large-scale training, self-evolution may rely on limited data or interactions. Inspired by the columnar organization of the human cerebral cortex, we hypothesize that AI models could develop cognitive abilities and build internal representations through iterative interactions with their environment. To achieve this, models need long-term memory (LTM) to store and manage processed interaction data. LTM supports self-evolution by representing diverse experiences across environments and agents. In this report, we explore AI self-evolution and its potential to enhance models during inference. We examine LTM's role in lifelong learning, allowing models to evolve based on accumulated interactions. We outline the structure of LTM and the systems needed for effective data retention and representation. We also classify approaches for building personalized models with LTM data and show how these models achieve self-evolution through interaction. Using LTM, our multi-agent framework OMNE achieved first place on the GAIA benchmark, demonstrating LTM's potential for AI self-evolution. Finally, we present a roadmap for future research, emphasizing the importance of LTM for advancing AI technology and its practical applications.


Memory Management for Real-Time Appearance-Based Loop Closure Detection

Labbé, Mathieu, Michaud, François

arXiv.org Artificial Intelligence

Loop closure detection is the process involved when trying to find a match between the current and a previously visited locations in SLAM. Over time, the amount of time required to process new observations increases with the size of the internal map, which may influence real-time processing. In this paper, we present a novel real-time loop closure detection approach for large-scale and long-term SLAM. Our approach is based on a memory management method that keeps computation time for each new observation under a fixed limit. Results demonstrate the approach's adaptability and scalability using four standard data sets.


Why Tabular Foundation Models Should Be a Research Priority

van Breugel, Boris, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

Recent text and image foundation models are incredibly impressive, and these models are attracting an ever-increasing portion of research resources. In this position piece we aim to shift the ML research community's priorities ever so slightly to a different modality: tabular data. Tabular data is the dominant modality in many fields, yet it is given hardly any research attention and significantly lags behind in terms of scale and power. We believe the time is now to start developing tabular foundation models, or what we coin a Large Tabular Model (LTM). LTMs could revolutionise the way science and ML use tabular data: not as single datasets that are analyzed in a vacuum, but contextualized with respect to related datasets. The potential impact is far-reaching: from few-shot tabular models to automating data science; from out-of-distribution synthetic data to empowering multidisciplinary scientific discovery. We intend to excite reflections on the modalities we study, and convince some researchers to study large tabular models.


MemoNav: Working Memory Model for Visual Navigation

Li, Hongxin, Wang, Zeyu, Yang, Xu, Yang, Yuran, Mei, Shuqi, Zhang, Zhaoxiang

arXiv.org Artificial Intelligence

Image-goal navigation is a challenging task that requires an agent to navigate to a goal indicated by an image in unfamiliar environments. Existing methods utilizing diverse scene memories suffer from inefficient exploration since they use all historical observations for decision-making without considering the goal-relevant fraction. To address this limitation, we present MemoNav, a novel memory model for image-goal navigation, which utilizes a working memory-inspired pipeline to improve navigation performance. Specifically, we employ three types of navigation memory. The node features on a map are stored in the short-term memory (STM), as these features are dynamically updated. A forgetting module then retains the informative STM fraction to increase efficiency. We also introduce long-term memory (LTM) to learn global scene representations by progressively aggregating STM features. Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation. The synergy among these three memory types boosts navigation performance by enabling the agent to learn and leverage goal-relevant scene features within a topological map. Our evaluation on multi-goal tasks demonstrates that MemoNav significantly outperforms previous methods across all difficulty levels in both Gibson and Matterport3D scenes. Qualitative results further illustrate that MemoNav plans more efficient routes.


Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

Hwang, Jaedong, Hong, Zhang-Wei, Chen, Eric, Boopathy, Akhilan, Agrawal, Pulkit, Fiete, Ila

arXiv.org Artificial Intelligence

Animals and robots navigate through environments by building and refining maps of space. These maps enable functions including navigation back to home, planning, search and foraging. Here, we use observations from neuroscience, specifically the observed fragmentation of grid cell map in compartmentalized spaces, to propose and apply the concept of Fragmentation-and-Recall (FARMap) in the mapping of large spaces. Agents solve the mapping problem by building local maps via a surprisal-based clustering of space, which they use to set subgoals for spatial exploration. Agents build and use a local map to predict their observations; high surprisal leads to a "fragmentation event" that truncates the local map. At these events, the recent local map is placed into long-term memory (LTM) and a different local map is initialized. If observations at a fracture point match observations in one of the stored local maps, that map is recalled (and thus reused) from LTM. The fragmentation points induce a natural online clustering of the larger space, forming a set of intrinsic potential subgoals that are stored in LTM as a topological graph. Agents choose their next subgoal from the set of near and far potential subgoals from within the current local map or LTM, respectively. Thus, local maps guide exploration locally, while LTM promotes global exploration. We evaluate FARMap on complex procedurally-generated spatial environments and realistic simulations to demonstrate that this mapping strategy much more rapidly covers the environment (number of agent steps and wall clock time) and is more efficient in active memory usage, without loss of performance.