M+: Extending MemoryLLM with Scalable Long-Term Memory

Wang, Yu, Krotov, Dmitry, Hu, Yuanzhe, Gao, Yifan, Zhou, Wangchunshu, McAuley, Julian, Gutfreund, Dan, Feris, Rogerio, He, Zexue

arXiv.org Artificial Intelligence

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention because such memory can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. M+ integrates a long-term memory mechanism with a co-trained retriever, which dynamically retrieves relevant information during text generation. We evaluate M+ on diverse benchmarks, including long-context understanding and knowledge-retention tasks. Experimental results show that M+ significantly outperforms MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead.
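To make the retrieval idea concrete, here is a minimal sketch of fetching the most relevant entries from a memory pool by cosine similarity. This is illustrative only: M+ co-trains its retriever over latent hidden states, whereas the function below (and all names in it) is a hypothetical simplification over plain vectors.

```python
import numpy as np

def retrieve_memories(query, memory_pool, k=2):
    """Return indices and scores of the top-k memory vectors most
    similar to the query, by cosine similarity."""
    # Normalize so dot products become cosine similarities.
    q = query / np.linalg.norm(query)
    m = memory_pool / np.linalg.norm(memory_pool, axis=1, keepdims=True)
    scores = m @ q
    top = np.argsort(scores)[::-1][:k]      # best-first
    return top, scores[top]

# Toy memory pool: four 3-d "memory tokens".
pool = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 0.0, 1.0]])
idx, _ = retrieve_memories(np.array([1.0, 0.05, 0.0]), pool, k=2)
```

At generation time, the retrieved entries would be injected back into the model's context, which is how a fixed-size memory can serve sequences far longer than the attention window.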


CodeBrain: Impute Any Brain MRI via Instance-specific Scalar-quantized Codes

Wu, Yicheng, Song, Tao, Wu, Zhonghua, Ge, Zongyuan, Chen, Zhaolin, Cai, Jianfei

arXiv.org Artificial Intelligence

MRI imputation aims to synthesize the missing modality from one or more available ones, which is highly desirable since it reduces scanning costs and delivers comprehensive MRI information to enhance clinical diagnosis. In this paper, we propose a unified model, CodeBrain, designed to adapt to various brain MRI imputation scenarios. The core design lies in casting various inter-modality transformations as a full-modality code prediction task. To this end, CodeBrain is trained in two stages: Reconstruction and Code Prediction. First, in the Reconstruction stage, we reconstruct each MRI modality by mapping it into a shared latent space followed by scalar quantization. Since such quantization is lossy and the code is low-dimensional, another MRI modality belonging to the same subject is randomly selected to generate common features that supplement the code and boost the target reconstruction. In the second stage, we train another encoder with a customized grading loss to predict the full-modality codes from randomly masked MRI samples, supervised by the corresponding quantized codes generated in the first stage. In this way, the inter-modality transformation is achieved by mapping the instance-specific codes in a finite scalar space. We evaluated the proposed CodeBrain model on two public brain MRI datasets (i.e., IXI and BraTS 2023). Extensive experiments demonstrate that our CodeBrain model achieves superior imputation performance compared to four existing methods, establishing a new state of the art for unified brain MRI imputation. Code will be released.
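The "finite scalar space" idea can be illustrated with a tiny scalar quantizer that snaps each latent dimension to the nearest level in a fixed set. This is a generic FSQ-style sketch, not CodeBrain's actual codebook design; the level set below is an arbitrary assumption.

```python
import numpy as np

def scalar_quantize(z, levels):
    """Quantize each latent dimension to the nearest allowed scalar level.
    Returns both the quantized values and their integer codes."""
    levels = np.asarray(levels)
    # For every entry of z, pick the index of the closest level.
    idx = np.abs(z[..., None] - levels).argmin(axis=-1)
    return levels[idx], idx

z = np.array([0.12, -0.87, 0.55])                    # continuous latent
codes, idx = scalar_quantize(z, levels=[-1.0, -0.5, 0.0, 0.5, 1.0])
```

Because the code space is finite and discrete, predicting a missing modality reduces to predicting integer codes, which is what the second-stage encoder is trained to do.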


Advancing Multi-Party Dialogue Systems with Speaker-Aware Contrastive Learning

Hu, Zhongtian, He, Qi, Li, Ronghan, Zhao, Meng, Wang, Lifang

arXiv.org Artificial Intelligence

Dialogue response generation has made significant progress, but most research has focused on dyadic dialogue. In contrast, multi-party dialogues involve more participants, each potentially discussing different topics, making the task more complex. Current methods often rely on graph neural networks to model dialogue context, which helps capture the structural dynamics of multi-party conversations. However, these methods are heavily dependent on intricate graph structures and dataset annotations, and they often overlook the distinct speaking styles of participants. To address these challenges, we propose CMR, a Contrastive learning-based Multi-party dialogue Response generation model. CMR uses self-supervised contrastive learning to better distinguish "who says what." Additionally, by comparing speakers within the same conversation, the model captures differences in speaking styles and thematic transitions. To the best of our knowledge, this is the first approach to apply contrastive learning in multi-party dialogue generation. Experimental results show that CMR significantly outperforms state-of-the-art models in multi-party dialogue response tasks.
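A generic way to realize "who says what" contrastive learning is an InfoNCE-style objective in which utterances by the same speaker are positives and all others are negatives. The sketch below is a plain illustration of that idea under assumed inputs (utterance embeddings and speaker ids), not CMR's exact loss.

```python
import numpy as np

def speaker_contrastive_loss(utt_embs, speaker_ids, temp=0.1):
    """InfoNCE-style loss where utterances by the same speaker
    are treated as positive pairs."""
    e = utt_embs / np.linalg.norm(utt_embs, axis=1, keepdims=True)
    sim = e @ e.T / temp
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs
    ids = np.asarray(speaker_ids)
    loss, n = 0.0, 0
    for i in range(len(ids)):
        pos = (ids == ids[i]) & (np.arange(len(ids)) != i)
        if not pos.any():
            continue                         # speaker appears only once
        log_prob = sim[i] - np.log(np.exp(sim[i]).sum())
        loss -= log_prob[pos].mean()
        n += 1
    return loss / max(n, 1)

embs = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.14, 0.99]])
```

Minimizing this pulls a speaker's utterance representations together and pushes different speakers apart, which is the style-separation effect the abstract describes.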


GLINKX: A Scalable Unified Framework For Homophilous and Heterophilous Graphs

Papachristou, Marios, Goel, Rishab, Portman, Frank, Miller, Matthew, Jin, Rong

arXiv.org Artificial Intelligence

In graph learning, there have been two predominant inductive biases regarding graph-inspired architectures: on the one hand, higher-order interactions and message passing work well on homophilous graphs and are leveraged by GCNs and GATs; on the other hand, shallow (or node-level) models using ego features and adjacency embeddings work well on heterophilous graphs. In this work, we propose GLINKX, a novel scalable shallow method that works on both homophilous and heterophilous graphs. Formally, we prove novel error bounds and justify the components of GLINKX. Experimentally, we show its effectiveness on several homophilous and heterophilous datasets. In recent years, graph learning methods have emerged with strong performance on various ML tasks. Graph ML methods leverage the topology of the graph underlying the data (Battaglia et al., 2018) to improve their performance. Two important design decisions for graph ML architectures in the context of node classification relate to whether the data is homophilous or heterophilous. For homophilous data, where neighboring nodes share similar labels (McPherson et al., 2001; Altenburger & Ugander, 2018a), Graph Neural Network (GNN)-based methods are able to achieve high accuracy. Specifically, a broad subclass of successful GNNs are Graph Convolutional Networks (GCNs) (e.g., GCN, GAT, etc.) (Kipf & Welling, 2016; Veličković et al., 2017; Zhu et al., 2020).
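The shallow, propagation-free representation the abstract alludes to can be sketched by concatenating each node's ego features with an embedding of its adjacency row. This is a hypothetical simplification of GLINKX's components (the random projection stands in for a learned adjacency embedding), assuming dense feature and adjacency matrices.

```python
import numpy as np

def shallow_node_features(X, A, emb_dim=4, seed=0):
    """Concatenate ego features X with a low-dimensional projection of
    each node's adjacency row -- no message passing involved."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((A.shape[1], emb_dim)) / np.sqrt(emb_dim)
    adj_emb = A @ P                 # node-level adjacency embedding
    return np.concatenate([X, adj_emb], axis=1)

X = np.eye(3)                       # ego features for 3 nodes
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
Z = shallow_node_features(X, A)     # feed Z to any node-level classifier
```

Because no neighbor aggregation is performed, nodes with dissimilar neighbors (heterophily) do not get their features smoothed away, while the adjacency embedding still encodes structural position.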


PPGN: Physics-Preserved Graph Networks for Real-Time Fault Location in Distribution Systems with Limited Observation and Labels

Li, Wenting, Deka, Deepjyoti

arXiv.org Artificial Intelligence

Electrical faults may trigger blackouts or wildfires in the absence of timely monitoring and control strategies. Traditional solutions for locating faults in distribution systems are not real-time when network observability is low, while novel black-box machine learning methods are vulnerable to stochastic environments. We propose a novel Physics-Preserved Graph Network (PPGN) architecture to accurately locate faults at the node level with limited observability and labeled training data. PPGN has a unique two-stage graph neural network architecture. The first stage learns a graph embedding to represent the entire network using a few measured nodes. The second stage finds relations between the labeled and unlabeled data samples to further improve location accuracy. We explain the benefits of the two-stage graph configuration through a random-walk equivalence. We numerically validate the proposed method on the IEEE 123-node and 37-node test feeders, demonstrating superior performance over three baseline classifiers when labeled training data is limited and loads and topology are allowed to vary.
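The random-walk view mentioned above can be illustrated by propagating anomaly scores from a few measured nodes across the feeder graph, personalized-PageRank style. This is a generic sketch of that equivalence, not PPGN itself; the graph, seed scores, and damping factor are all assumptions.

```python
import numpy as np

def propagate_scores(A, seed_scores, steps=10, alpha=0.85):
    """Spread anomaly scores from measured nodes over the whole graph
    via a damped random walk with restart at the seeds."""
    deg = A.sum(axis=1, keepdims=True)
    W = A / np.where(deg == 0, 1, deg)       # row-stochastic transitions
    s = seed_scores.astype(float)
    for _ in range(steps):
        s = alpha * (W.T @ s) + (1 - alpha) * seed_scores
    return s

# Path feeder 0-1-2-3; only node 0 is measured and flags an anomaly.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = propagate_scores(A, np.array([1.0, 0.0, 0.0, 0.0]))
```

Nodes electrically closer to the measured node accumulate more score mass, which is how label information from a handful of sensors can inform predictions at every node.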


Management of Machine Learning Lifecycle Artifacts: A Survey

Schlegel, Marius, Sattler, Kai-Uwe

arXiv.org Artificial Intelligence

The explorative and iterative nature of developing and operating machine learning (ML) applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. To enable comparability, reproducibility, and traceability of these artifacts across ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. The precise functional scope of such systems is often not obvious, which makes comparing candidates and estimating synergy effects between them quite challenging. In this paper, we give an overview of systems and platforms that support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.