Turek, Javier S.
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
Pink, Mathis, Vo, Vy A., Wu, Qinyuan, Mu, Jianing, Turek, Javier S., Hasson, Uri, Norman, Kenneth A., Michelmann, Sebastian, Huth, Alexander, Toneva, Mariya
Current LLM benchmarks focus on evaluating models' memory of facts and semantic relations, primarily assessing semantic aspects of long-term memory. However, in humans, long-term memory also includes episodic memory, which links memories to their contexts, such as the time and place they occurred. The ability to contextualize memories is crucial for many cognitive tasks and everyday functions, yet this form of memory has not been evaluated in LLMs with existing benchmarks. To address this gap, we introduce Sequence Order Recall Tasks (SORT), which we adapt from tasks used to study episodic memory in cognitive psychology. SORT requires LLMs to recall the correct order of text segments and provides a general framework that is easily extendable and requires no additional annotations. We present an initial evaluation dataset, Book-SORT, comprising 36k pairs of segments extracted from 9 books recently added to the public domain. In a human experiment with 155 participants, we show that humans can recall sequence order based on long-term memory of a book. We find that models can perform the task with high accuracy when the relevant text is given in-context during the SORT evaluation. However, when presented with the book text only during training, LLMs' performance on SORT falls short. By making it possible to evaluate more aspects of memory, we believe that SORT will aid the emerging development of memory-augmented models.
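The core of a SORT trial is simple: sample two non-overlapping segments from a source text, present them in shuffled order, and ask which came first. The sketch below is a minimal, hypothetical construction of such a trial; the function name and output format are illustrative, not the Book-SORT dataset code.

```python
import random

def make_sort_item(text, seg_len=50, rng=None):
    """Build one SORT trial: two non-overlapping segments from `text`,
    presented in shuffled order; the label names the segment that
    appears first in the source."""
    rng = rng or random.Random(0)
    # pick two non-overlapping start positions with a < b
    max_start = len(text) - 2 * seg_len
    a = rng.randrange(0, max_start)
    b = rng.randrange(a + seg_len, len(text) - seg_len)
    seg_a, seg_b = text[a:a + seg_len], text[b:b + seg_len]
    # shuffle presentation order; the model must recover source order
    if rng.random() < 0.5:
        return {"segment_1": seg_a, "segment_2": seg_b, "answer": "segment_1"}
    return {"segment_1": seg_b, "segment_2": seg_a, "answer": "segment_2"}
```

Because the trial needs only the source text and segment positions, no manual annotation is required, which is what makes the framework easy to extend to new corpora.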
A single-layer RNN can approximate stacked and bidirectional RNNs, and topologies in between
Turek, Javier S., Jain, Shailee, Capota, Mihai, Huth, Alexander G., Willke, Theodore L.
To enhance the expressiveness and representational capacity of recurrent neural networks (RNNs), a large body of work has emerged exploring stacked architectures with additional topological modifications like shortcut connections or bidirectionality. However, choosing the best network for a particular problem requires a combinatorial search over architectures and their hyperparameters. In this work, we show that a single-layer RNN can perfectly mimic an arbitrarily deep stacked RNN under specific constraints on its weight matrix and a delay between input and output. This obviates the need to manually select hyperparameters like the number of layers. Additionally, we show that weakening the weight constraints while keeping the delay gives rise to partial acausality in the single-layer RNN, much like a bidirectional network. Synthetic experiments confirm that the delayed RNN can match bidirectional networks in perfectly solving some acausal tasks, and outperforms them on others. Finally, we show that in a challenging language processing task, the delayed RNN performs within 0.3% of the accuracy of the bidirectional network while reducing computational costs.
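For a two-layer stack, the construction can be sketched in a few lines: keeping the state [h1_t; h2_{t-1}] and a block lower-triangular weight matrix lets a single-layer RNN reproduce the stacked network's top-layer output with a one-step delay. The NumPy sketch below demonstrates this equivalence on random weights; variable names are illustrative and this is the linear-algebra idea, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 4, 3, 20  # hidden units per layer, input size, sequence length
A1, B1 = rng.normal(size=(n, n)) * 0.3, rng.normal(size=(n, d))
A2, B2 = rng.normal(size=(n, n)) * 0.3, rng.normal(size=(n, n))
x = rng.normal(size=(T, d))

# Two-layer stacked RNN:
#   h1_t = tanh(A1 h1_{t-1} + B1 x_t),  h2_t = tanh(A2 h2_{t-1} + B2 h1_t)
h1 = np.zeros(n)
h2 = np.zeros(n)
stacked_out = []
for t in range(T):
    h1 = np.tanh(A1 @ h1 + B1 @ x[t])
    h2 = np.tanh(A2 @ h2 + B2 @ h1)
    stacked_out.append(h2.copy())

# Single-layer RNN with block-structured weights; the second state block
# holds the top layer delayed by one step: z_t = [h1_t; h2_{t-1}]
M = np.block([[A1, np.zeros((n, n))],
              [B2, A2]])
N = np.vstack([B1, np.zeros((n, d))])
z = np.zeros(2 * n)
delayed_out = []
for t in range(T):
    z = np.tanh(M @ z + N @ x[t])
    delayed_out.append(z[n:].copy())  # equals h2_{t-1}
```

Reading the output one step late recovers the stacked network exactly: delayed_out[t+1] matches stacked_out[t], which is the input–output delay the abstract refers to.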
Sparse and low-rank approximations of large symmetric matrices using biharmonic interpolation
Turek, Javier S., Huth, Alexander
Symmetric matrices are widely used in machine learning problems such as kernel machines and manifold learning. Using large datasets often requires computing low-rank approximations of these symmetric matrices so that they fit in memory. In this paper, we present a novel method based on biharmonic interpolation for low-rank matrix approximation. The method exploits knowledge of the data manifold to learn an interpolation operator that approximates values using a subset of randomly selected landmark points. This operator is readily sparsified, reducing memory requirements by at least two orders of magnitude without significant loss in accuracy. We show that our method can approximate very large datasets using twenty times more landmarks than other methods. Further, numerical results suggest that our method is stable even when numerical difficulties arise for other methods.
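The paper's operator is built from biharmonic interpolation on the data manifold; as a generic point of comparison, the classical Nyström scheme below shows how a landmark-based low-rank approximation of a symmetric kernel matrix works. This is an illustrative stand-in, not the biharmonic method itself, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))  # n data points in 5 dimensions

def rbf_kernel(A, B, gamma=0.1):
    """Symmetric positive-definite RBF kernel between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Nystrom: approximate K ~= C W^+ C^T using m randomly selected landmarks,
# so only an (n, m) and an (m, m) block are ever formed or stored
m = 50
idx = rng.choice(len(X), size=m, replace=False)
C = rbf_kernel(X, X[idx])        # (n, m) cross-kernel against landmarks
W = rbf_kernel(X[idx], X[idx])   # (m, m) landmark kernel
K_approx = C @ np.linalg.pinv(W) @ C.T

K = rbf_kernel(X, X)             # full matrix, only for measuring the error
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

The memory saving is the point: the factors cost O(nm + m^2) storage instead of O(n^2), and sparsifying the interpolation operator (as the paper does) reduces this further.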
A Searchlight Factor Model Approach for Locating Shared Information in Multi-Subject fMRI Analysis
Zhang, Hejia, Chen, Po-Hsuan, Chen, Janice, Zhu, Xia, Turek, Javier S., Willke, Theodore L., Hasson, Uri, Ramadge, Peter J.
There is a growing interest in joint multi-subject fMRI analysis. The challenge of such analysis comes from inherent anatomical and functional variability across subjects. One approach to resolving this is a shared response factor model, which assumes a shared, time-synchronized stimulus across subjects. Such a model can often identify shared information, but it may not be able to pinpoint with high resolution the spatial location of this information. In this work, we examine a searchlight-based shared response model to identify shared information in small contiguous regions (searchlights) across the whole brain. Validation using classification tasks demonstrates that we can pinpoint informative local regions.
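A deterministic variant of the shared response model can be fit by alternating orthogonal Procrustes updates. The synthetic sketch below (illustrative names, not the paper's implementation) shows the shared-stimulus assumption X_i ≈ W_i S with orthonormal per-subject bases W_i, which is the factor model a searchlight analysis would apply within each small region.

```python
import numpy as np

rng = np.random.default_rng(0)
v, k, t, n_subj = 30, 3, 100, 4     # voxels, shared features, timepoints, subjects
S_true = rng.normal(size=(k, t))    # shared, time-synchronized response
X = []
for _ in range(n_subj):
    W, _ = np.linalg.qr(rng.normal(size=(v, k)))      # orthonormal subject basis
    X.append(W @ S_true + 0.01 * rng.normal(size=(v, t)))

# Alternating least squares for the deterministic SRM: X_i ~= W_i S, W_i^T W_i = I
S = rng.normal(size=(k, t))
for _ in range(20):
    Ws = []
    for Xi in X:
        # W_i update is an orthogonal Procrustes problem, solved via SVD
        U, _, Vt = np.linalg.svd(Xi @ S.T, full_matrices=False)
        Ws.append(U @ Vt)
    # S update: average of back-projected subject data (exact minimizer)
    S = np.mean([W.T @ Xi for W, Xi in zip(Ws, X)], axis=0)

recon_err = np.mean([np.linalg.norm(Xi - W @ S) / np.linalg.norm(Xi)
                     for W, Xi in zip(Ws, X)])
```

Both update steps are exact coordinate minimizers, so the fit error decreases monotonically; in a searchlight variant, v would be the handful of voxels inside one searchlight.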
Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets
Anderson, Michael J., Capotă, Mihai, Turek, Javier S., Zhu, Xia, Willke, Theodore L., Wang, Yida, Chen, Po-Hsuan, Manning, Jeremy R., Ramadge, Peter J., Norman, Kenneth A.
The scale of functional magnetic resonance imaging data is rapidly increasing as large multi-subject datasets become widely available and high-resolution scanners are adopted. The inherent low-dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods, and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x respectively with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
A Convolutional Autoencoder for Multi-Subject fMRI Data Aggregation
Chen, Po-Hsuan, Zhu, Xia, Zhang, Hejia, Turek, Javier S., Chen, Janice, Willke, Theodore L., Hasson, Uri, Ramadge, Peter J.
Finding the most effective way to aggregate multi-subject fMRI data is a long-standing and challenging problem. It is of increasing interest in contemporary fMRI studies of human cognition due to the scarcity of data per subject and the variability of brain anatomy and functional response across subjects. Recent work on latent factor models shows promising results in this task, but this approach does not preserve spatial locality in the brain. We examine two ways to combine the ideas of a factor model and a searchlight-based analysis to aggregate multi-subject fMRI data while preserving spatial locality. We first do this directly by combining a recent factor method known as a shared response model with searchlight analysis. Then we design a multi-view convolutional autoencoder for the same task. Both approaches preserve spatial locality and have competitive or better performance compared with standard searchlight analysis and the shared response model applied across the whole brain. We also report a system design to handle the computational challenge of training the convolutional autoencoder.
A multilevel framework for sparse optimization with application to inverse covariance estimation and logistic regression
Treister, Eran, Turek, Javier S., Yavneh, Irad
Solving $\ell_1$-regularized optimization problems is common in the fields of computational biology, signal processing and machine learning. Such $\ell_1$ regularization is utilized to find sparse minimizers of convex functions. A well-known example is the LASSO problem, where the $\ell_1$ norm regularizes a quadratic function. A multilevel framework is presented for solving such $\ell_1$-regularized sparse optimization problems efficiently. We take advantage of the expected sparseness of the solution, and create a hierarchy of problems of similar type, which is traversed in order to accelerate the optimization process. This framework is applied to two problems: (1) the sparse inverse covariance estimation problem, and (2) $\ell_1$-regularized logistic regression. In the first problem, the inverse of an unknown covariance matrix of a multivariate normal distribution is estimated, under the assumption that it is sparse. To this end, an $\ell_1$-regularized log-determinant optimization problem needs to be solved. This task is challenging especially for large-scale datasets, due to time and memory limitations. In the second problem, the $\ell_1$ regularization is added to the logistic regression classification objective to reduce overfitting to the data and obtain a sparse model. Numerical experiments demonstrate the efficiency of the multilevel framework in accelerating existing iterative solvers for both of these problems.
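The kind of problem the framework targets can be made concrete with plain ISTA on a LASSO instance; the soft-thresholding step is what produces exact zeros in the iterates. The multilevel acceleration itself is not shown here — this is just a baseline solver of the type it would speed up, with illustrative names.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, b, lam, steps=500):
    """Plain ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Each step is a gradient step on the quadratic followed by
    soft-thresholding, which yields exactly sparse iterates."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(80, 200))
x_true = np.zeros(200)
x_true[:5] = 3.0 * rng.normal(size=5)    # sparse ground truth
b = A @ x_true
x_hat = ista_lasso(A, b, lam=0.1)
```

A multilevel scheme exploits exactly this sparsity: it restricts such iterations to a hierarchy of smaller problems over the likely-nonzero support, which is where the acceleration comes from.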
A Block-Coordinate Descent Approach for Large-scale Sparse Inverse Covariance Estimation
Treister, Eran, Turek, Javier S.
The sparse inverse covariance estimation problem arises in many statistical applications in machine learning and signal processing. In this problem, the inverse of a covariance matrix of a multivariate normal distribution is estimated, assuming that it is sparse. An $\ell_1$ regularized log-determinant optimization problem is typically solved to approximate such matrices. Because of memory limitations, most existing algorithms are unable to handle large scale instances of this problem. In this paper we present a new block-coordinate descent approach for solving the problem for large-scale data sets. Our method treats the sought matrix block-by-block using quadratic approximations, and we show that this approach has advantages over existing methods in several aspects. Numerical experiments on both synthetic and real gene expression data demonstrate that our approach outperforms the existing state of the art methods, especially for large-scale problems.