
Collaborating Authors

 Choi, Matthew


Quantum linear algebra is all you need for Transformer architectures

arXiv.org Artificial Intelligence

Generative machine learning methods such as large language models are revolutionizing the creation of text and images. While these models are powerful, they also consume a large amount of computational resources. The transformer is a key component of large language models that aims to generate a suitable completion of a given partial sequence. In this work, we investigate transformer architectures through the lens of fault-tolerant quantum computing. The input model is one where trained weight matrices are given as block encodings, from which we construct the query, key, and value matrices for the transformer. We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function. In addition, we combine quantum subroutines to construct important building blocks of the transformer: the residual connection and layer normalization, and the feed-forward neural network. Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction. Based on common open-source large language models, we provide insights into the behavior of important parameters determining the run time of the quantum algorithm. We discuss the potential and challenges for obtaining a quantum advantage.
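For reference, the classical quantities that these quantum subroutines block-encode can be written out in a few lines. The sketch below computes row-wise softmax self-attention, a residual connection with layer normalization, and a feed-forward network for a single transformer block. All weight names, shapes, and the ReLU activation are illustrative assumptions and are not taken from the paper, which works with block encodings of such matrices rather than explicit arrays.

import numpy as np

def softmax_rows(x):
    # Row-wise softmax: each row of the score matrix becomes a probability vector.
    x = x - x.max(axis=-1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(X, Wq, Wk, Wv, W1, W2):
    # X: (sequence_length, d) input embeddings; weights are illustrative dense arrays.
    d = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # query, key, value matrices
    A = softmax_rows(Q @ K.T / np.sqrt(d))       # self-attention matrix
    Y = layer_norm(X + A @ V)                    # residual connection + layer normalization
    F = np.maximum(Y @ W1, 0.0) @ W2             # feed-forward network (ReLU assumed)
    return layer_norm(Y + F)                     # second residual connection + layer norm

# Illustrative usage with random weights (shapes are arbitrary choices).
rng = np.random.default_rng(0)
n, d, d_ff = 8, 16, 32
X = rng.normal(size=(n, d))
out = transformer_block(
    X,
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
    rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)),
)
print(out.shape)   # (8, 16)

Per the abstract, the quantum algorithm replaces these dense-matrix steps with block-encoding arithmetic and a quantum softmax subroutine, and returns an amplitude encoding of the output rather than an explicit vector.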


Large Language Models on Lexical Semantic Change Detection: An Evaluation

arXiv.org Artificial Intelligence

Lexical Semantic Change Detection stands out as one of the few areas where Large Language Models (LLMs) have not been extensively involved. Traditional methods like PPMI and SGNS remain prevalent in research, alongside newer BERT-based approaches. Despite the comprehensive coverage of various natural language processing domains by LLMs, there is a notable scarcity of literature concerning their application in this specific realm. In this work, we seek to bridge this gap by introducing LLMs into the domain of Lexical Semantic Change Detection. Our work presents novel prompting solutions and a comprehensive evaluation that spans all three generations of language models, contributing to the exploration of LLMs in this research area.
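For context on the traditional baselines mentioned above, the sketch below computes a simple PPMI-based change score: a word's PPMI context vector is built separately for two time-period corpora and compared by cosine distance. It only illustrates the PPMI family of methods; the function names, window size, and scoring choice are assumptions, and nothing here reproduces the paper's prompting solutions or evaluation.

import numpy as np

def ppmi_matrix(corpus, vocab, window=2):
    # Positive PMI over symmetric co-occurrence counts within +/- `window` tokens.
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    C[idx[w], idx[sent[j]]] += 1.0
    total = C.sum() or 1.0
    p_w = C.sum(axis=1, keepdims=True) / total   # row marginals
    p_c = C.sum(axis=0, keepdims=True) / total   # column marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C / total) / (p_w * p_c))
    return np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

def semantic_change(corpus_old, corpus_new, target, window=2):
    # Cosine distance between the target word's PPMI vectors in the two periods.
    vocab = sorted({w for sent in corpus_old + corpus_new for w in sent})
    i = vocab.index(target)
    a = ppmi_matrix(corpus_old, vocab, window)[i]
    b = ppmi_matrix(corpus_new, vocab, window)[i]
    denom = (np.linalg.norm(a) * np.linalg.norm(b)) or 1.0
    return 1.0 - float(a @ b) / denom

# Tiny illustrative corpora standing in for two time periods.
old_corpus = [["the", "mouse", "ran", "under", "the", "table"]]
new_corpus = [["click", "the", "mouse", "button", "twice"]]
print(semantic_change(old_corpus, new_corpus, "mouse"))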


FlexModel: A Framework for Interpretability of Distributed Large Language Models

arXiv.org Artificial Intelligence

With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed computing. This often hinders contributions from researchers with machine learning expertise but limited distributed computing background. Addressing this challenge, we present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations. The library is compatible with existing model distribution libraries and encapsulates PyTorch models. It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals, bridging the gap between distributed and single-device model paradigms. Primarily, FlexModel enhances accessibility by democratizing model interactions and promotes more inclusive research in the domain of large-scale neural networks.
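As a point of comparison for the single-device paradigm the library aims to preserve, the snippet below retrieves an intermediate activation with an ordinary PyTorch forward hook. This is deliberately not FlexModel's API (its HookFunction interface and distributed behavior are not reproduced here); it only shows the kind of model-internal access that becomes non-trivial once a model is sharded across GPUs or nodes.

import torch
import torch.nn as nn

# A toy stand-in model; the setting FlexModel targets is a model distributed
# across multiple GPUs or nodes, where this pattern no longer works directly.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Detach so the stored activation does not keep the autograd graph alive.
        captured[name] = output.detach().cpu()
    return hook

handle = model[0].register_forward_hook(save_activation("first_linear"))
_ = model(torch.randn(8, 16))
handle.remove()
print(captured["first_linear"].shape)   # torch.Size([8, 32])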


Learning quantum dynamics with latent neural ODEs

arXiv.org Artificial Intelligence

Deep learning and neural networks have recently become the powerhouse in machine learning (ML), and they have successfully been used to tackle complex problems in classical [1-3] and quantum mechanics [4-7] (see Refs. [8-12] for reviews). Machine-assisted scientific discovery is still in its infancy, but progress has been made, mostly by building the correct inductive bias, or structure, into the model or loss function. For example, physical conservation laws can be learned [1, 2]. Other work has made progress in a purely data-driven approach, learning relationships between quantum experiments and entanglement using generative models [13]. Recently, neural ordinary differential equations (ODEs) were introduced [14, 15], a neural network layer defined by differential equations. Neural ODEs provide the perfect model for physics, since many physical laws are governed by ODEs, and thus every neural ODE has the correct inductive bias built into the model itself. In general, the study of open quantum systems is important for quantum computing as well as many other areas of physics, from many-body phenomena [27, 28] and light-matter interaction [29-31] to non-equilibrium physics [32, 33]. Here, we demonstrate that latent ODEs can be trained to generate and extrapolate measurement data from dynamical quantum evolution in both closed and open quantum systems using only physical observations, without specifying the physics a priori. This is in line with treating the quantum system as a black box and the "shut up and calculate" philosophy [34], all the while ignoring ...
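To make the latent-ODE setup concrete, here is a minimal, self-contained sketch in plain PyTorch: a synthetic damped oscillation of a single observable stands in for the measurement record, an encoder maps the first few measurements to an initial latent state, a small network defines the latent dynamics, and a fixed-step Runge-Kutta integrator rolls the latent state forward before decoding back to the observable. The synthetic data, latent dimension, network sizes, and hand-written RK4 integrator are all assumptions for illustration and do not reproduce the paper's model, training procedure, or quantum data.

import torch
import torch.nn as nn

# Synthetic "measurement record": a damped oscillation of one observable,
# standing in for the physical observations described in the paper.
t = torch.linspace(0.0, 10.0, 200)
obs = (torch.exp(-0.1 * t) * torch.cos(2.0 * t)).unsqueeze(-1)   # shape (200, 1)

latent_dim = 2
f = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))  # dz/dt = f(z)
encoder = nn.Linear(10, latent_dim)   # first 10 measurements -> initial latent state z0
decoder = nn.Linear(latent_dim, 1)    # latent state -> predicted observable

def rk4_rollout(z, dt, steps):
    # Integrate dz/dt = f(z) with classical fixed-step Runge-Kutta (RK4).
    states = [z]
    for _ in range(steps - 1):
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        states.append(z)
    return torch.stack(states)        # shape (steps, latent_dim)

params = list(f.parameters()) + list(encoder.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-2)
dt = (t[1] - t[0]).item()

for epoch in range(300):
    opt.zero_grad()
    z0 = encoder(obs[:10].flatten())               # infer z0 from early measurements
    pred = decoder(rk4_rollout(z0, dt, len(t)))    # roll latent forward, decode observable
    loss = ((pred - obs) ** 2).mean()
    loss.backward()
    opt.step()

# After training, calling rk4_rollout with a larger `steps` extrapolates beyond
# the observed time window, mirroring the extrapolation task described above.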