
Collaborating Authors

 Zhao, Qihang


dnaGrinder: a lightweight and high-capacity genomic foundation model

arXiv.org Artificial Intelligence

Foundation models (also known as large language models) such as BERT [1] and GPT [2] have demonstrated stellar performance in learning the complex characteristics and structures of natural languages, making them well suited for a variety of downstream applications such as sentiment analysis, text generation, and translation [3]. These foundation models have recently been adapted to analyze biological sequences, as their deep structure and large-scale parameters are well suited to the intricacy of biological sequences and structures [4, 5, 6, 7, 8, 9, 10, 11]. Biological sequences composed of nucleotides, such as DNA and RNA, and of amino acids forming peptides and proteins, are regarded as the natural languages of life, and foundation-model technology can be leveraged to uncover the underlying patterns and functions they encode [12]. Typically, these foundation models build robust feature representations from biological sequences through a process known as pretraining. Encoder-based models like BERT perform such pretraining using Masked Language Modeling (MLM), in which the model predicts the original tokens at masked or corrupted positions in the given sequences. By pretraining on millions of biological sequences, foundation models gain a comprehensive contextual understanding of those sequences. Once pretrained, they require only a few fine-tuning steps to be applied effectively to specific downstream tasks [13], including prediction of epigenetic marks, gene expression, protein folding structures, and more.
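A minimal sketch of the MLM pretraining objective on a nucleotide sequence, assuming a toy 4-letter vocabulary plus a [MASK] token and a tiny stand-in encoder; it is not dnaGrinder's tokenizer or architecture, only an illustration of the masking-and-prediction loop.

```python
# Sketch of BERT-style masked language modeling (MLM) on a DNA string.
import torch
import torch.nn as nn

vocab = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}   # toy vocabulary
seq = "ACGTACGTACGTACGT"
tokens = torch.tensor([vocab[b] for b in seq])

# Corrupt roughly 15% of positions with the mask token.
mask = torch.rand(len(tokens)) < 0.15
mask[0] = True                       # guarantee at least one masked position
inputs = tokens.clone()
inputs[mask] = vocab["[MASK]"]

# Tiny stand-in encoder; a real genomic foundation model is far larger.
embed = nn.Embedding(len(vocab), 32)
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(32, len(vocab))

hidden = encoder(embed(inputs.unsqueeze(0)))   # (1, seq_len, 32)
logits = head(hidden).squeeze(0)               # (seq_len, vocab_size)

# The loss is computed only at masked positions: recover the original bases.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```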


Redefining Information Retrieval of Structured Database via Large Language Models

arXiv.org Artificial Intelligence

Retrieval augmentation is critical when Language Models (LMs) draw on non-parametric knowledge related to a query from external knowledge bases before reasoning. The retrieved information is incorporated into the LM as context alongside the query, improving the reliability of responses to factual questions. Prior research on retrieval augmentation typically follows a retriever-generator paradigm, in which traditional retrievers struggle to extract query-relevant information from knowledge bases precisely and seamlessly. To address this issue, this paper introduces a novel retrieval augmentation framework called ChatLR that primarily employs the powerful semantic understanding ability of Large Language Models (LLMs) as retrievers to achieve precise and concise information retrieval. Additionally, we construct an LLM-based search and question answering system tailored for the financial domain by fine-tuning an LLM on two tasks: Text2API and API-ID recognition. Experimental results demonstrate the effectiveness of ChatLR in addressing user queries, achieving an overall information retrieval accuracy exceeding 98.8%.
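A hedged sketch of the retriever-generator flow described above, where an LLM acts as the retriever by mapping a user query to a structured API call (Text2API plus API-ID recognition) before the answer is generated. The names `FINANCE_APIS`, `call_llm`, and `execute_api` are hypothetical stand-ins, not part of the ChatLR system.

```python
import json

# Hypothetical catalogue of structured-database APIs and their arguments.
FINANCE_APIS = {
    "get_stock_price": ["ticker", "date"],
    "get_company_filing": ["company", "year", "form_type"],
}

def call_llm(prompt: str) -> str:
    """Stand-in for the fine-tuned LLM; plug in a real model here."""
    raise NotImplementedError

def execute_api(api: str, **args) -> dict:
    """Toy stand-in for querying the structured knowledge base."""
    return {"api": api, "args": args, "result": "..."}

def retrieve(query: str) -> dict:
    # API-ID recognition + Text2API: the LLM picks an API and fills its
    # arguments, returning strict JSON instead of free-form text.
    prompt = (
        "Available APIs: " + json.dumps(FINANCE_APIS) + "\n"
        f"User query: {query}\n"
        'Reply as JSON: {"api": ..., "args": {...}}'
    )
    call = json.loads(call_llm(prompt))
    return execute_api(call["api"], **call["args"])

def answer(query: str) -> str:
    # Generator step: the retrieved records become context for the final reply.
    context = retrieve(query)
    return call_llm(f"Context: {json.dumps(context)}\nQuestion: {query}")
```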


Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

arXiv.org Artificial Intelligence

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve upon the RWKV (RWKV-4) architecture. Our architectural advances include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference-efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We train four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV Training code at: https://github.com/RWKV/RWKV-LM Inference code at: https://github.com/RWKV/ChatRWKV Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
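A minimal numpy sketch of the multi-headed matrix-valued recurrent state described above: each head keeps a small square matrix that is decayed and then updated with a rank-1 outer product at every step. The shapes and the decay parameterization are illustrative assumptions, not the exact RWKV-5/6 update rules.

```python
import numpy as np

T, H, D = 8, 2, 4                          # timesteps, heads, per-head dim
rng = np.random.default_rng(0)
r, k, v = (rng.standard_normal((T, H, D)) for _ in range(3))
w = rng.uniform(0.8, 0.99, size=(H, D))    # per-channel decay (static here;
                                           # a dynamic recurrence would make
                                           # it depend on the input)

S = np.zeros((H, D, D))                    # matrix-valued state, one per head
outputs = np.zeros((T, H, D))
for t in range(T):
    for h in range(H):
        # Decay the state, then absorb the current token as an outer product.
        S[h] = w[h][:, None] * S[h] + np.outer(k[t, h], v[t, h])
        # Read out with the receptance vector for this head.
        outputs[t, h] = r[t, h] @ S[h]
```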


Highly Accurate Disease Diagnosis and Highly Reproducible Biomarker Identification with PathFormer

arXiv.org Artificial Intelligence

Biomarker identification is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis, for example via fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning models for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction (diagnosis) accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consist of a large number of targets and intensive, complex signaling interactions among these targets. To resolve these two challenges, we present a novel GNN architecture, named PathFormer, which systematically integrates the signaling network, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In our comparisons, PathFormer significantly outperformed existing GNN models, both in prediction accuracy (a 30% accuracy improvement in disease diagnosis over existing GNN models) and in the reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
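For intuition about the setting, here is a generic sketch of one GNN message-passing layer over a signaling network, with genes as nodes, prior-knowledge interactions as edges, and omics measurements as node features. This is a plain graph-convolution step for illustration only, not the PathFormer architecture.

```python
import numpy as np

n_genes, n_feat = 5, 3
rng = np.random.default_rng(1)
X = rng.standard_normal((n_genes, n_feat))   # omics features per gene
A = np.array([[0, 1, 1, 0, 0],               # prior-knowledge signaling edges
              [1, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

A_hat = A + np.eye(n_genes)                  # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # row-normalize each neighbourhood
W = rng.standard_normal((n_feat, n_feat))    # learnable weights (random here)

# One message-passing layer: aggregate neighbours, transform, apply ReLU.
H = np.maximum(D_inv @ A_hat @ X @ W, 0.0)
graph_embedding = H.mean(axis=0)             # pooled representation for diagnosis
```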


RWKV: Reinventing RNNs for the Transformer Era

arXiv.org Artificial Intelligence

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, parallelizing computations during training while maintaining constant computational and memory complexity during inference. We scale our models to as many as 14 billion parameters, by far the largest dense RNN ever trained, and find that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling the trade-offs between computational efficiency and model performance in sequence processing tasks.
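A hedged numpy sketch of the RNN-mode recurrence behind this kind of linear attention: a running numerator/denominator pair summarizes the past, so each step costs constant memory instead of attending over the full history. The shapes, decay, and "bonus" term only loosely follow the published RWKV formulation.

```python
import numpy as np

T, C = 6, 4                                    # timesteps, channels
rng = np.random.default_rng(0)
k, v, r = (rng.standard_normal((T, C)) for _ in range(3))
w = rng.uniform(0.1, 1.0, size=C)              # per-channel decay
u = rng.standard_normal(C)                     # bonus weight for the current token

num = np.zeros(C)                              # running weighted sum of values
den = np.zeros(C)                              # running sum of weights
outputs = np.zeros((T, C))
for t in range(T):
    # The current token enters the average with an extra boost exp(u + k_t).
    boost = np.exp(u + k[t])
    wkv = (num + boost * v[t]) / (den + boost)
    outputs[t] = 1.0 / (1.0 + np.exp(-r[t])) * wkv   # gated by the receptance
    # Constant-size state update: decay the history, absorb the current token.
    num = np.exp(-w) * num + np.exp(k[t]) * v[t]
    den = np.exp(-w) * den + np.exp(k[t])
```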


TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks

arXiv.org Artificial Intelligence

Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, integrating SNNs with attention mechanisms has the potential to deliver energy-efficient, high-performance computing paradigms. We present a novel Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of spike sequences along both spatial and temporal dimensions. More specifically, our essential technical contributions are: 1) We employ a squeeze operation to compress the spike stream into an average matrix, then leverage two local attention mechanisms based on efficient 1D convolutions to extract features at the temporal and channel levels independently. 2) We introduce the Cross Convolutional Fusion (CCF) layer as a novel approach to model the inter-dependencies between the temporal and channel scopes; this layer breaks the independence of the two dimensions and enables interaction between their features. Experimental results demonstrate that the proposed TCJA-SNN outperforms the state of the art (SOTA) by up to 15.7% accuracy on standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we apply the TCJA-SNN framework to image generation tasks by leveraging a variational autoencoder. To the best of our knowledge, this study is the first instance where an SNN-attention mechanism has been employed for both image classification and generation tasks. Notably, our approach achieves SOTA performance in both domains, establishing a significant advancement in the field. Code is available at https://github.com/ridgerchu/TCJA.
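A minimal PyTorch sketch of the temporal-channel attention idea: average ("squeeze") the spike tensor over its spatial dimensions, run two 1D convolutions along the temporal and channel axes, and fuse them into one joint attention map that reweights the spike stream. Kernel sizes and the fusion rule are illustrative assumptions, not the exact TCJA-SNN layers.

```python
import torch
import torch.nn as nn

T, C, H, W = 8, 16, 10, 10
spikes = (torch.rand(T, C, H, W) > 0.8).float()    # binary spike stream

squeezed = spikes.mean(dim=(2, 3))                 # (T, C) average matrix

temporal_conv = nn.Conv1d(C, C, kernel_size=3, padding=1)   # slides over T
channel_conv = nn.Conv1d(T, T, kernel_size=3, padding=1)    # slides over C

t_att = temporal_conv(squeezed.t().unsqueeze(0))   # (1, C, T): temporal view
c_att = channel_conv(squeezed.unsqueeze(0))        # (1, T, C): channel view

# Cross fusion: combine the two views into a single (T, C) attention map.
joint = torch.sigmoid(t_att.squeeze(0).t() * c_att.squeeze(0))
recalibrated = spikes * joint[:, :, None, None]    # reweight the spike stream
```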


SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

arXiv.org Artificial Intelligence

As large language models continue to scale in size, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse, event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven more challenging to train. As a result, their performance lags behind modern deep learning, and we have yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement `SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model, with 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing quadratic computational complexity O(N^2) to linear complexity O(N) in sequence length. Input tokens are instead streamed sequentially into our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on the tested benchmarks, while using 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations.
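An illustrative numpy sketch of the two ideas combined above: tokens are streamed sequentially through a linear-complexity recurrent state (no attention matrix), and unit activations are binarized into event-driven spikes by a leaky integrate-and-fire step. All shapes, decays, and the firing threshold are assumptions for intuition, not the released SpikeGPT model.

```python
import numpy as np

T, C = 10, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T, C))        # token embeddings, streamed one at a time
w = rng.uniform(0.5, 0.95, size=C)     # per-channel decay of the recurrent state

state = np.zeros(C)                    # running summary of the past (O(1) memory)
membrane = np.zeros(C)                 # leaky integrate-and-fire potential
spikes = np.zeros((T, C))
for t in range(T):
    state = w * state + x[t]                     # linear recurrence, no self-attention
    membrane = 0.5 * membrane + state            # leaky integration of the drive
    spikes[t] = (membrane > 1.0).astype(float)   # binary, event-driven activation
    membrane = membrane * (1.0 - spikes[t])      # reset the channels that fired
```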


Utilizing Citation Network Structure to Predict Citation Counts: A Deep Learning Approach

arXiv.org Artificial Intelligence

The number of academic papers published worldwide each year has grown almost exponentially. While this growth reflects the prosperity of science and technology, it also creates problems. Academic papers are the most direct embodiment of scholars' research results and an indicator of a researcher's ability; they also serve as an evaluation standard in decisions such as promotion and the allocation of funds. Measuring the quality of an academic paper is therefore important. The most common measure is a paper's citation count, because this indicator is widely used in the evaluation of scientific publications and underpins many other indicators (such as the h-index). Accurately predicting the citation counts of academic papers is therefore valuable. This paper proposes an end-to-end deep learning network, DeepCCP, which incorporates the effect of information cascades and frames citation count prediction as an information cascade prediction problem. DeepCCP directly takes the citation network formed in the early stage of a paper as input, and outputs the paper's citation count after a given period of time. DeepCCP uses only the structural and temporal information of the citation network and requires no additional information, yet it still achieves outstanding performance. Experiments on six real-world datasets show that DeepCCP outperforms state-of-the-art methods in the accuracy of citation count prediction.
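A rough sketch, under heavy assumptions, of the input/output contract described above: an early-stage citation cascade (who cited the paper, and when) goes in, and a predicted citation count comes out. The hand-built structural/temporal features and the tiny untrained regressor are placeholders for intuition only, not DeepCCP's cascade model.

```python
import numpy as np

# Early citation cascade: (citing_paper_id, days_since_publication) pairs.
cascade = [(101, 12), (102, 30), (103, 45), (104, 47), (105, 90)]

def cascade_features(cascade):
    times = np.array([t for _, t in cascade], dtype=float)
    return np.array([
        len(cascade),             # structural: number of early citations
        times.mean(),             # temporal: average citation delay
        np.diff(times).mean(),    # temporal: spacing between citations
    ])

x = cascade_features(cascade)

# Toy, untrained regressor mapping the features to a citation count.
W1, b1 = np.full((3, 4), 0.1), np.zeros(4)
W2, b2 = np.full(4, 0.1), 0.0
predicted_count = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
```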