Searching for Efficient Transformers for Language Modeling

Neural Information Processing Systems

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.


A Appendix

Neural Information Processing Systems

A.1 TensorFlow Primitives Vocabulary. Each vocabulary entry lists a Name, the TF Function it maps to, and an Argument Mapping over the fields Input 1, Input 2, Constant, and Dim Size (for example, ADD maps to tf.math.add). "Name" is the name of the operation in our search vocabulary; "TF Function" is the TensorFlow function that the name is mapped to when a DNA instruction is executed; "Argument Mapping" describes how the values in a DNA's argument set are mapped to the corresponding TensorFlow function arguments. TensorFlow graphs are built from DNA programs as described in Section 2 of the main text. The vocabulary for relative dimensions is [1, 2, 4, 8, 12, 16, 24, 32, 48, 64]; this vocabulary was not tuned.
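The mapping above can be illustrated with a minimal sketch of a DNA-program interpreter. This is an assumption-laden toy, not the authors' implementation: it uses plain Python callables in place of TensorFlow functions such as tf.math.add, and the Instruction fields and execute semantics are invented for illustration only.

```python
# Toy sketch: mapping DNA instructions to primitives from a small vocabulary.
# Stand-ins for TF functions (the paper's "TF Function" column); names and
# program semantics here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List

VOCAB: Dict[str, Callable[[float, float], float]] = {
    "ADD": lambda a, b: a + b,   # would be tf.math.add in the paper's setting
    "MUL": lambda a, b: a * b,
    "MAX": max,
}

# Relative-dimension vocabulary quoted in the appendix (stated as not tuned).
RELATIVE_DIMS = [1, 2, 4, 8, 12, 16, 24, 32, 48, 64]

@dataclass
class Instruction:
    op: str       # "Name" column: which primitive to apply
    input1: int   # index of the first operand in the value bank
    input2: int   # index of the second operand

def execute(program: List[Instruction], inputs: List[float]) -> float:
    """Run a DNA-style program: each instruction looks up its primitive
    ("TF Function") and appends the result to a growing value bank."""
    bank = list(inputs)
    for ins in program:
        fn = VOCAB[ins.op]
        bank.append(fn(bank[ins.input1], bank[ins.input2]))
    return bank[-1]
```

For example, the two-instruction program [ADD(0, 1), MUL(2, 0)] on inputs [3.0, 4.0] first appends 7.0, then multiplies it by the first input to yield 21.0.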




A Primer on Causal and Statistical Dataset Biases for Fair and Robust Image Analysis

Jones, Charles, Glocker, Ben

arXiv.org Machine Learning

Machine learning methods often fail when deployed in the real world. Worse still, they fail in high-stakes situations and across socially sensitive lines. These issues have a chilling effect on the adoption of machine learning methods in settings such as medical diagnosis, where they are arguably best-placed to provide benefits if safely deployed. In this primer, we introduce the causal and statistical structures which induce failure in machine learning methods for image analysis. We highlight two previously overlooked problems, which we call the \textit{no fair lunch} problem and the \textit{subgroup separability} problem. We elucidate why today's fair representation learning methods fail to adequately solve them and propose potential paths forward for the field.


Primer C-VAE: An interpretable deep learning primer design method to detect emerging virus variants

Wang, Hanyu, Tsinda, Emmanuel K., Dunn, Anthony J., Chikweto, Francis, Zemkoho, Alain B.

arXiv.org Artificial Intelligence

Motivation: PCR is more economical and quicker than Next Generation Sequencing for detecting target organisms, with primer design being a critical step. In epidemiology with rapidly mutating viruses, designing effective primers is challenging. Traditional methods require substantial manual intervention and struggle to ensure effective primer design across different strains. For organisms with large, similar genomes like Escherichia coli and Shigella flexneri, differentiating between species is also difficult but crucial. Results: We developed Primer C-VAE, a model based on a Variational Auto-Encoder framework with Convolutional Neural Networks to identify variants and generate specific primers. On SARS-CoV-2 sequences, our model classified variants (alpha, beta, gamma, delta, omicron) with 98% accuracy and generated variant-specific primers. These primers appeared with >95% frequency in target variants and <5% in others, showing good performance in in-silico PCR tests. For Alpha, Delta, and Omicron, our primer pairs produced fragments <200 bp, suitable for qPCR detection. The model also generated effective primers for organisms with longer gene sequences like E. coli and S. flexneri. Conclusion: Primer C-VAE is an interpretable deep learning approach for developing specific primer pairs for target organisms. This flexible, semi-automated and reliable tool works regardless of sequence completeness and length, allowing for qPCR applications, and can be applied to organisms with large and highly similar genomes.
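The specificity criterion in the abstract (>95% frequency in target variants, <5% in others) can be sketched as a simple substring-frequency filter. The function names, exact-match test, and thresholds here are illustrative assumptions; the paper's pipeline generates and validates primers with a C-VAE and in-silico PCR, not this filter.

```python
# Hedged sketch of a variant-specificity filter for candidate primers.
# Exact substring matching stands in for whatever matching the paper uses.
from typing import List

def appearance_frequency(primer: str, sequences: List[str]) -> float:
    """Fraction of sequences that contain the primer as an exact substring."""
    if not sequences:
        return 0.0
    return sum(primer in seq for seq in sequences) / len(sequences)

def is_variant_specific(primer: str,
                        target_seqs: List[str],
                        other_seqs: List[str],
                        min_target: float = 0.95,
                        max_other: float = 0.05) -> bool:
    """Keep a primer only if it is common in the target variant
    and rare everywhere else (thresholds from the abstract)."""
    return (appearance_frequency(primer, target_seqs) > min_target
            and appearance_frequency(primer, other_seqs) < max_other)
```

A primer present in every target sequence and no non-target sequence passes; one absent from the targets is rejected regardless of the other set.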


A Primer on Large Language Models and their Limitations

Johnson, Sandra, Hyland-Wood, David

arXiv.org Artificial Intelligence

The world of artificial intelligence (AI) is increasingly penetrating all aspects of our personal and professional lives. This proliferation of AI tools and applications is being met with a mixture of excitement, scepticism and even dread [78]: excitement at the seemingly endless potential of AI applications such as LLMs, especially when they are integrated "within broader systems" [13]; scepticism as the realisation dawns that LLMs are in fact fallible, as evidenced by hallucinations, and hence not the golden bullet that can solve all problems [19, 21]; and a feeling of dread for those who believe that LLMs and AI have the potential to detrimentally impact our lives and make people redundant [78]. The ability of some LLMs to pass Theory of Mind (ToM) [64][32] and Turing Tests [7][42] suggests support for the Computational Theory of Mind (CTM), that cognition may be substrate independent. These findings challenge biological essentialism and open new avenues for creating sophisticated AI systems capable of human-like reasoning and interaction.


Enhancing Post-Hoc Attributions in Long Document Comprehension via Coarse Grained Answer Decomposition

Ramu, Pritika, Goswami, Koustava, Saxena, Apoorv, Srinivasan, Balaji Vasan

arXiv.org Artificial Intelligence

Accurately attributing answer text to its source document is crucial for developing a reliable question-answering system. However, attribution for long documents remains largely unexplored. Post-hoc attribution systems are designed to map answer text back to the source document, yet the granularity of this mapping has not been addressed. Furthermore, a critical question arises: What exactly should be attributed? This involves identifying the specific information units within an answer that require grounding. In this paper, we propose and investigate a novel approach to the factual decomposition of generated answers for attribution, employing template-based in-context learning. To accomplish this, we utilize the question and integrate negative sampling during few-shot in-context learning for decomposition. This approach enhances the semantic understanding of both abstractive and extractive answers. We examine the impact of answer decomposition by providing a thorough examination of various attribution approaches, ranging from retrieval-based techniques to LLM-based attributors.
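The template-based few-shot decomposition with negative sampling that the abstract describes can be sketched as a prompt builder. Everything here is an assumption for illustration: the template wording, the representation of a negative sample as an example with no supported facts, and the function name are not taken from the paper.

```python
# Hedged sketch: assembling a few-shot prompt for factual answer
# decomposition, with negative samples included among the demonstrations.
from typing import List, Optional, Tuple

# Each example is (question, answer, facts); facts=None marks a negative
# sample whose expected output is an empty decomposition.
Example = Tuple[str, str, Optional[List[str]]]

def build_decomposition_prompt(question: str, answer: str,
                               examples: List[Example]) -> str:
    parts = ["Decompose the answer into atomic facts grounded in the question.\n"]
    for q, a, facts in examples:
        if facts:
            target = "\n".join(f"- {f}" for f in facts)
        else:
            target = "- (no supported facts)"  # negative sample
        parts.append(f"Question: {q}\nAnswer: {a}\nFacts:\n{target}\n")
    # The query instance ends the prompt, leaving the model to fill in Facts.
    parts.append(f"Question: {question}\nAnswer: {answer}\nFacts:")
    return "\n".join(parts)
```

Each decomposed fact could then be attributed independently, by retrieval over the source document or by an LLM-based attributor, as the paper's comparison suggests.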

