Large Language Model
Democratizing LLMs Needs a Revolution in AI Hardware
Andrew Feldman is CEO of Cerebras, a startup that specializes in AI hardware. There is growing concern that artificial intelligence (AI)--namely deep learning--is becoming centralized within a few very wealthy companies. This shift does not apply to all areas of AI, but it is certainly the case for large language models (LLMs). Accordingly, there has been growing interest in democratizing LLMs and making them available to a broader audience. However, while there have been impressive initiatives in open-sourcing models, the hardware barriers of LLMs have gone mostly unaddressed.
How Would You Apply GPT to Images?
Note: There are different ways to approach an interview question. The author of this newsletter does not try to find a reference that answers a question exhaustively. Rather, the author would like to share some quick insights and help readers think, practice, and do further research as necessary. You can also find the original post here, and follow me on LinkedIn!
Lost in Space Marking
Jacobs, Cassandra L., Pinter, Yuval
Modern NLP is dominated by large pre-trained models, systems which are large, complex, and costly to train. As a result, much research effort is put into questions of tuning and configuring the various layers and training regimes for improving prediction quality on a growing number of tasks (Rogers et al., 2020). Unfortunately, not as much research asks questions about the decisions made at the most upstream parts of the models, those that deal with input tokenization and subword vocabulary creation. In this exploratory work, we isolate a single decision point which appears to be resolved arbitrarily by existing model developers, with no consensus but also no underlying theory: should subword tokenizers mark word boundaries at the beginning or the end?
Such a claim requires empirical support, but consideration of common practice can also be offered to challenge it: for one, pre-tokenization such as punctuation separation and accent normalization is not always applied consistently when moving on to a downstream text. A model that was trained on untreated text may find it difficult to process an NER dataset (for example) where punctuation is separated from preceding words, rendering a word-final marking tokenizer more robust to change; some tokenizers like BERT's Wordpiece (Devlin et al., 2019) "mark" a class of tokens by omission, i.e. marking the non-initial pieces rather than initial ones. This discrepancy surfaces edge case effects when compared with a seemingly-equivalent tokenizer like GPT-2's (Radford et al., 2019), which marks initial pieces but only if they are prepended by a space
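To make the marking question concrete, here is a minimal Python sketch of three boundary-marking conventions applied to the same subword pieces. It is an illustration under stated assumptions, not the tokenizers the excerpt cites: `segment` is a hypothetical stand-in that cuts words into fixed-size chunks instead of using a learned vocabulary, the `_` marker is an arbitrary choice, and `##` is used for the "mark non-initial pieces" convention.

```python
from typing import List


def segment(word: str, size: int = 3) -> List[str]:
    # Hypothetical stand-in for a learned subword vocabulary:
    # cut the word into fixed-size chunks.
    return [word[i:i + size] for i in range(0, len(word), size)]


def tokenize(text: str, scheme: str) -> List[str]:
    """Split on whitespace, segment each word, then mark word boundaries."""
    tokens: List[str] = []
    for word in text.split():
        pieces = segment(word)
        if scheme == "initial":
            # Mark the first piece of every word.
            pieces[0] = "_" + pieces[0]
        elif scheme == "final":
            # Mark the last piece of every word.
            pieces[-1] = pieces[-1] + "_"
        elif scheme == "continuation":
            # Mark by omission: only non-initial pieces carry a marker.
            pieces = pieces[:1] + ["##" + p for p in pieces[1:]]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        tokens.extend(pieces)
    return tokens


if __name__ == "__main__":
    # Same text with and without punctuation separated by pre-tokenization.
    for scheme in ("initial", "final", "continuation"):
        print(scheme, tokenize("tokens,", scheme), tokenize("tokens ,", scheme))
```

In this toy setup, separating the comma changes different tokens depending on the scheme: with initial marking the comma itself switches between `,` and `_,`, while with final marking the last piece of the word switches between `ens` and `ens_`. Which of these shifts hurts a trained model more is the empirical question; the sketch only shows that the schemes are not interchangeable once pre-tokenization varies.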
Learning from flowsheets: A generative transformer model for autocompletion of flowsheets
Vogel, Gabriel, Balhorn, Lukas Schulze, Schweidtmann, Artur M.
We propose a novel method enabling autocompletion of chemical flowsheets. This idea is inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model. We pre-train our model on synthetically generated flowsheets to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Eventually, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate a high potential of this approach for future AI-assisted process synthesis.
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Du, Nan, Huang, Yanping, Dai, Andrew M., Tong, Simon, Lepikhin, Dmitry, Xu, Yuanzhong, Krikun, Maxim, Zhou, Yanqi, Yu, Adams Wei, Firat, Orhan, Zoph, Barret, Fedus, Liam, Bosma, Maarten, Zhou, Zongwei, Wang, Tao, Wang, Yu Emma, Webster, Kellie, Pellat, Marie, Robinson, Kevin, Meier-Hellstern, Kathleen, Duke, Toju, Dixon, Lucas, Zhang, Kun, Le, Quoc V, Wu, Yonghui, Chen, Zhifeng, Cui, Claire
Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. The largest GLaM has 1.2 trillion parameters, which is approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation flops for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
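The idea of sparse activation can be sketched in a few lines of Python: a gate scores every expert for an input, but only the top-k experts are actually evaluated, so compute per input grows with k instead of with the total number of experts. This is a generic top-k mixture-of-experts sketch, not GLaM's implementation; the function names, the choice of k = 2, and the toy experts are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over the gate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts; renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(
    x: Vector,
    gate: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    k: int = 2,  # illustrative choice, not taken from the abstract
) -> Vector:
    """Weighted sum of the outputs of the k selected experts only."""
    # One gate score per expert: dot product of the gate row with the input.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate]
    out = [0.0] * len(x)
    for idx, weight in top_k_route(scores, k):
        y = experts[idx](x)  # unselected experts are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Four toy experts; expert i scales its input by (i + 1).
    toy_experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(4)]
    toy_gate = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    print(moe_layer([1.0, 1.0], toy_gate, toy_experts, k=2))
```

With four experts and k = 2, only half of the expert parameters are touched per input even though all four contribute to model capacity, which is the trade-off the abstract describes at much larger scale.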
Large language models can't plan, even if they write fancy essays
This article is part of our coverage of the latest in AI research. Large language models like GPT-3 have advanced to the point that it has become difficult to measure the limits of their capabilities. When you have a very large neural network that can generate articles, write software code, and engage in conversations about sentience and life, you should expect it to be able to reason about tasks and plan as a human does, right? A study by researchers at Arizona State University, Tempe, shows that when it comes to planning and thinking methodically, LLMs perform very poorly and suffer from many of the same failures observed in current deep learning systems. Interestingly, the study finds that, while very large LLMs like GPT-3 and PaLM pass many of the tests that were meant to evaluate the reasoning capabilities of artificial intelligence systems, they do so because these benchmarks are either too simplistic or too flawed and can be "cheated" through statistical tricks, something that deep learning systems are very good at.
AlphaFold reveals the structure of the protein universe
To read about all our work on solving protein folding, go to deepmind.com/AlphaFold. It's been one year since we released and open sourced AlphaFold, our AI system to predict the 3D structure of a protein just from its 1D amino acid sequence, and created the AlphaFold Protein Structure Database (AlphaFold DB) to freely share this scientific knowledge with the world. Proteins are the building blocks of life; they underpin every biological process in every living thing. And, because a protein's shape is closely linked with its function, knowing a protein's structure unlocks a greater understanding of what it does and how it works. We hoped this groundbreaking resource would help accelerate scientific research and discovery globally, and that other teams could learn from and build on the advances we made with AlphaFold to create further breakthroughs.
'The entire protein universe': AI predicts shape of nearly every known protein
The structure of the vitellogenin protein -- a precursor of egg yolk -- as predicted by the AlphaFold tool. Credit: DeepMind. From today, determining the 3D shape of almost any protein known to science will be as simple as typing in a Google search. Researchers have used AlphaFold -- the revolutionary artificial-intelligence (AI) network -- to predict the structures of some 200 million proteins from 1 million species, covering nearly every known protein on the planet. The data dump will be freely available on a database set up by DeepMind, Google's London-based AI company that developed AlphaFold, and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), an intergovernmental organization near Cambridge, UK. "Essentially you can think of it covering the entire protein universe," DeepMind CEO Demis Hassabis said at a press briefing. The 3D shape, or structure, of a protein is what determines its function in cells.
How We Accidentally Gave our Bots Their Personalities
A couple of months ago, we noticed that the process of optimizing our computer models to evaluate text can produce pretty cool personalities for different bots, so we figured we'd share what we've learned so far. We hope these bots can help with some of the challenges we are facing with getting persistent state out of natural language generation. We also hope that writing about how we developed them can provide some tips for our users who are helping us create new bots. So what do we mean by personalities? Here are a few examples of the intermediate step and the desired output (a score) that we generated when we played a game where we told some of our earlier bots that we were writing this blog post.