AITopics | Large Language Model

Large Language Model

Democratizing LLMs Needs a Revolution in AI Hardware

#artificialintelligence Aug-2-2022, 15:19:38 GMT

Andrew Feldman is CEO of Cerebras, a startup that specializes in AI hardware. There is growing concern that artificial intelligence (AI)--namely deep learning--is becoming centralized within a few very wealthy companies. This shift does not apply to all areas of AI, but it is certainly the case for large language models (LLMs). Accordingly, there has been growing interest in democratizing LLMs and making them available to a broader audience. However, while there have been impressive initiatives in open-sourcing models, the hardware barriers of LLMs have gone mostly unaddressed.

ai hardware, democratizing llm, revolution, (1 more...)

#artificialintelligence

Genre: Research Report > Promising Solution (0.40)

Industry: Information Technology > Hardware (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Comparing the Top 10 AI Content Generators: Which is Best?

#artificialintelligence Aug-2-2022, 11:45:16 GMT

Tools for creating AI content are becoming more broadly accessible following the development of GPT-3 and its subsequent release by OpenAI. Are machines assuming control of the content market? Read on to find out.

ai content generator

#artificialintelligence

Industry: Information Technology > Services > e-Commerce Services (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

How Would You Apply GPT to Images?

#artificialintelligence Aug-2-2022, 10:00:08 GMT

Note: There are different angles from which to answer an interview question. The author of this newsletter does not try to find a reference that answers a question exhaustively. Rather, the author would like to share some quick insights and help readers think, practice, and do further research as necessary. You can also find the original post here, and follow me on LinkedIn!

mark chen, stanford seminar, transformer, (4 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.42)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.42)

Lost in Space Marking

Jacobs, Cassandra L., Pinter, Yuval

arXiv.org Artificial Intelligence Aug-2-2022

Modern NLP is dominated by large pre-trained models, systems which are large, complex, and costly to train. As a result, much research effort is put into questions of tuning and configuring the various layers and training regimes for improving prediction quality on a growing number of tasks (Rogers et al., 2020). Unfortunately, not as much research asks questions about the decisions made at the most upstream parts of the models, those that deal with input tokenization and subword vocabulary creation. In this exploratory work, we isolate a single decision point which appears to be resolved arbitrarily by existing model developers, with no consensus but also no underlying theory: should subword tokenizers mark word boundaries at the beginning or the end?

Such a claim requires empirical support, but consideration of common practice can also be offered to challenge it: for one, pre-tokenization such as punctuation separation and accent normalization is not always applied consistently when moving on to a downstream text. A model that was trained on untreated text may find it difficult to process an NER dataset (for example) where punctuation is separated from preceding words, rendering a word-final-marking tokenizer more robust to change; some tokenizers like BERT's Wordpiece (Devlin et al., 2019) "mark" a class of tokens by omission, i.e. marking the non-initial pieces rather than initial ones. This discrepancy surfaces edge case effects when compared with a seemingly-equivalent tokenizer like GPT-2's (Radford et al., 2019), which marks initial pieces but only if they are prepended by a space
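The three marking conventions contrasted in this excerpt can be illustrated with a minimal Python sketch. It is not the paper's code and not any library's tokenizer: the subword segmentation is supplied by hand, the function names are hypothetical, words are assumed to be separated by single spaces, and the `</w>` suffix is just one common way to write a word-final marker. The `##` prefix and the `Ġ` space symbol follow the way WordPiece and GPT-2 vocabularies are usually displayed.

```python
from typing import List

# Assumption: each word is already split into subword pieces by hand.
Segmented = List[List[str]]

def mark_continuation(words: Segmented) -> List[str]:
    """WordPiece-style: mark every non-initial piece with '##'."""
    return [p if i == 0 else "##" + p for w in words for i, p in enumerate(w)]

def mark_initial_after_space(words: Segmented) -> List[str]:
    """GPT-2-style: mark an initial piece with 'Ġ' only if a space precedes it."""
    out = []
    for wi, w in enumerate(words):
        for pi, p in enumerate(w):
            # The first word of the text has no preceding space, so it stays unmarked.
            out.append("Ġ" + p if pi == 0 and wi > 0 else p)
    return out

def mark_final(words: Segmented) -> List[str]:
    """Word-final marking: append '</w>' to the last piece of each word."""
    return [p + "</w>" if i == len(w) - 1 else p for w in words for i, p in enumerate(w)]

words = [["token", "ization"], ["matter", "s"]]
print(mark_continuation(words))
print(mark_initial_after_space(words))
print(mark_final(words))
```

In the second convention the text-initial word carries no marker at all, which is the kind of edge case the excerpt attributes to GPT-2-style marking.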

computational linguistic, morpheme, tokenizer, (13 more...)

arXiv.org Artificial Intelligence

2208.01561

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > Erie County > Buffalo (0.04)
(5 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Learning from flowsheets: A generative transformer model for autocompletion of flowsheets

Vogel, Gabriel, Balhorn, Lukas Schulze, Schweidtmann, Artur M.

arXiv.org Artificial Intelligence Aug-1-2022

We propose a novel method enabling autocompletion of chemical flowsheets. This idea is inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model. We pre-train our model on synthetically generated flowsheets to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Eventually, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate a high potential of this approach for future AI-assisted process synthesis.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2208.00859

Country:

Europe > Netherlands > South Holland > Delft (0.05)
Europe > Denmark (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Du, Nan, Huang, Yanping, Dai, Andrew M., Tong, Simon, Lepikhin, Dmitry, Xu, Yuanzhong, Krikun, Maxim, Zhou, Yanqi, Yu, Adams Wei, Firat, Orhan, Zoph, Barret, Fedus, Liam, Bosma, Maarten, Zhou, Zongwei, Wang, Tao, Wang, Yu Emma, Webster, Kellie, Pellat, Marie, Robinson, Kevin, Meier-Hellstern, Kathleen, Duke, Toju, Dixon, Lucas, Zhang, Kun, Le, Quoc V, Wu, Yonghui, Chen, Zhifeng, Cui, Claire

arXiv.org Artificial Intelligence Aug-1-2022

Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. The largest GLaM has 1.2 trillion parameters, which is approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation flops for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
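As a rough illustration of what "sparsely activated" means, here is a minimal pure-Python sketch of top-k expert routing. It is not GLaM's implementation: the abstract does not specify the gating function, so the softmax gate, the choice of k = 2, the toy scalar "experts", and all function names are assumptions made for the example.

```python
import math
from typing import Callable, List, Tuple

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def moe_layer(x: float, gate_logits: List[float],
              experts: List[Callable[[float], float]], k: int = 2) -> float:
    """Evaluate only the selected experts; the rest are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Toy experts: each one just scales its input (hypothetical stand-ins for networks).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.0, 2.0, 2.0, -1.0]  # assumed gate scores for a single token
y = moe_layer(10.0, logits, experts)
print(route_top_k(logits), y)
```

Only two of the four experts run for this input, so per-token compute depends on k rather than on the total number of experts. This is how total parameter count can grow without a matching growth in training or inference cost. As a sanity check on the abstract's figure, 1.2 trillion parameters divided by GPT-3's 175 billion is about 6.9, consistent with "approximately 7x".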

aclanthology, computational linguistic, glam, (14 more...)

arXiv.org Artificial Intelligence

2112.06905

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > New York County > New York City (0.04)
(20 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large language models can't plan, even if they write fancy essays

#artificialintelligence Jul-31-2022, 23:25:18 GMT

This article is part of our coverage of the latest in AI research. Large language models like GPT-3 have advanced to the point that it has become difficult to measure the limits of their capabilities. When you have a very large neural network that can generate articles, write software code, and engage in conversations about sentience and life, you should expect it to be able to reason about tasks and plan as a human does, right? A study by researchers at Arizona State University, Tempe, shows that when it comes to planning and thinking methodically, LLMs perform very poorly and suffer from many of the same failures observed in current deep learning systems. Interestingly, the study finds that while very large LLMs like GPT-3 and PaLM pass many of the tests that were meant to evaluate the reasoning capabilities of artificial intelligence systems, they do so because these benchmarks are either too simplistic or too flawed and can be "cheated" through statistical tricks, something that deep learning systems are very good at.

benchmark, kambhampati, reasoning, (15 more...)

#artificialintelligence

Country: North America > United States > Arizona (0.26)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AlphaFold reveals the structure of the protein universe

#artificialintelligence Jul-31-2022, 07:45:24 GMT

To read about all our work on solving protein folding, go to deepmind.com/AlphaFold. It's been one year since we released and open-sourced AlphaFold, our AI system to predict the 3D structure of a protein just from its 1D amino acid sequence, and created the AlphaFold Protein Structure Database (AlphaFold DB) to freely share this scientific knowledge with the world. Proteins are the building blocks of life; they underpin every biological process in every living thing. And, because a protein's shape is closely linked with its function, knowing a protein's structure unlocks a greater understanding of what it does and how it works. We hoped this groundbreaking resource would help accelerate scientific research and discovery globally, and that other teams could learn from and build on the advances we made with AlphaFold to create further breakthroughs.

alphafold, biology, protein, (14 more...)

#artificialintelligence

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

'The entire protein universe': AI predicts shape of nearly every known protein

#artificialintelligence Jul-31-2022, 06:00:48 GMT

The structure of the vitellogenin protein -- a precursor of egg yolk -- as predicted by the AlphaFold tool. Credit: DeepMind

From today, determining the 3D shape of almost any protein known to science will be as simple as typing in a Google search. Researchers have used AlphaFold -- the revolutionary artificial-intelligence (AI) network -- to predict the structures of some 200 million proteins from 1 million species, covering nearly every known protein on the planet. The data dump will be freely available on a database set up by DeepMind, Google's London-based AI company that developed AlphaFold, and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), an intergovernmental organization near Cambridge, UK. "Essentially you can think of it covering the entire protein universe," DeepMind CEO Demis Hassabis said at a press briefing. The 3D shape, or structure, of a protein is what determines its function in cells.

database, prediction, protein, (13 more...)

#artificialintelligence

AI-Alerts: 2022 > 2022-08 > AAAI AI-Alert for Aug 2, 2022 (1.00)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.25)
Europe > Germany (0.05)
Asia > South Korea > Seoul > Seoul (0.05)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.51)
Information Technology > Services (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

How We Accidentally Gave our Bots Their Personalities

#artificialintelligence Jul-31-2022, 04:21:01 GMT

A couple of months ago, we noticed that the process of optimizing our computer models to evaluate text can produce pretty cool personalities for different bots, so we figured we'd share what we've learned so far. We hope these bots can help with some of the challenges we are facing in getting persistent state out of natural language generation. We also hope that writing about how we developed them can provide some tips for our users who are helping us create new bots. So what do we mean by personalities? Here are a few examples of the intermediate step and the desired output (a score) that we generated when we played a game where we told some of our earlier bots that we were writing this blog post.

bot, commentary, evaluation bot, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.55)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)
