AITopics

2106.0327

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

#artificialintelligence Jun-4-2021, 17:25:14 GMT

GPT3

GPT-3 is an advanced language model. This article explains in depth what GPT-3 is and offers hands-on experience with it.

gpt-3, gpt3, output layer, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Jun-4-2021, 04:55:17 GMT

China's GPT-3? BAAI Introduces Superscale Intelligence Model 'Wu Dao 1.0'

Since the May 2020 release of OpenAI's GPT-3, AI researchers have embraced super-large-scale pretraining models. Packing an epoch-making 175 billion parameters, GPT-3 has achieved excellent performance across multiple natural language processing (NLP) tasks. Despite their size and power, however, such models still lack common sense or cognitive abilities, and so struggle with complex reasoning tasks like open dialogue, knowledge-based Q&A, visual reasoning, etc. In a bid to promote the research and development of China's own large-scale pretraining models and further explore universal intelligence from a more fundamental perspective, the Beijing Academy of Artificial Intelligence (BAAI) recently unveiled Wu Dao 1.0, China's first homegrown super-scale intelligent model system. The work was led by BAAI Research Academic Vice President and Tsinghua University Professor Tang Jie, with contributions from a team of more than 100 AI scientists from Peking University, Tsinghua University, Renmin University of China, the Chinese Academy of Sciences and other institutes.

baai introduce superscale intelligence model, china, wu dao 1, (11 more...)

Country: Asia > China > Beijing > Beijing (0.25)

Industry: Health & Medicine > Therapeutic Area (0.59)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Jun-4-2021

Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models

Lamy-Poirier, Joel

The advent of the transformer has sparked a quick growth in the size of language models, far outpacing hardware improvements. (Dense) transformers are expected to reach the trillion-parameter scale in the near future, for which training requires thousands or even tens of thousands of GPUs. We investigate the challenges of training at this scale and beyond on commercially available hardware. In particular, we analyse the shortest possible training time for different configurations of distributed training, leveraging empirical scaling laws for language models to estimate the optimal (critical) batch size. Contrary to popular belief, we find no evidence for a memory wall, and instead argue that the real limitation, other than the cost, lies in the training duration. In addition to this analysis, we introduce two new methods, layered gradient accumulation and modular pipeline parallelism, which together cut the shortest training time by half. The methods also reduce data movement, lowering the network requirement to a point where a fast InfiniBand connection is not necessary. This increased network efficiency also improves on the methods introduced with the ZeRO optimizer, reducing the memory usage to a tiny fraction of the available GPU memory.

gradient accumulation, parallelism, pipeline parallelism, (14 more...)

2106.02679

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.83)

#artificialintelligence Jun-3-2021, 16:40:46 GMT

Researchers open-source benchmarks measuring quality of AI-generated code

The applications of computer programming are vast in scope. And as computers become ubiquitous, the demand for quality code draws an ever-growing number of aspiring programmers to the profession. After years of study to become proficient at coding, experts learn to convert abstracts into concrete, executable programs. But what if AI could do the same? In recent years, large-scale AI language models have shown promise in generalizing to tasks including writing code, implying that humans' work may be one day supplemented by AI systems.

code generation, researcher open-source benchmark, specification, (15 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > California (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.38)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

#artificialintelligence Jun-3-2021, 04:30:15 GMT

Microsoft, GPT-3, and the future of OpenAI

One of the biggest highlights of Build, Microsoft's annual software development conference, was the presentation of a tool that uses deep learning to generate source code for office applications. The tool uses GPT-3, a massive language model developed by OpenAI last year and made available to select developers, researchers, and startups in a paid application programming interface. Many have touted GPT-3 as the next-generation artificial intelligence technology that will usher in a new breed of applications and startups. Since GPT-3's release, many developers have found interesting and innovative uses for the language model. And several startups have declared that they will be using GPT-3 to build new or augment existing products. But creating a profitable and sustainable business around GPT-3 remains a challenge.

gpt-3, microsoft, openai, (16 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.73)

Wennberg, Ulme, Henter, Gustav Eje

The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

arXiv.org Artificial Intelligence Jun-3-2021

Mechanisms for encoding positional information are central to transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while adding orders of magnitude fewer positional parameters.
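To make the idea of translation invariance concrete, here is a minimal sketch of self-attention in which position enters only through a bias that depends on the offset j - i between tokens. It is an illustration under stated assumptions, not the paper's implementation: the function names and the Gaussian-shaped `gaussian_offset_score` are hypothetical, and the abstract does not specify how TISA parametrizes its positional scores.

```python
import math

def relative_bias(n, offset_score):
    """Build an n x n bias matrix whose entry (i, j) depends only on j - i."""
    return [[offset_score(j - i) for j in range(n)] for i in range(n)]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, offset_score):
    """Attention weights from content logits plus a relative-position bias."""
    n, d = len(queries), len(queries[0])
    bias = relative_bias(n, offset_score)
    weights = []
    for i in range(n):
        logits = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            + bias[i][j]
            for j in range(n)
        ]
        weights.append(softmax(logits))
    return weights

def gaussian_offset_score(offset, amplitude=1.0, width=2.0):
    # Hypothetical scoring function (an assumption for illustration):
    # it favors nearby tokens and peaks at offset 0.
    return amplitude * math.exp(-(offset ** 2) / (2 * width ** 2))
```

Because the bias is a function of j - i alone, shifting both positions by the same amount leaves it unchanged, so `relative_bias(n, f)[i][j]` equals `relative_bias(n, f)[i + 1][j + 1]`. The positional parameters here are just the two numbers `amplitude` and `width`, rather than one embedding vector per position.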

attention head, language model, proc, (14 more...)

2106.0195

Country: Europe > Sweden (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

#artificialintelligence Jun-2-2021, 21:36:34 GMT

Thoughts on the Alignment Implications of Scaling Language Models

By now, most of you have probably heard about GPT-3 and what it does. There's been a bunch of different opinions on what it means for alignment, and this post is yet another opinion from a slightly different perspective. Some background: I'm a part of EleutherAI, a decentralized research collective (read: glorified discord server - come join us on Discord for ML, alignment, and dank memes). We're best known for our ongoing effort to create a GPT-3-like large language model, and so we have a lot of experience working with transformer models and looking at scaling laws, but we also take alignment very seriously and spend a lot of time thinking about it. I also want to lay out some potential topics for future research that might be fruitful. By the way, I did consider that the scaling laws implications might be an infohazard, but I think that ship sailed the moment the GPT-3 paper went live, and since we've already been in a race for parameters for some time (see: Megatron-LM, Turing-NLG, Switch Transformer, PanGu-α/盘古α, HyperCLOVA, Wudao/悟道 2.0, among others), I don't really think this post is causing any non-negligible amount of desire for scaling.

abstraction, alignment, resolution, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Saha, Swarnadeep, Yadav, Prateek, Bansal, Mohit

multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning

arXiv.org Artificial Intelligence Jun-2-2021

We focus on a type of linguistic formal reasoning where the goal is to reason over explicit knowledge in the form of natural language facts and rules (Clark et al., 2020). A recent work, named PRover (Saha et al., 2020), performs such reasoning by answering a question and also generating a proof graph that explains the answer. However, compositional reasoning is not always unique and there may be multiple ways of reaching the correct answer. Thus, in our work, we address a new and challenging problem of generating multiple proof graphs for reasoning over natural language rule-bases. Each proof provides a different rationale for the answer, thereby improving the interpretability of such reasoning systems. In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph. We propose two variants of a proof-set generation model, multiPRover. Our first model, Multilabel-multiPRover, generates a set of proofs via multi-label classification and implicit conditioning between the proofs; while the second model, Iterative-multiPRover, generates proofs iteratively by explicitly conditioning on the previously generated proofs. Experiments on multiple synthetic, zero-shot, and human-paraphrased datasets reveal that both multiPRover models significantly outperform PRover on datasets containing multiple gold proofs. Iterative-multiPRover obtains state-of-the-art proof F1 in zero-shot scenarios where all examples have single correct proofs. It also generalizes better to questions requiring higher depths of reasoning where multiple proofs are more frequent. Our code and models are publicly available at https://github.com/swarnaHub/multiPRover

dataset, it-multi pr, multi pr, (17 more...)

2106.01354

Country:

Asia > China > Hong Kong (0.04)
North America > United States > California (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.49)

arXiv.org Artificial Intelligence Jun-2-2021

Joint Retrieval and Generation Training for Grounded Text Generation

Zhang, Yizhe, Sun, Siqi, Gao, Xiang, Fang, Yuwei, Brockett, Chris, Galley, Michel, Gao, Jianfeng, Dolan, Bill

Recent advances in large-scale pre-training such as GPT-3 allow seemingly high quality text to be generated from a given prompt. However, such generation systems often suffer from problems of hallucinated facts, and are not inherently designed to incorporate useful external information. Grounded generation models appear to offer remedies, but their training typically relies on rarely-available parallel data where corresponding information-relevant documents are provided for context. We propose a framework that alleviates this data constraint by jointly training a grounded generator and document retriever on the language model signal. The model learns to reward retrieval of the documents with the highest utility in generation, and attentively combines them using a Mixture-of-Experts (MoE) ensemble to generate follow-on text. We demonstrate that both generator and retriever can take advantage of this joint training and work synergistically to produce more informative and relevant text in both prose and dialogue generation.
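The mixture-of-experts combination mentioned in the abstract can be pictured as a weighted average of per-document predictions. The sketch below is an assumption-laden illustration, not the authors' model: it supposes that each retrieved document yields its own next-token distribution and that the mixture weights are a softmax over retriever scores. The names `mixture_next_token`, `retriever_scores`, and `per_doc_token_probs` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_next_token(retriever_scores, per_doc_token_probs):
    """Mix per-document next-token distributions, weighted by retrieval scores.

    retriever_scores: one relevance score per retrieved document (assumed).
    per_doc_token_probs: one dict per document mapping token -> probability,
        all sharing the same vocabulary (assumed).
    """
    doc_weights = softmax(retriever_scores)
    vocab = per_doc_token_probs[0].keys()
    return {
        tok: sum(w * probs[tok] for w, probs in zip(doc_weights, per_doc_token_probs))
        for tok in vocab
    }

# Usage: two documents with equal retrieval scores contribute equally.
mixed = mixture_next_token(
    [0.0, 0.0],
    [{"a": 0.8, "b": 0.2}, {"a": 0.2, "b": 0.8}],
)
```

In a setup like this, the mixed likelihood is differentiable with respect to the retriever scores, which is one way a language-model signal could reward retrieving documents that make the continuation more likely.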


2105.06597

Country:

South America > Chile (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)