
Collaborating Authors: merity


Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Lindenmaier, Gabriel, Papay, Sean, Padó, Sebastian

arXiv.org Artificial Intelligence

Transformer-based language models have recently been at the forefront of research in text generation. However, these advances come at the price of prohibitive training costs, with parameter counts in the billions and compute requirements measured in petaflop/s-decades. In this paper, we investigate transformer-based architectures that improve performance in low-data regimes by selectively replacing attention layers with feed-forward and quasi-recurrent neural network layers. We test these architectures on the standard enwik8 and WikiText-103 corpora. Our results show that our reduced architectures outperform existing models of comparable size, and match the performance of much larger models with significantly fewer parameters.
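The core idea in this abstract, replacing some self-attention layers with cheaper quasi-recurrent (QRNN) layers, can be sketched briefly. The following PyTorch sketch is illustrative only: the layer sizes, width-2 convolutions, f-pooling rule, and the every-fourth-layer replacement pattern are assumptions made for the example, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QRNNLayer(nn.Module):
    """Quasi-recurrent layer: causal width-2 convolutions compute candidate
    and forget gates, then element-wise "f-pooling" mixes them over time."""

    def __init__(self, dim: int):
        super().__init__()
        self.z_conv = nn.Conv1d(dim, dim, kernel_size=2)  # candidate values
        self.f_conv = nn.Conv1d(dim, dim, kernel_size=2)  # forget gates

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); left-pad so step t sees only steps t-1 and t.
        xc = F.pad(x.transpose(1, 2), (1, 0))
        z = torch.tanh(self.z_conv(xc)).transpose(1, 2)
        f = torch.sigmoid(self.f_conv(xc)).transpose(1, 2)
        h = torch.zeros_like(x[:, 0])
        outputs = []
        for t in range(x.size(1)):  # sequential part is element-wise only
            h = f[:, t] * h + (1 - f[:, t]) * z[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)


class HybridLM(nn.Module):
    """Block stack where only every `attn_every`-th layer is self-attention;
    the remaining layers use the cheaper QRNNLayer (an assumed pattern)."""

    def __init__(self, vocab: int, dim: int = 256, layers: int = 8,
                 attn_every: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            if (i + 1) % attn_every == 0 else QRNNLayer(dim)
            for i in range(layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(layers))
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)
        seq = x.size(1)
        causal = torch.triu(
            torch.ones(seq, seq, dtype=torch.bool, device=tokens.device), 1)
        for block, norm in zip(self.blocks, self.norms):
            y = norm(x)  # pre-norm residual blocks
            if isinstance(block, nn.MultiheadAttention):
                y, _ = block(y, y, y, attn_mask=causal, need_weights=False)
            else:
                y = block(y)
            x = x + y  # residual connection
        return self.head(x)


# Byte-level usage in the spirit of enwik8: 256-symbol vocabulary.
model = HybridLM(vocab=256)
logits = model(torch.randint(0, 256, (2, 32)))  # -> (2, 32, 256)
```

The QRNN layer's only sequential work is the element-wise pooling loop, which is far cheaper than attention's quadratic score matrix; that trade-off is what lets hybrids of this kind shrink parameter and compute budgets.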


SHAQ: Single Headed Attention with Quasi-Recurrence

Bharwani, Nashwin, Kushner, Warren, Dandona, Sangeet, Schreiber, Ben

arXiv.org Artificial Intelligence

Natural Language Processing research has recently been dominated by large-scale transformer models. Although they achieve state of the art on many important language tasks, transformers often require expensive compute resources and days to weeks of training. This is feasible for researchers at big tech companies and leading research universities, but not for scrappy start-up founders, students, and independent researchers. Stephen Merity's SHA-RNN, a compact hybrid attention-RNN model, is designed for consumer-grade modeling: it requires significantly fewer parameters and less training time to reach near-state-of-the-art results. We analyze Merity's model through an exploratory study of several units of the architecture, assessing both training time and overall quality. Ultimately, we combine these findings into a new architecture, which we call SHAQ: Single Headed Attention Quasi-recurrent Neural Network. With our new architecture, we achieve accuracy similar to the SHA-RNN's while obtaining a 4x speedup in training.
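Since the abstract centers on Merity's single-headed attention as the cost-saving ingredient, a minimal sketch may help. This PyTorch snippet shows one causally masked scaled-dot-product head in place of the usual multi-head block; the exact projections, memory caching, and surrounding feed-forward layers in the real SHA-RNN differ, so treat this purely as an illustration of the single-head idea.

```python
import math
import torch
import torch.nn as nn


class SingleHeadAttention(nn.Module):
    """One causally masked scaled-dot-product attention head, standing in
    for the usual multi-head attention block."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        # Causal mask: each position attends only to itself and the past.
        future = torch.triu(torch.ones_like(scores, dtype=torch.bool), 1)
        scores = scores.masked_fill(future, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v


attn = SingleHeadAttention(512)
out = attn(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```

A single head keeps one set of query/key/value projections regardless of how many heads a multi-head block would use, which is part of how SHA-RNN-style models cut parameters and memory enough to train on consumer hardware.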


AI models beat humans at reading comprehension, but they've still got a ways to go

@machinelearnbot

When computer models designed by tech giants Alibaba and Microsoft this month surpassed humans for the first time in a reading-comprehension test, both companies celebrated the success as a historic milestone. Luo Si, the chief scientist for natural-language processing at Alibaba's AI research unit, struck a poetic note, saying, "Objective questions such as 'what causes rain' can now be answered with high accuracy by machines." Teaching a computer to read has for decades been one of artificial intelligence's holiest grails, and the feat seemed to signal a coming future in which AI could understand words and process meaning with the same fluidity humans take for granted every day. But computers aren't there yet -- and aren't even really that close, said AI experts who reviewed the test results. Instead, the accomplishment highlights not just how far the technology has progressed, but also how far it still has to go. "It's a large step" for the companies' marketing "but a small step for humankind," said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an AI research group funded by Microsoft co-founder Paul Allen.