
Collaborating Authors: merity


Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Lindenmaier, Gabriel, Papay, Sean, Padó, Sebastian

arXiv.org Artificial Intelligence

Transformer-based language models have recently been at the forefront of research in text generation. However, these advances come at the price of prohibitive training costs, with parameter counts in the billions and compute requirements measured in petaflop/s-decades. In this paper, we investigate transformer-based architectures that improve performance in low-data regimes by selectively replacing attention layers with feed-forward and quasi-recurrent neural network layers. We test these architectures on the standard enwik8 and WikiText-103 corpora. Our results show that our reduced architectures outperform existing models of comparable size, and match the performance of much larger models with significantly fewer parameters.
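The core idea in this abstract, replacing some self-attention layers with cheaper quasi-recurrent (QRNN) layers, can be sketched briefly. The following PyTorch sketch is illustrative only: the layer sizes, width-2 convolutions, f-pooling rule, and the every-fourth-layer replacement pattern are assumptions made for the example, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QRNNLayer(nn.Module):
    """Quasi-recurrent layer: causal width-2 convolutions compute candidate
    and forget gates, then element-wise "f-pooling" mixes them over time."""

    def __init__(self, dim: int):
        super().__init__()
        self.z_conv = nn.Conv1d(dim, dim, kernel_size=2)  # candidate values
        self.f_conv = nn.Conv1d(dim, dim, kernel_size=2)  # forget gates

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); left-pad so step t sees only steps t-1 and t.
        xc = F.pad(x.transpose(1, 2), (1, 0))
        z = torch.tanh(self.z_conv(xc)).transpose(1, 2)
        f = torch.sigmoid(self.f_conv(xc)).transpose(1, 2)
        h = torch.zeros_like(x[:, 0])
        outputs = []
        for t in range(x.size(1)):  # sequential part is element-wise only
            h = f[:, t] * h + (1 - f[:, t]) * z[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)


class HybridLM(nn.Module):
    """Block stack where only every `attn_every`-th layer is self-attention;
    the remaining layers use the cheaper QRNNLayer (an assumed pattern)."""

    def __init__(self, vocab: int, dim: int = 256, layers: int = 8,
                 attn_every: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            if (i + 1) % attn_every == 0 else QRNNLayer(dim)
            for i in range(layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(layers))
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)
        seq = x.size(1)
        causal = torch.triu(
            torch.ones(seq, seq, dtype=torch.bool, device=tokens.device), 1)
        for block, norm in zip(self.blocks, self.norms):
            y = norm(x)  # pre-norm residual blocks
            if isinstance(block, nn.MultiheadAttention):
                y, _ = block(y, y, y, attn_mask=causal, need_weights=False)
            else:
                y = block(y)
            x = x + y  # residual connection
        return self.head(x)


# Byte-level usage in the spirit of enwik8: 256-symbol vocabulary.
model = HybridLM(vocab=256)
logits = model(torch.randint(0, 256, (2, 32)))  # -> (2, 32, 256)
```

The QRNN layer's only sequential work is the element-wise pooling loop, which is far cheaper than attention's quadratic score matrix; that trade-off is what lets hybrids of this kind shrink parameter and compute budgets.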


SHAQ: Single Headed Attention with Quasi-Recurrence

Bharwani, Nashwin, Kushner, Warren, Dandona, Sangeet, Schreiber, Ben

arXiv.org Artificial Intelligence

Natural Language Processing research has recently been dominated by large-scale transformer models. Although they achieve state of the art on many important language tasks, transformers often require expensive compute resources and days to weeks of training. This is feasible for researchers at big tech companies and leading research universities, but not for scrappy start-up founders, students, and independent researchers. Stephen Merity's SHA-RNN, a compact hybrid attention-RNN model, is designed for consumer-grade modeling: it requires significantly fewer parameters and less training time to reach near-state-of-the-art results. We analyze Merity's model through an exploratory study of several units of the architecture, assessing both training time and overall quality. Ultimately, we combine these findings into a new architecture, which we call SHAQ: Single Headed Attention Quasi-recurrent Neural Network. With our new architecture, we achieve accuracy similar to the SHA-RNN's while obtaining a 4x speedup in training.
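Since the abstract centers on Merity's single-headed attention as the cost-saving ingredient, a minimal sketch may help. This PyTorch snippet shows one causally masked scaled-dot-product head in place of the usual multi-head block; the exact projections, memory caching, and surrounding feed-forward layers in the real SHA-RNN differ, so treat this purely as an illustration of the single-head idea.

```python
import math
import torch
import torch.nn as nn


class SingleHeadAttention(nn.Module):
    """One causally masked scaled-dot-product attention head, standing in
    for the usual multi-head attention block."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        # Causal mask: each position attends only to itself and the past.
        future = torch.triu(torch.ones_like(scores, dtype=torch.bool), 1)
        scores = scores.masked_fill(future, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v


attn = SingleHeadAttention(512)
out = attn(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```

A single head keeps one set of query/key/value projections regardless of how many heads a multi-head block would use, which is part of how SHA-RNN-style models cut parameters and memory enough to train on consumer hardware.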


AI models beat humans at reading comprehension, but they've still got a ways to go

@machinelearnbot

When computer models designed by tech giants Alibaba and Microsoft this month surpassed humans for the first time in a reading-comprehension test, both companies celebrated the success as a historic milestone. Luo Si, the chief scientist for natural-language processing at Alibaba's AI research unit, struck a poetic note, saying, "Objective questions such as 'what causes rain' can now be answered with high accuracy by machines." Teaching a computer to read has for decades been one of artificial intelligence's holiest grails, and the feat seemed to signal a coming future in which AI could understand words and process meaning with the same fluidity humans take for granted every day. But computers aren't there yet -- and aren't even really that close, said AI experts who reviewed the test results. Instead, the accomplishment highlights not just how far the technology has progressed, but also how far it still has to go. "It's a large step" for the companies' marketing "but a small step for humankind," said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an AI research group funded by Microsoft co-founder Paul Allen.