exaflop


RWKV: Reinventing RNNs for the Transformer Era

Peng, Bo, Alcaide, Eric, Anthony, Quentin, Albalak, Alon, Arcadinho, Samuel, Biderman, Stella, Cao, Huanqi, Cheng, Xin, Chung, Michael, Grella, Matteo, GV, Kranthi Kiran, He, Xuzheng, Hou, Haowen, Lin, Jiaju, Kazienko, Przemyslaw, Kocon, Jan, Kong, Jiaming, Koptyra, Bartlomiej, Lau, Hayden, Mantri, Krishna Sri Ipsit, Mom, Ferdinand, Saito, Atsushi, Song, Guangyu, Tang, Xiangru, Wang, Bolun, Wind, Johan S., Wozniak, Stanislaw, Zhang, Ruichong, Zhang, Zhenyuan, Zhao, Qihang, Zhou, Peng, Zhou, Qinghua, Zhu, Jian, Zhu, Rui-Jie

arXiv.org Artificial Intelligence

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintaining constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
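The train-as-a-Transformer, infer-as-an-RNN duality can be illustrated with a simplified decayed linear-attention recurrence (a minimal sketch only: the actual RWKV time-mixing equations, receptance gating, and per-channel decay from the paper are omitted, and the scalar decay `w` here is an illustrative stand-in):

```python
import math

def linear_attention_recurrent(k, v, w):
    """Decayed linear attention run step by step, as an RNN would at inference.
    k, v: lists of floats (scalar channels for simplicity); w: decay in (0, 1).
    State is two scalars, so memory stays constant in sequence length."""
    num = den = 0.0
    out = []
    for kt, vt in zip(k, v):
        weight = math.exp(kt)
        num = w * num + weight * vt  # running decayed sum of weighted values
        den = w * den + weight       # running decayed normalizer
        out.append(num / den)
    return out

def linear_attention_parallel(k, v, w):
    """The same computation expressed over the whole sequence at once, as during
    training: position i contributes with weight w**(t - i) * exp(k[i]) at step t."""
    out = []
    for t in range(len(k)):
        weights = [w ** (t - i) * math.exp(k[i]) for i in range(t + 1)]
        num = sum(wi * v[i] for i, wi in enumerate(weights))
        out.append(num / sum(weights))
    return out
```

Both functions compute identical outputs, which is the point: the parallel form admits efficient batched training, while the recurrent form needs only constant state per step at inference.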


Do VSR Models Generalize Beyond LRS3?

Djilali, Yasser Abdelaziz Dahou, Narayan, Sanath, Bihan, Eustache Le, Boussaid, Haithem, Almazrouei, Ebtessam, Debbah, Merouane

arXiv.org Artificial Intelligence

The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years. As a result, there is an increased risk of overfitting to its excessively used test set, which is only one hour in duration. To alleviate this issue, we build a new VSR test set named WildVSR, by closely following the LRS3 dataset creation processes. We then evaluate and analyse the extent to which the current VSR models generalize to the new test data. We evaluate a broad range of publicly available VSR models and find significant drops in performance on our test set, compared to their corresponding LRS3 results. Our results suggest that the increase in word error rates is caused by the models' inability to generalize to slightly harder, in-the-wild lip sequences than those found in the LRS3 test set. Our new test benchmark is made public in order to enable future research towards more robust VSR models.


NVIDIA's next DGX supercomputer is all about generative AI

Engadget

NVIDIA CEO Jensen Huang made a string of announcements during his Computex keynote, including details about the company's next DGX supercomputer. Given where the industry is clearly heading, it shouldn't come as a surprise that the DGX GH200 is largely about helping companies develop generative AI models. The supercomputer uses a new NVLink Switch System to enable 256 GH200 Grace Hopper superchips to act as a single GPU (each of the chips has an Arm-based Grace CPU and an H100 Tensor Core GPU). This, according to NVIDIA, allows the DGX GH200 to deliver 1 exaflop of performance and to have 144 terabytes of shared memory. The company says that's nearly 500 times as much memory as you'd find in a single DGX A100 system.
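The memory claim is easy to sanity-check with back-of-the-envelope arithmetic (a sketch: the decimal units and the 320 GB base DGX A100 configuration as the comparison point are assumptions, not stated in the article):

```python
# Sanity-check the "nearly 500x" shared-memory claim for the DGX GH200.
shared_memory_gb = 144 * 1000     # 144 TB shared across the whole system
superchips = 256
dgx_a100_memory_gb = 8 * 40       # base DGX A100: 8 A100 GPUs with 40 GB each

ratio = shared_memory_gb / dgx_a100_memory_gb      # 450.0, i.e. "nearly 500x"
per_superchip_gb = shared_memory_gb / superchips   # 562.5 GB per GH200 superchip
print(ratio, per_superchip_gb)
```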


What Is an Exaflop?

#artificialintelligence

Computers are crunching more numbers than ever to crack the most complex problems of our time -- how to cure diseases like COVID and cancer, mitigate climate change and more. These and other grand challenges ushered computing into today's exascale era, when top performance is often measured in exaflops. An exaflop is a measure of performance for a supercomputer that can calculate at least 10^18, or one quintillion, floating point operations per second. In "exaflop," the prefix exa- means a quintillion: a billion billion, or a one followed by 18 zeros. Similarly, an exabyte is a memory subsystem packing a quintillion bytes of data.
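To get a concrete sense of that scale, here is a toy illustration of how long one second's worth of exaflop work would take at slower rates (the machine classes and their round-number speeds are illustrative assumptions):

```python
EXA = 10 ** 18  # one quintillion floating point operations

# Time to perform one exaflop-second of work at various sustained rates.
rates = {
    "1 exaflop/s supercomputer": 1e18,
    "1 petaflop/s cluster": 1e15,
    "1 teraflop/s laptop GPU": 1e12,
    "1 gigaflop/s CPU core": 1e9,
}
for name, flops_per_s in rates.items():
    seconds = EXA / flops_per_s
    print(f"{name}: {seconds:,.0f} s ({seconds / (86400 * 365):,.2f} years)")
```

A single gigaflop-class core would need about a billion seconds, roughly 31.7 years, for work an exascale machine finishes in one second.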


Global Big Data Conference

#artificialintelligence

Nvidia Corp. and Google LLC have won top spots in the MLPerf Training machine learning competition, the organization that hosts the competition detailed today. MLPerf Training is run by the MLCommons Association, an industry group that develops open-source AI tools. Participants in the competition test how quickly they can train a series of neural networks to perform various computing tasks. The goal is to complete the training process as fast as possible and in accordance with certain technical criteria set forth by the MLCommons Association. This year's competition consisted of eight tests.


Google Cloud's New TPU v4 ML Hub Packs 9 Exaflops of AI

#artificialintelligence

Almost exactly a year ago, Google launched its Tensor Processing Unit (TPU) v4 chips at Google I/O 2021, promising twice the performance compared to the TPU v3. At the time, Google CEO Sundar Pichai said that Google's datacenters would "soon have dozens of TPU v4 Pods, many of which will be operating at or near 90 percent carbon-free energy." Now, at Google I/O 2022, Pichai revealed the blue-ribbon fruit of those labors: a TPU v4-powered datacenter in Mayes County, Oklahoma, that Google says is the world's largest publicly available machine learning hub. "This machine learning hub has eight Cloud TPU v4 Pods, custom-built on the same networking infrastructure that powers Google's largest neural models," Pichai said. Google's TPU v4 Pods consist of 4,096 TPU v4 chips, each of which delivers 275 teraflops of ML-targeted bfloat16 ("brain floating point") performance.
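Those per-chip figures multiply out to the headline number in the article's title (a quick check; treating the quoted 275 teraflops as peak aggregate bfloat16 throughput):

```python
chips_per_pod = 4096       # TPU v4 chips per pod
tflops_per_chip = 275      # peak bfloat16 teraflops per chip
pods = 8                   # pods in the Mayes County hub

pod_exaflops = chips_per_pod * tflops_per_chip / 1e6   # teraflops -> exaflops
hub_exaflops = pods * pod_exaflops
print(pod_exaflops, hub_exaflops)  # ~1.13 exaflops per pod, ~9 for the hub
```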


Meta unveils new AI supercomputer destined to be world's fastest

#artificialintelligence

Meta has unveiled the AI Research SuperCluster (RSC), a new supercomputer that's among the fastest in the world. And it'll only get faster -- by the end of the year it should rank number one, with computing power on the exascale. The company formerly known as Facebook has had its fingers in the AI pie for a few years now, and it's not hard to see why. Through Facebook, Instagram and WhatsApp, among others, the conglomerate generates far more data than any mere mortal could possibly process -- and there are obscene amounts of money to be made in sifting through it all. Meta's RSC will be up to the task, using this data and immense computing power to train AI algorithms to better recognize objects in images and spoken words in audio, quickly translate between languages, and identify harmful content and misinformation that shouldn't be on social media.


Meta says its new AI supercomputer will be the world's fastest by mid-2022

Engadget

Meta has completed the first phase of a new AI supercomputer. Once the AI Research SuperCluster (RSC) is fully built out later this year, the company believes it will be the fastest AI supercomputer on the planet, capable of "performing at nearly 5 exaflops of mixed precision compute." The company says RSC will help researchers develop better AI models that can learn from trillions of examples. Among other things, the models will be able to build better augmented reality tools and "seamlessly analyze text, images and video together," according to Meta. Much of this work is in service of its vision for the metaverse, in which it says AI-powered apps and products will have a key role. "We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together," technical program manager Kevin Lee and software engineer Shubho Sengupta wrote in a blog post.


Advancing AI Capabilities with Next-Generation HPC Solutions

#artificialintelligence

HPE and NVIDIA are delivering IT solutions with superhuman intelligence to harness the full power of AI and pioneer the next generation of HPC systems. In this evolving digital economy, data is the cornerstone of success. Big Data is redefining the way we think, act, and understand the world, and accelerating insight is the difference between making the next major discovery and missing it. The more information we can effectively capture, analyze, and act on, the more opportunities there are to drive technological advancements, ensure economic control, strengthen national security, and fuel scientific research. Organizations across all sectors are putting this data to work.


Nvidia unveils massive AI processing chip Tesla V100

#artificialintelligence

Nvidia CEO Jen-Hsun Huang unveiled an ambitious new processor for artificial intelligence applications, the Tesla V100. The new chip has 21 billion transistors, and it is an order of magnitude more powerful than the 15-billion-transistor Pascal-based processor that Nvidia announced a year ago. It is a huge chip -- 815 square millimeters, or about as big as an Apple Watch face. It has 5,120 CUDA processing cores, and it delivers 7.5 teraflops of FP64 performance. The performance is about three times as fast as last year's product.
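The FP64 figure follows from the chip's layout (a sketch; the one-FP64-unit-per-two-CUDA-cores ratio and the ~1,455 MHz boost clock are V100 specifications not quoted in the article, so treat them as assumptions here):

```python
cuda_cores = 5120
fp64_units = cuda_cores // 2   # V100 pairs two FP32 CUDA cores per FP64 unit
boost_clock_ghz = 1.455        # approximate boost clock
flops_per_cycle = 2            # a fused multiply-add counts as two operations

fp64_teraflops = fp64_units * boost_clock_ghz * flops_per_cycle / 1e3
print(round(fp64_teraflops, 2))  # 7.45, matching the quoted ~7.5 teraflops
```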