AI Compute Symposium Charts Path from Emerging to Pervasive AI

#artificialintelligence

Together with the IEEE Circuits and Systems Society and the IEEE Electron Devices Society, IBM Research organized the 2nd AI Compute Symposium at the IBM T.J. Watson Research Center THINKLab in Yorktown Heights, N.Y., on Oct. 17. More than 200 distinguished academics, renowned thinkers, students, and innovators from across industry and academia assembled for the one-day symposium, which showcased leadership and advances in research on AI compute, from pervasive to general AI. The free event featured three keynotes, three invited talks, a student poster session, and a panel discussion. The keynote speakers were Dr. Luis Lastras, a researcher with IBM; Professor Wen-mei Hwu of the University of Illinois at Urbana-Champaign (UIUC); and Harvard University/Samsung Fellow Donhee Ham. Lastras gave an overview of IBM research projects on natural language processing and its evolution.


Deep Learning's Climate Change Problem

#artificialintelligence

The human brain is an incredibly efficient source of intelligence. Earlier this month, OpenAI announced it had built the biggest AI model in history. This astonishingly large model, known as GPT-3, is an impressive technical achievement. Yet it highlights a troubling and harmful trend in the field of artificial intelligence--one that has not gotten enough mainstream attention. Modern AI models consume a massive amount of energy, and these energy requirements are growing at a breathtaking rate.
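The scale of that energy use can be made concrete with a rough back-of-envelope estimate. The sketch below is purely illustrative: the accelerator count, per-device power draw, training duration, and datacenter PUE are assumed placeholder values, not figures reported for GPT-3 or any specific model.

```python
# Back-of-envelope training-energy estimate. Every number here is an
# illustrative assumption, not a reported measurement.
NUM_ACCELERATORS = 1_000   # assumed number of GPUs/accelerators
DEVICE_POWER_KW = 0.3      # assumed average draw per device, in kW
TRAINING_DAYS = 14         # assumed wall-clock training time
PUE = 1.5                  # assumed datacenter power-usage effectiveness

hours = TRAINING_DAYS * 24
energy_kwh = NUM_ACCELERATORS * DEVICE_POWER_KW * hours * PUE
print(f"Estimated training energy: {energy_kwh:,.0f} kWh")

# For scale: a typical U.S. household uses roughly 10,000 kWh per year.
print(f"Roughly {energy_kwh / 10_000:.0f} household-years of electricity")
```

Even with these modest placeholder numbers, a single training run lands in the hundreds of thousands of kilowatt-hours, which is why the growth rate of model size matters as much as any single model's footprint.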


Artificial Intelligence Landscape -- 100 great articles and research papers

#artificialintelligence

Back in 2015 I wrote an article on 100 Big Data papers to help demystify that landscape. Along the same lines, I thought it would be good to do one for AI. The initial part covers the basics and provides some great links to strengthen your foundation. The latter part links to some great research papers and is for advanced practitioners who want to understand the theory and details. AI is a revolution that is transforming how humans live and work.


On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent

arXiv.org Machine Learning

Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this technique. We investigate these issues, with an emphasis on time to convergence and total computational cost, through an extensive empirical analysis of network training across several architectures and problem domains, including image classification, image segmentation, and language modeling. Although it is common practice to increase the batch size in order to fully exploit available computational resources, we find a substantially more nuanced picture. Our main finding is that across a wide range of network architectures and problem domains, increasing the batch size beyond a certain point yields no decrease in wall-clock time to convergence for either train or test loss. This batch size is usually substantially below the capacity of current systems. We show that popular training strategies for large batch size optimization begin to fail before we can populate all available compute resources, and we show that the point at which these methods break down depends more on attributes like model architecture and data complexity than it does directly on the size of the dataset.
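The core measurement can be illustrated with a toy experiment. The sketch below is not the paper's experimental setup: it times mini-batch SGD on a small synthetic logistic-regression problem for several batch sizes and reports wall-clock time to reach a fixed training-loss target; the batch sizes, learning rate, and loss target are illustrative assumptions.

```python
# Minimal sketch: wall-clock time to a training-loss target vs. batch size
# for mini-batch SGD on synthetic logistic regression (illustrative only).
import time
import numpy as np

rng = np.random.default_rng(0)
n, d = 20_000, 50
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + 0.1 * rng.normal(size=n) > 0).astype(float)

def full_loss(w):
    # Logistic (cross-entropy) loss over the whole training set.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def time_to_target(batch_size, lr=0.5, target=0.35, max_epochs=50):
    w = np.zeros(d)
    start = time.perf_counter()
    for _ in range(max_epochs):
        idx = rng.permutation(n)
        for s in range(0, n, batch_size):
            b = idx[s:s + batch_size]
            p = 1.0 / (1.0 + np.exp(-(X[b] @ w)))
            # Mini-batch gradient of the logistic loss.
            w -= lr * X[b].T @ (p - y[b]) / len(b)
        if full_loss(w) < target:
            break
    return time.perf_counter() - start, full_loss(w)

for bs in (32, 256, 2048, 16384):
    t, final = time_to_target(bs)
    print(f"batch={bs:6d}  wall-clock={t:6.2f}s  train loss={final:.3f}")
```

In this kind of toy run, time to the target typically stops improving (or worsens) once the batch size grows past a problem-dependent point, which mirrors the qualitative effect the paper studies at much larger scale.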