A Hybrid Future for AI

Communications of the ACM 

Nvidia's rise to a $2-trillion valuation at the beginning of 2024 underscored the extraordinary computing demands of the artificial intelligence systems that power ChatGPT and a host of other cloud services that create videos, music, and computer programs on demand. Scaling of computing power and memory has provided much of the impetus behind the surge in interest in generative AI based on large language models (LLMs). As models get bigger, they seem to exhibit emergent behavior, making them more useful. But such scaling comes at a high cost, as growth in parameter counts has easily outstripped Moore's Law. Much of the concern around resource usage has focused on the enormous arrays of graphics processing units (GPUs) and other accelerators used to train models for weeks at a time.