Technical Perspective: Memory Efficiency via Offloading in Warehouse-Scale Datacenters

Communications of the ACM 

Large warehouse-scale computers (WSCs) underpin all the cloud computing services we use daily, whether Web search, video streaming, social networks, or emerging AI chatbots and agents. The memory subsystem in these computers poses one of the biggest challenges in their design and operation: across the industry, Big Tech companies such as Amazon, Google, Meta, and Microsoft spend billions of dollars buying memory and consume hundreds of megawatts powering it. Sadly, this problem is only getting worse. It is exacerbated by the slowing of technology scaling trends (such as Moore's law) and by exploding demand for data, and correspondingly for memory, from workloads such as artificial intelligence (AI).

One approach to addressing the cost of memory is to organize it into tiers. Most workloads have a working dataset that includes both hot (more frequently used) and cold (less frequently used) data.
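To make the hot/cold distinction concrete, the following minimal Python sketch (an illustration, not a method from the article) classifies memory pages by how often they were accessed during a sampling window. The `Page` record, the per-page access counts, and the `hot_threshold` cutoff are all hypothetical. Real tiering systems gather their own telemetry and apply their own placement policies.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    accesses: int  # hypothetical: accesses observed during one sampling window


def classify(pages, hot_threshold):
    """Split page IDs into hot and cold lists by access count.

    hot_threshold is a hypothetical tuning knob: pages with at least this
    many accesses stay in the fast tier; the rest are candidates for a
    cheaper, slower tier.
    """
    hot = [p.page_id for p in pages if p.accesses >= hot_threshold]
    cold = [p.page_id for p in pages if p.accesses < hot_threshold]
    return hot, cold


def cold_fraction(pages, hot_threshold):
    """Fraction of pages that could leave the fast tier under this policy."""
    _, cold = classify(pages, hot_threshold)
    return len(cold) / len(pages) if pages else 0.0


if __name__ == "__main__":
    pages = [Page(0, 120), Page(1, 0), Page(2, 3), Page(3, 45), Page(4, 1)]
    hot, cold = classify(pages, hot_threshold=10)
    print("hot:", hot)    # hot: [0, 3]
    print("cold:", cold)  # cold: [1, 2, 4]
    print(cold_fraction(pages, hot_threshold=10))  # 0.6
```

A fixed threshold is the simplest possible policy. Raising it moves more data to the cheaper tier and saves more memory, but it increases the risk that pages which are still being used pay the slower tier's access latency.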