Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Neural Information Processing Systems

Faced with saturation of Moore's law and increasing size and dimension of data, system designers have increasingly resorted to parallel and distributed computing to reduce computation time of machine-learning algorithms. However, distributed computing is often bottlenecked by a small fraction of slow processors called stragglers that reduce the speed of computation because the fusion node has to wait for all processors to complete their processing. To combat the effect of stragglers, recent literature proposes introducing redundancy in computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique - that we call Short-Dot - to introduce redundant computations in a coding theory inspired fashion, for computing linear transforms of long vectors. Instead of computing long dot products as required in the original linear transform, we construct a larger number of redundant and short dot products that can be computed more efficiently at individual processors. Further, only a subset of these short dot products are required at the fusion node to finish the computation successfully. We demonstrate through probabilistic analysis as well as experiments on computing clusters that Short-Dot offers significant speed-up compared to existing techniques. We also derive trade-offs between the length of the dot products and the resilience to stragglers (number of processors required to finish), for any such strategy and compare it to that achieved by our strategy.
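The erasure-coded redundancy the abstract refers to can be sketched in a few lines. This is a minimal illustration of the general coded-computation idea (an MDS-coded matrix-vector product that tolerates stragglers), not the Short-Dot construction itself; the sizes, the Vandermonde generator, and the choice of surviving workers are all illustrative assumptions.

```python
import numpy as np

# Toy setup: compute y = A @ x across n workers, tolerating up to
# n - k stragglers by encoding the k rows of A with an MDS code.
rng = np.random.default_rng(0)
k, n = 4, 6                          # k data rows, n workers (2 stragglers tolerated)
A = rng.standard_normal((k, 8))      # the linear transform
x = rng.standard_normal(8)           # the input vector

# Encode: worker i receives one coded row g_i^T A, where G is an
# n x k Vandermonde matrix (any k of its rows form an invertible
# submatrix for distinct positive nodes).
nodes = np.arange(1, n + 1, dtype=float)
G = np.vander(nodes, k, increasing=True)   # shape (n, k)
coded_rows = G @ A                          # one coded row per worker

# Each worker computes a single (full-length) dot product.
partials = coded_rows @ x                   # shape (n,)

# Fusion node: suppose workers 1 and 4 straggle; any k surviving
# results suffice to decode y = A @ x.
survivors = [0, 2, 3, 5]
y_hat = np.linalg.solve(G[survivors], partials[survivors])
assert np.allclose(y_hat, A @ x)
```

Short-Dot's contribution, per the abstract, is to additionally make each worker's dot product *shorter* than the original row length, trading some straggler resilience for faster per-worker computation.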


Could AI Data Centers Be Moved to Outer Space?

WIRED

Massive data centers for generative AI are bad for the Earth. Data centers are being built at a frantic pace all over the world, driven by the AI boom. These facilities consume staggering amounts of electricity. By 2028, AI servers alone may use as much energy as 22 percent of US households.



Mesh-TensorFlow: Deep Learning for Supercomputers

Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

Neural Information Processing Systems

However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and implement, particularly on large clusters.
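The distinction between the two strategies can be shown on a single layer y = x @ W, using plain numpy arrays as stand-in "devices". This is a conceptual sketch only, not the Mesh-TensorFlow API; the two-device split and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4))    # batch of 8 examples, feature dim 4
W = rng.standard_normal((4, 6))    # layer weights

# Batch-splitting (data parallelism): each "device" holds a full copy
# of W and processes half the batch; outputs concatenate along batch.
y_data = np.concatenate([x[:4] @ W, x[4:] @ W], axis=0)

# Model-parallelism: each "device" holds half of W's columns and sees
# the full batch; outputs concatenate along the feature axis, so no
# device ever stores all of W.
y_model = np.concatenate([x @ W[:, :3], x @ W[:, 3:]], axis=1)

assert np.allclose(y_data, x @ W)
assert np.allclose(y_model, x @ W)
```

Model-parallelism avoids replicating W on every device, which is what lets it scale to models too large for one device's memory, at the cost of the coordination complexity the abstract mentions.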





Alliwava GH8 review: Ryzen 9 muscle in a shockingly small PC

PCWorld

The Alliwava GH8 is a good example of how much performance is possible today in the smallest of spaces. With the Ryzen 9 8945HS, it not only offers powerful CPU performance, but also added value for AI applications thanks to the improved NPU. In doing so, it leaves many competitors behind in terms of connectivity and cooling management.


I tested Panther Lake. You're going to want this

PCWorld

PCWorld tested Intel's new Panther Lake Core Ultra X9 388H processor, which delivers gaming laptop performance through integrated graphics comparable to Nvidia GeForce RTX 4050 chips. The chip achieves impressive battery life of up to 27 hours while maintaining strong performance, with AI frame generation boosting gaming from 52 to 92 fps in titles like Cyberpunk 2077. Panther Lake faces competition from AMD's Ryzen AI Max and Qualcomm's Snapdragon X2, but Intel's early release provides a significant market advantage.


US approves sale of Nvidia's advanced AI chips to China

BBC News

The US government has given chip giant Nvidia the green light to sell its advanced artificial intelligence (AI) processors in China, the Department of Commerce said on Tuesday. The H200, Nvidia's second-most-advanced semiconductor, had been restricted by Washington over concerns that it would give China's technology industry and military an edge over the US. The Commerce Department said the chips can be shipped to China provided that there is sufficient supply of the processors in the US. President Donald Trump said last month that he would allow the chip sales to approved customers in China and collect a 25% fee. Nvidia's spokesperson told the BBC that the company welcomed the move, saying it will benefit manufacturing and jobs in the US.