amdahl
Matmul or No Matmul in the Era of 1-bit LLMs
Malekar, Jinendra, Elbtity, Mohammed E., Zand, Ramtin
The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs improve only a fraction of each model: they apply extreme quantization to the projection layers while leaving the attention heads unchanged. Therefore, to avoid setting fundamentally misguided goals for future research, it is crucial to understand the actual improvements in computation and memory usage that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl's Law tailored to the 1-bit LLM context, which illustrates how partial improvements in 1-bit LLMs affect overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
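The partial-improvement argument can be made concrete with the classic form of Amdahl's Law, S = 1 / ((1 - f) + f / s), where f is the fraction of baseline cost (time, energy or memory traffic) spent in the part being improved and s is that part's local speedup. The sketch below applies only that standard formula; it is not the authors' 1-bit-specific adaptation, and the function names and the 70% share used in the example are hypothetical, chosen purely for illustration.

```python
def amdahl_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Overall speedup when only a fraction of the work gets faster.

    accelerated_fraction: share f of baseline cost in the improved part
        (e.g. the quantized projection layers); assumed, not measured.
    local_speedup: factor s by which that part alone is accelerated.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    unchanged = 1.0 - accelerated_fraction  # e.g. attention, left unquantized
    return 1.0 / (unchanged + accelerated_fraction / local_speedup)


def speedup_ceiling(accelerated_fraction: float) -> float:
    """Limit of amdahl_speedup as local_speedup grows without bound: 1 / (1 - f)."""
    if accelerated_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper:
    # assume 70% of baseline cost sits in the quantized projection layers.
    f = 0.70
    for s in (2.0, 10.0, 100.0):
        print(f"local {s:>5.0f}x -> overall {amdahl_speedup(f, s):.2f}x")
    print(f"ceiling: {speedup_ceiling(f):.2f}x")
```

Because the unquantized part never shrinks, 1 - f stays above zero; under the assumed 70% share the overall gain is capped at 1 / 0.3, roughly 3.3x, however fast the projection layers become.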
On Extending Amdahl's law to Learn Computer Performance
Poolla, Chaitanya, Saxena, Rahul
The problem of learning parallel computer performance is investigated in the context of multicore processors. Given a fixed workload, the effect of varying the system configuration on performance is sought. Conventionally, the performance speedup due to a single resource enhancement is formulated using Amdahl's law. However, in the case of multiple configurable resources, the conventional formulation results in several disconnected speedup equations that cannot be combined to determine the overall speedup. To solve this problem, we propose to (1) extend Amdahl's law to accommodate multiple configurable resources in the overall speedup equation, and (2) transform the speedup equation into a multivariable regression problem suitable for machine learning. Using experimental data from fifty-eight tests spanning two benchmarks (SPECCPU 2017 and PCMark 10) and four hardware platforms (Intel Xeon 8180M, AMD EPYC 7702P, Intel CoffeeLake 8700K, and AMD Ryzen 3900X), analytical models are developed and cross-validated. Findings indicate that in most cases the models achieve an average cross-validated accuracy above 95%, thereby validating the proposed extension of Amdahl's law. The proposed methodology enables rapid generation of multivariable analytical models to support future industrial development, optimization, and simulation needs.
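One way a multi-resource speedup equation can become a regression problem is sketched below. It assumes, as a hypothetical model rather than the paper's exact equation, that execution time splits into an unscalable part a_0 plus one part per resource, each shrinking with that resource's scaling factor x_i relative to a baseline: T(x) = a_0 + a_1 / x_1 + ... + a_k / x_k. This is linear in the regressors 1 / x_i, so ordinary least squares recovers the coefficients; with a single resource and times normalized so that T(1) = 1, it reduces to the classic Amdahl form. All function names are illustrative.

```python
from typing import List, Sequence


def _solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: configurations do not identify all terms")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - acc) / aug[r][r]
    return solution


def features(config: Sequence[float]) -> List[float]:
    """Map resource scaling factors x_i to regressors [1, 1/x_1, ..., 1/x_k]."""
    return [1.0] + [1.0 / x for x in config]


def fit_time_model(configs: Sequence[Sequence[float]], times: Sequence[float]) -> List[float]:
    """Least-squares fit of the assumed model T(x) = a0 + sum_i a_i / x_i (normal equations)."""
    rows = [features(c) for c in configs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return _solve(ata, aty)


def predict_time(coeffs: Sequence[float], config: Sequence[float]) -> float:
    return sum(a * f for a, f in zip(coeffs, features(config)))


def predict_speedup(coeffs: Sequence[float], config: Sequence[float]) -> float:
    """Speedup relative to the baseline configuration where every x_i = 1."""
    baseline = [1.0] * (len(coeffs) - 1)
    return predict_time(coeffs, baseline) / predict_time(coeffs, config)


if __name__ == "__main__":
    # Synthetic, noise-free timings from made-up coefficients
    # (serial part 0.1, core-bound part 0.6, memory-bound part 0.3).
    configs = [(1, 1), (2, 1), (4, 1), (1, 2), (2, 2), (8, 4)]
    times = [0.1 + 0.6 / c + 0.3 / m for c, m in configs]
    coeffs = fit_time_model(configs, times)
    print("fitted coefficients:", [round(a, 3) for a in coeffs])
    print("predicted speedup at 2x cores:", round(predict_speedup(coeffs, (2, 1)), 3))
```

The synthetic timings only check that the fit recovers known coefficients; they are not data from the SPECCPU 2017 or PCMark 10 experiments. Real measurements are noisy, which is where cross-validation of the fitted model comes in.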
The World's Largest Computer Chip
Deep learning, the artificial-intelligence technology that powers voice assistants, autonomous cars, and Go champions, relies on complicated "neural network" software arranged in layers. A deep-learning system can live on a single computer, but the biggest ones are spread over thousands of machines wired together into "clusters," which sometimes live at large data centers, like those operated by Google. In a big cluster, as many as forty-eight pizza-box-size servers slide into a rack as tall as a person; these racks stand in rows, filling buildings the size of warehouses. The neural networks in such systems can tackle daunting problems, but they also face clear challenges. A network spread across a cluster is like a brain that's been scattered around a room and wired together.
Consider Indirect Threats of AI, Too
Alan Bundy's Viewpoint "Smart Machines Are Not a Threat to Humanity" (Feb. Reducing the entire field of AI to four "successful AI systems" (DeepBlue, Tartan Racing, Watson, and AlphaGo) does not give the full picture of the impact of AI on humanity. Recent advances in pattern recognition for computer vision and speech recognition, due mainly to deep learning, have achieved benchmarks comparable to human performance [2]; consider that AI technologies power surveillance systems, as well as Apple's Siri and Amazon's Echo personal assistants. Looking at such AI algorithms, one can imagine artificial general intelligence becoming possible throughout our communication networks, computer interfaces, and tens of millions of Internet of Things devices in the near future. Toward this end, DeepMind Technologies Ltd. (acquired by Google in 2014) created a game-playing program combining deep learning and reinforcement learning that sees the board as well as moves the pieces on the board [1]. Recent advances in generative adversarial learning will reduce reliance on labeled data (and on the humans who do the labeling), moving toward machine-learning software capable of self-improvement.
Scaling up AI
There is, obviously, still data parallelism, in which one processes separate batches of data on model instances running on different GPUs to speed up training. But it is by no means possible to take, say, the VGG16 network, multiply the number of layers by 10, multiply the number of feature maps by 10, multiply the size of each layer by 10, throw it at 1000 GPUs, and expect the thing to train successfully. Aside from the problems of efficiently implementing the parallel execution, the thing would need exp(1000) more data and training iterations, which is not possible. This is part of the reason why the AlexNet and VGG16 architectures, which are now a few years old, are still the base architectures for many applications, while the biggest deep learning instances trained today are at most one order of magnitude larger.
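The data parallelism described above can be sketched in a few lines: a batch is split across simulated workers, each computes a gradient on its own shard, and the averaged gradient (what an all-reduce would produce across GPUs) updates every replica identically. This is a toy single-process illustration with hypothetical names and a one-feature linear model, not a multi-GPU implementation. With equal-sized shards the averaged gradient equals the full-batch gradient, so data parallelism spreads the work of each step without changing the model or what one step computes.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair


def mse_gradient(w: float, b: float, batch: Sequence[Example]) -> Tuple[float, float]:
    """Gradient of mean squared error for the model y_hat = w * x + b."""
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


def shard(batch: Sequence[Example], num_workers: int) -> List[List[Example]]:
    """Split a batch into equal contiguous shards, one per simulated GPU."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [list(batch[i * size:(i + 1) * size]) for i in range(num_workers)]


def data_parallel_step(w: float, b: float, batch: Sequence[Example],
                       num_workers: int, lr: float) -> Tuple[float, float]:
    """One synchronous data-parallel SGD step.

    Each worker's gradient is computed sequentially here; on real hardware
    these would run concurrently on separate devices. Averaging the shard
    gradients stands in for an all-reduce.
    """
    grads = [mse_gradient(w, b, s) for s in shard(batch, num_workers)]
    avg_w = sum(g[0] for g in grads) / num_workers
    avg_b = sum(g[1] for g in grads) / num_workers
    return w - lr * avg_w, b - lr * avg_b


if __name__ == "__main__":
    # Toy data from y = 3x + 1, split across 4 simulated workers.
    data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
    w, b = 0.0, 0.0
    for _ in range(5000):
        w, b = data_parallel_step(w, b, data, num_workers=4, lr=0.01)
    print(f"w={w:.3f}, b={b:.3f}")
```

Adding workers here only divides the per-step work; it does nothing about the parameter count, data requirements or number of iterations a much larger model would need, which is the limit the paragraph points to.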