
Collaborating Authors

 galvatron


From Smashing Pumpkins to Ferris Bueller: new Australian indie video game Mixtape is a blast of nostalgia

The Guardian

Across Mixtape's four-hour runtime, you 'skateboard, mash tongues together during a kiss, TP a house, ride a dinosaur and learn to fly'

When Johnny Galvatron was 14, his cousin gave him a copy of the Smashing Pumpkins' seminal 1995 album, Mellon Collie and the Infinite Sadness. For Galvatron, a rambunctious teenager in Geelong who defined himself by his musical taste, it was love at first spin. "I don't think there's a track like Tonight, Tonight from any other band," he reminisces. A song from the album plays at a critical moment in Mixtape, the second game from Galvatron's Melbourne-based studio, Beethoven and Dinosaur. Mixtape is set over a single day; tomorrow, Stacy will be leaving her best friends, Slater and Cassandra, and flying to New York as part of a reckless plan to shove a mixtape into the hands of a superstar music supervisor who will, she believes, be so convinced of Stacy's genius that she'll offer her a job.


Galvatron: An Automatic Distributed System for Efficient Foundation Model Training

Liu, Xinyi, Wang, Yujie, Zhu, Shenhan, Fu, Fangcheng, Liu, Qingshuo, Lin, Guangming, Cui, Bin

arXiv.org Artificial Intelligence

Galvatron is a distributed system for efficiently training large-scale Foundation Models. It overcomes the complexities of selecting optimal parallelism strategies by automatically identifying the most efficient hybrid strategy, incorporating data, tensor, pipeline, sharded data, and sequence parallelism, along with recomputation. The system's architecture includes a profiler for hardware and model analysis, a search engine for strategy optimization using decision trees and dynamic programming, and a runtime for executing these strategies efficiently. Benchmarking on various clusters demonstrates Galvatron's superior throughput compared to existing frameworks. This open-source system offers user-friendly interfaces and comprehensive documentation, making complex distributed training accessible and efficient.
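The search-engine component lends itself to a small illustration. Below is a minimal Python sketch, not Galvatron's actual API, of a per-layer dynamic program of the kind the abstract describes: each layer is assigned one hybrid-parallelism strategy so that total estimated time is minimized while memory stays under the device budget. The `Strategy` and `search_plan` names and all cost numbers are illustrative assumptions, and the additive cost model is a deliberate simplification (the real system also profiles communication and activation costs).

```python
# Hypothetical sketch of a DP search over per-layer parallelism strategies.
from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str        # e.g. "dp8" or "tp4_dp2" (hypothetical labels)
    time_ms: float   # profiled per-layer time estimate
    mem_mb: int      # profiled per-layer memory estimate

def search_plan(num_layers, candidates, mem_budget_mb):
    # dp maps memory-used -> best total time for the layers placed so far;
    # picks[i] stores, per reachable memory level, the choice made at layer i.
    dp = {0: 0.0}
    picks = []
    for _ in range(num_layers):
        nxt, layer_picks = {}, {}
        for used, t in dp.items():
            for s in candidates:
                m = used + s.mem_mb
                if m > mem_budget_mb:
                    continue  # prune: this plan would not fit on the device
                if m not in nxt or t + s.time_ms < nxt[m]:
                    nxt[m] = t + s.time_ms
                    layer_picks[m] = (used, s)
        if not nxt:
            return None, float("inf")  # no feasible plan under this budget
        dp = nxt
        picks.append(layer_picks)
    best_mem = min(dp, key=dp.get)
    plan, m = [], best_mem
    for layer_picks in reversed(picks):  # walk back-pointers to recover the plan
        prev, s = layer_picks[m]
        plan.append(s.name)
        m = prev
    return list(reversed(plan)), dp[best_mem]

# Example with made-up profiler numbers:
cands = [Strategy("dp8", 12.0, 900),
         Strategy("tp4_dp2", 9.5, 1400),
         Strategy("pp2_dp4", 10.8, 700)]
plan, total_ms = search_plan(num_layers=4, candidates=cands, mem_budget_mb=4000)
```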


Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Wang, Yujie, Jiang, Youhe, Miao, Xupeng, Fu, Fangcheng, Nie, Xiaonan, Cui, Bin

arXiv.org Artificial Intelligence

Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.
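The bi-objective workload-balance idea can be sketched in a few lines. The snippet below is a hedged illustration, not Galvatron-BMW's implementation: it scores a candidate pipeline partition by its bottleneck stage time (which paces throughput) together with the memory spread across stages (workload balance), folding the two objectives into one weighted score purely for demonstration. All names, the `alpha` weight, and the exhaustive contiguous-split search are my assumptions.

```python
# Hedged sketch: balance-aware scoring of pipeline-stage partitions.
from itertools import combinations

def plan_score(stage_times_ms, stage_mem_mb, mem_budget_mb, alpha=0.05):
    if max(stage_mem_mb) > mem_budget_mb:
        return float("inf")               # infeasible: a device exceeds its memory
    bottleneck = max(stage_times_ms)      # objective 1: slowest stage paces the pipeline
    imbalance = max(stage_mem_mb) - min(stage_mem_mb)  # objective 2: memory balance
    return bottleneck + alpha * imbalance # toy scalarization of the two objectives

def best_partition(layer_times_ms, layer_mem_mb, num_stages, mem_budget_mb):
    # Try every contiguous split of the layer sequence into pipeline stages.
    n = len(layer_times_ms)
    best_cuts, best = None, float("inf")
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0, *cuts, n)
        times = [sum(layer_times_ms[a:b]) for a, b in zip(bounds, bounds[1:])]
        mems = [sum(layer_mem_mb[a:b]) for a, b in zip(bounds, bounds[1:])]
        score = plan_score(times, mems, mem_budget_mb)
        if score < best:
            best_cuts, best = cuts, score
    return best_cuts, best
```

In the paper the two objectives are handled more carefully than a fixed weighted sum, and the search runs jointly with the per-stage parallelism choices; this sketch only shows why balance enters the objective at all.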


Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Miao, Xupeng, Wang, Yujie, Jiang, Youhe, Shi, Chunan, Nie, Xiaonan, Zhang, Hailin, Cui, Bin

arXiv.org Artificial Intelligence

Transformer models have achieved state-of-the-art performance across various application domains and are gradually becoming the foundation of advanced large-scale deep learning (DL) models. However, training these models efficiently over multiple GPUs remains challenging due to the large number of parallelism choices. Existing DL systems either rely on manual efforts to make distributed training plans or apply parallelism combinations within a very limited search space. In this paper, we propose Galvatron, a new system framework that incorporates multiple popular parallelism dimensions and automatically finds the most efficient hybrid parallelism strategy. To better explore such a vast search space, we 1) use a decision tree to decompose and prune the space based on reasonable intuitions, and then 2) design a dynamic programming search algorithm to generate the optimal plan. Evaluations on four representative Transformer workloads show that Galvatron can automatically perform distributed training under different GPU memory budgets. In all evaluated scenarios, Galvatron consistently achieves superior system throughput compared to previous work relying on limited parallelism strategies.
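The decision-tree decomposition can also be made concrete with a toy enumerator. The sketch below branches on pipeline (pp), tensor (tp), and data (dp) parallel degrees for a given device count and prunes branches with simple intuitions of the kind the abstract alludes to; the specific rules (powers of two only; tensor parallelism kept within one node because it is communication-heavy) are illustrative assumptions rather than Galvatron's exact pruning rules.

```python
# Illustrative decision-tree enumeration of hybrid parallelism candidates.
def candidate_strategies(num_gpus: int, gpus_per_node: int = 8):
    plans = []
    pp = 1
    while pp <= num_gpus:                  # first branch: pipeline degree
        if num_gpus % pp == 0:
            remaining = num_gpus // pp
            tp = 1
            while tp <= remaining:         # second branch: tensor degree
                if remaining % tp == 0 and tp <= gpus_per_node:  # prune inter-node tp
                    plans.append({"pp": pp, "tp": tp, "dp": remaining // tp})
                tp *= 2                    # prune: powers of two only
        pp *= 2
    return plans

# candidate_strategies(16) yields 14 pruned (pp, tp, dp) combinations,
# a far smaller set than the unpruned cross-product the search would face.
```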


The Artful Escape review – Bowie meets Hitchhiker's in gratifying teenage space opera

The Guardian

Seventeen-year-old guitar prodigy Francis Vendetti lives with his mother in a small Colorado town that is still in thrall to its most famous export: Francis's late uncle, a platinum-selling folk singer. Francis feels inevitable pressure to continue the family trade, and, in preparation for his highly anticipated first public performance in town, writes a suite of Dylan-esque tracks about toil and loss. Except the act is an affectation: Francis is, at heart and by temperament, a prog-rock wailer who dreams of playing high-gain, euphoric guitar solos over the swell of a supportive orchestra. When he's visited by a sympathetic alien being who observes: "You wear folk like a cheap suit", Francis swaps his skinny Levi's for an LED-encrusted catsuit and sets off across the Milky Way to shred for an audience of intergalactic concertgoers. This is true space opera territory – Ziggy-era Bowie meets The Hitchhiker's Guide to the Galaxy – and far from typical video game subject matter.