Chard, Ryan
Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning
Hudson, Nathaniel, Hayot-Sasson, Valerie, Babuji, Yadu, Baughman, Matt, Pauloski, J. Gregory, Chard, Ryan, Foster, Ian, Chard, Kyle
Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems like the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies and asynchronous aggregation, and that decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
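To make the hierarchical aggregation idea concrete, the following minimal Python sketch (not Flight's API; all names are illustrative) shows how a three-tier topology lets edge nodes average their own devices' updates locally, so that only one update per edge node, rather than one per device, crosses the wide-area link to the central server:

```python
import numpy as np

def fed_avg(updates, weights):
    """Weighted average of model parameter vectors (one aggregation step)."""
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, updates))

def hierarchical_round(topology, local_updates, sample_counts):
    """Aggregate device updates per edge node, then aggregate edge results globally."""
    edge_updates, edge_counts = [], []
    for edge, devices in topology.items():
        ups = [local_updates[d] for d in devices]
        cnts = [sample_counts[d] for d in devices]
        edge_updates.append(fed_avg(ups, cnts))   # local aggregation at the edge
        edge_counts.append(sum(cnts))
    return fed_avg(edge_updates, edge_counts)     # global aggregation at the server

# Example: two edge sites, each with two devices holding 1-D "model" vectors.
topology = {"edge-A": ["d0", "d1"], "edge-B": ["d2", "d3"]}
updates = {d: np.random.rand(4) for d in ["d0", "d1", "d2", "d3"]}
counts = {"d0": 10, "d1": 30, "d2": 20, "d3": 40}
print(hierarchical_round(topology, updates, counts))
```

With sample-count weighting, the two-level average equals the flat average over all devices; the gain is in communication, since only the per-edge aggregates travel to the server.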
Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision
Hudson, Nathaniel, Pauloski, J. Gregory, Baughman, Matt, Kamatar, Alok, Sakarvadia, Mansi, Ward, Logan, Chard, Ryan, Bauer, André, Levental, Maksim, Wang, Wenyi, Engler, Will, Skelly, Owen Price, Blaiszik, Ben, Stevens, Rick, Chard, Kyle, Foster, Ian
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$\Sigma$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to Supercomputers
Brace, Alexander, Vescovi, Rafael, Chard, Ryan, Saint, Nickolaus D., Ramanathan, Arvind, Zaluzec, Nestor J., Foster, Ian
The Dynamic PicoProbe at Argonne National Laboratory is undergoing upgrades that will enable it to produce up to 100s of GB of data per day. While this data is highly important for both fundamental science and industrial applications, there is currently limited on-site infrastructure to handle these high-volume data streams. We address this problem by providing a software architecture capable of supporting large-scale data transfers to the neighboring supercomputers at the Argonne Leadership Computing Facility. To prepare for future scientific workflows, we implement two instructive use cases for hyperspectral and spatiotemporal datasets, which include: (i) off-site data transfer, (ii) machine learning/artificial intelligence and traditional data analysis approaches, and (iii) automatic metadata extraction and cataloging of experimental results. This infrastructure supports expected workloads and also provides domain scientists with the ability to reinterrogate data from past experiments to yield additional scientific value and derive new insights.
APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service
Li, Zilinghan, He, Shilan, Chaturvedi, Pranshu, Hoang, Trung-Hieu, Ryu, Minseok, Huerta, E. A., Kindratenko, Volodymyr, Fuhrman, Jordan, Giger, Maryellen, Chard, Ryan, Kim, Kibaek, Madduri, Ravi
Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare or financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx employs Globus authentication to allow users to easily and securely invite trustworthy collaborators for PPFL, implements several synchronous and asynchronous FL algorithms, streamlines the FL experiment launch process, and enables tracking and visualizing the life cycle of FL experiments, allowing domain experts and ML practitioners to easily orchestrate and evaluate cross-silo FL under one platform. APPFLx is available online at https://appflx.link.
Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources
Ward, Logan, Pauloski, J. Gregory, Hayot-Sasson, Valerie, Chard, Ryan, Babuji, Yadu, Sivaraman, Ganesh, Choudhury, Sutanay, Chard, Kyle, Thakur, Rajeev, Foster, Ian
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach is our use of cloud-hosted management services to manage challenging aspects of cross-resource authentication and authorization, function-as-a-service (FaaS) function invocation, and data transfer. We show that these methods can achieve performance parity with systems that rely on direct connection between resources. We achieve parity by integrating the FaaS system and data transfer capabilities with a system that passes data by reference among managers and workers, and a user-configurable steering algorithm to hide data transfer latencies. We anticipate that this ease of use can enable routine use of heterogeneous resources in computational science.
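The pass-by-reference pattern described above can be illustrated with a short, self-contained Python sketch; the in-process RefStore and the local executor below are placeholders standing in for a shared data store and a remote FaaS endpoint, not the actual funcX or data-transfer interfaces:

```python
from concurrent.futures import ThreadPoolExecutor
import uuid

class RefStore:
    """Illustrative object store: tasks exchange small keys instead of large payloads."""
    def __init__(self):
        self._data = {}
    def put(self, obj) -> str:
        key = str(uuid.uuid4())
        self._data[key] = obj
        return key
    def get(self, key):
        return self._data[key]

store = RefStore()

def train_surrogate(dataset_ref: str) -> str:
    data = store.get(dataset_ref)             # resolve the reference on the worker
    model = {"mean": sum(data) / len(data)}   # stand-in for real AI training
    return store.put(model)                   # return a reference, not the model itself

# A local thread pool stands in for a FaaS endpoint on a remote accelerator.
with ThreadPoolExecutor() as faas:
    dataset_ref = store.put([1.0, 2.0, 3.0])
    model_ref = faas.submit(train_surrogate, dataset_ref).result()
    print(store.get(model_ref))
```

In a real deployment the store would be a remote service reachable from every site, so that bulk data moves directly between resources while the cloud-hosted manager handles only lightweight references.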
OpenHLS: High-Level Synthesis for Low-Latency Deep Neural Networks for Experimental Science
Levental, Maksim, Khan, Arham, Chard, Ryan, Yoshii, Kazutomo, Chard, Kyle, Foster, Ian
In many experiment-driven scientific domains, such as high-energy physics, material science, and cosmology, high data rate experiments impose hard constraints on data acquisition systems: collected data must either be indiscriminately stored for post-processing and analysis, thereby necessitating large storage capacity, or accurately filtered in real-time, thereby necessitating low-latency processing. Deep neural networks, effective in other filtering tasks, have not been widely employed in such data acquisition systems, due to design and deployment difficulties. We present OpenHLS, an open-source, lightweight compiler framework without any proprietary dependencies, based on high-level synthesis techniques, for translating high-level representations of deep neural networks to low-level representations suitable for deployment to near-sensor devices such as field-programmable gate arrays. We evaluate OpenHLS on various workloads and present a case-study implementation of a deep neural network for Bragg peak detection in the context of high-energy diffraction microscopy. We show OpenHLS is able to produce an implementation of the network with a throughput of 4.8 $\mu$s/sample, which is approximately a 4$\times$ improvement over the existing implementation.
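As an illustration of the kind of high-level representation such a framework consumes, the following PyTorch sketch defines a tiny convolutional network that regresses a peak position from a small detector patch; the layer sizes are placeholders, not the published Bragg-peak network:

```python
import torch
import torch.nn as nn

class TinyPeakNet(nn.Module):
    """Toy low-latency model: small conv stack plus a linear head for (x, y) peak regression."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3), nn.ReLU(),
            nn.Conv2d(8, 4, kernel_size=3), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(4 * 7 * 7, 2))

    def forward(self, x):
        return self.head(self.features(x))

model = TinyPeakNet().eval()
patch = torch.randn(1, 1, 11, 11)   # one 11x11 detector patch
print(model(patch).shape)           # torch.Size([1, 2]): predicted (x, y) peak coordinates
```

A high-level-synthesis flow of the sort described above would lower such a graph of convolutions and linear layers into fixed-latency hardware pipelines suitable for a near-sensor FPGA.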
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy
Ravi, Nikil, Chaturvedi, Pranshu, Huerta, E. A., Liu, Zhengchun, Chard, Ryan, Scourtas, Aristana, Schmidt, K. J., Chard, Kyle, Blaiszik, Ben, Foster, Ian
A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.
Globus Automation Services: Research process automation across the space-time continuum
Chard, Ryan, Pruyne, Jim, McKee, Kurt, Bryan, Josh, Raumann, Brigitte, Ananthakrishnan, Rachana, Chard, Kyle, Foster, Ian
Research process automation -- the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources -- has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, \emph{flows}, and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; 2) a simple specification and extensible asynchronous action provider API, for defining and executing a wide variety of actions and flows involving heterogeneous resources; 3) an event-driven execution model for automating execution of flows in response to arbitrary events; and 4) a rich security model enabling authorization delegation mechanisms for secure execution of long-running actions across distributed resources. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe their design and implementation, present microbenchmark studies, and review experiences applying the services in a range of applications.
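The notion of a flow as a reusable, linked set of actions can be sketched as a plain Python dictionary in the spirit of the declarative definitions the services accept; the action URLs, endpoint IDs, and parameter names below are placeholders rather than a verbatim production flow:

```python
# Illustrative two-step flow: move data from an instrument to a compute facility,
# then invoke an analysis action. All identifiers and URLs are placeholders.
flow_definition = {
    "StartAt": "TransferRawData",
    "States": {
        "TransferRawData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",  # placeholder
            "Parameters": {
                "source_endpoint_id": "<instrument-endpoint>",
                "destination_endpoint_id": "<hpc-endpoint>",
                "transfer_items": [
                    {"source_path": "/raw/run-001/", "destination_path": "/scratch/run-001/"}
                ],
            },
            "Next": "AnalyzeData",
        },
        "AnalyzeData": {
            "Type": "Action",
            "ActionUrl": "<analysis-action-provider-url>",  # placeholder
            "Parameters": {"input_path": "/scratch/run-001/"},
            "End": True,
        },
    },
}
```

Each state names an asynchronous action provider and its parameters, and the cloud-hosted service drives the flow from state to state, retrying and resuming across the sporadic failures that long-lived, wide-area flows inevitably encounter.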
Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing
Ward, Logan, Sivaraman, Ganesh, Pauloski, J. Gregory, Babuji, Yadu, Chard, Ryan, Dandu, Naveen, Redfern, Paul C., Assary, Rajeev S., Chard, Kyle, Curtiss, Larry A., Thakur, Rajeev, Foster, Ian
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
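The steering pattern described above, in which a proxy model chooses which simulations to run next and is updated as results arrive, can be sketched generically in a few lines of Python (this is not Colmena's API; the surrogate and simulation below are toy stand-ins):

```python
import random

def surrogate_score(candidate, observations):
    """Toy surrogate: prefer candidates near the best result observed so far."""
    if not observations:
        return random.random()
    best_x, _ = max(observations, key=lambda o: o[1])
    return -abs(candidate - best_x)

def run_simulation(x):
    return -(x - 0.3) ** 2   # stand-in for an expensive simulation task

def steer(candidates, budget=8, batch_size=2):
    observations = []
    pool = list(candidates)
    for _ in range(budget // batch_size):
        # Rank remaining candidates with the surrogate, dispatch the best batch,
        # and fold the completed results back in before the next round.
        pool.sort(key=lambda c: surrogate_score(c, observations), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        observations += [(x, run_simulation(x)) for x in batch]
    return max(observations, key=lambda o: o[1])

print(steer([i / 10 for i in range(10)]))
```

In a framework like the one described, the user supplies only the task implementations and this choose-what-to-run-next logic, while task dispatch, result collation, and model retraining are handled by the framework and executed at scale on HPC resources.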
Confluence of Artificial Intelligence and High Performance Computing for Accelerated, Scalable and Reproducible Gravitational Wave Detection
Huerta, E. A., Khan, Asad, Huang, Xiaobo, Tian, Minyang, Levental, Maksim, Chard, Ryan, Wei, Wei, Heflin, Maeve, Katz, Daniel S., Kindratenko, Volodymyr, Mu, Dawei, Blaiszik, Ben, Foster, Ian
Over the last five years, the advanced LIGO and advanced Virgo detectors have completed three observing runs, reporting over 50 gravitational wave sources [3, 4]. Significant improvements in the sensitivity of the advanced LIGO and advanced Virgo detectors during the last three observing runs have increased the observable volume they can probe, thereby increasing the number of gravitational wave observations [4]. As these observatories continue to enhance their detection capabilities, and other detectors join the international array of gravitational wave detectors, it is expected that gravitational wave sources will be observed at a rate of several per day [4, 5]. An ever-increasing catalog of gravitational wave sources will enable systematic studies that will refine and advance our understanding of stellar evolution, cosmology, alternative theories of gravity, among others [6-11]. The combination of gravitational and electromagnetic waves, and cosmic neutrinos, will provide revolutionary insights into the nature of supranuclear matter in neutron stars [12-14] and the formation and evolution of black holes and neutron stars, providing new and detailed information about their astrophysical environments [15-18]. While all of these science goals are feasible in principle given the proven detection capabilities of astronomical observatories, it is equally true that established algorithms for the observation of multi-messenger sources, such as template matching and nearest neighbors, are compute-intensive and poorly scalable [19-23]. Furthermore, available computational resources will remain oversubscribed, and planned enhancements will be rapidly outstripped with the advent of next-generation detectors within the next couple of years [24, 25]. Thus, an urgent rethinking is critical if we are to realize the Multi-Messenger Astrophysics program in the big-data era [26-28]. To contend with these challenges, a number of researchers have been exploring the application of deep learning and GPU-accelerated computing.