Data Management at NERSC in the Era of Petascale Deep Learning
Computer scientists at Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center (NERSC) have demonstrated 15 petaflops of deep-learning training performance on the Cray Cori supercomputer. NERSC staff are now working to address the data management issues that arise when production deep-learning codes run at that scale. Existing deep-learning tools were not designed to efficiently ingest or manage the terabyte- to petabyte-scale training sets that scientists can now use on this leadership-class supercomputer. "Enabling the NERSC user community to perform deep learning at scale on Cori," observes Quincey Koziol, a Berkeley Lab staff member, "means scientists can use deep learning as part of their leading-edge scientific efforts." NERSC staff are therefore working to adapt existing deep-learning frameworks to run efficiently on thousands of nodes. They are also giving researchers the ability to create and manage portable training sets containing tens to hundreds of terabytes of data. These datasets must be formatted so that Cori can ingest them efficiently at runtime.
May-11-2018, 14:22:06 GMT