Why the AI Industry Needs to Rethink Storage - Pure Storage Blog
When deploying a deep learning training cluster, you need a system-level perspective to build a well-balanced solution. Take the example (shown above) of DGX-1 systems running the Microsoft Cognitive Toolkit (formerly known as CNTK) framework to train AlexNet. NVIDIA published results showing that a single DGX-1 can train at a throughput of 13K images per second. If images have an average size of 115KB, a cluster of 10 DGX-1 systems needs an aggregate ingest throughput of roughly 15 GB per second to keep the training job busy. At that scale, small-file read performance and IOPS become critical, and storage can become the limiter on time to solution.
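The ingest figure follows from simple arithmetic, sketched below in Python. The throughput, image size and node count come from the paragraph. The decimal unit convention (1 GB = 1,000,000 KB) and the simplification that each image costs exactly one file read are assumptions. The function names are hypothetical and used only for illustration.

```python
# Back-of-envelope storage ingest estimate for a DGX-1 training cluster.
# Inputs are taken from the text: ~13K images/s per DGX-1 (AlexNet on CNTK)
# and an average image size of 115 KB.
# ASSUMPTIONS: decimal units (1 GB = 1,000,000 KB) and one file read per image.

IMAGES_PER_SEC_PER_NODE = 13_000
AVG_IMAGE_KB = 115
NUM_NODES = 10


def ingest_gb_per_sec(images_per_sec: float, avg_kb: float, nodes: int) -> float:
    """Aggregate read bandwidth (GB/s) needed to keep every node fed."""
    total_kb_per_sec = images_per_sec * avg_kb * nodes
    return total_kb_per_sec / 1_000_000  # KB/s -> GB/s, decimal units


def reads_per_sec(images_per_sec: float, nodes: int) -> float:
    """Small-file read operations per second (hypothetical: one read per image)."""
    return images_per_sec * nodes


if __name__ == "__main__":
    bw = ingest_gb_per_sec(IMAGES_PER_SEC_PER_NODE, AVG_IMAGE_KB, NUM_NODES)
    ops = reads_per_sec(IMAGES_PER_SEC_PER_NODE, NUM_NODES)
    print(f"Aggregate ingest: {bw:.2f} GB/s")        # 14.95 GB/s
    print(f"Small-file reads: {ops:,.0f} per second")  # 130,000 per second
```

Each DGX-1 alone needs about 1.5 GB/s. Under the one-read-per-image assumption, the cluster-wide figure of roughly 15 GB/s works out to about 130,000 reads of ~115KB files every second. That is why small-file read performance and IOPS, and not just raw bandwidth, matter here.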
Jul-24-2017, 07:16:28 GMT