clock rate
A Massively Parallel Digital Learning Processor
Graf, Hans P., Cadambi, Srihari, Jakkula, Venkata, Sankaradass, Murugan, Cosatto, Eric, Chakradhar, Srimat, Dourdanovic, Igor
We present a new, massively parallel architecture for accelerating machine learning algorithms, based on arrays of variable-resolution arithmetic vector processing elements (VPEs). Groups of VPEs operate in SIMD (single instruction, multiple data) mode, and each group is connected to an independent memory bank. In this way memory bandwidth scales with the number of VPEs, and the main data flows are local, keeping power dissipation low. With 256 VPEs, implemented on two FPGA (field-programmable gate array) chips, we obtain a sustained speed of 19 GMACS (billion multiply-accumulate operations per second) for SVM training and 86 GMACS for SVM classification. This performance is more than an order of magnitude higher than that of any FPGA implementation reported so far.
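To put the quoted throughput in per-element terms, the short Python sketch below divides the sustained rates by the number of VPEs. The figures (256 VPEs, 19 and 86 GMACS) come from the abstract; the function name `per_vpe_mmacs` is hypothetical, and the result is only an average, since the abstract does not say how many multiply-accumulates a variable-resolution VPE completes per clock cycle.

```python
# Figures quoted in the abstract above.
NUM_VPES = 256
TRAIN_GMACS = 19.0      # sustained rate for SVM training
CLASSIFY_GMACS = 86.0   # sustained rate for SVM classification


def per_vpe_mmacs(total_gmacs, num_vpes=NUM_VPES):
    """Average sustained rate per VPE, in millions of MACs per second.

    Hypothetical helper for illustration; it assumes the load is spread
    evenly over all VPEs, which the abstract does not state.
    """
    return total_gmacs * 1000.0 / num_vpes


train_rate = per_vpe_mmacs(TRAIN_GMACS)        # 74.21875 MMACS per VPE
classify_rate = per_vpe_mmacs(CLASSIFY_GMACS)  # 335.9375 MMACS per VPE
```

Read this way, the aggregate speed comes from the number of elements working in parallel rather than from a high rate in any single element, which is consistent with the abstract's emphasis on local data flows and low power dissipation.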
General Matrix-Matrix Multiplication Using SIMD features of the PIII
Aberdeen, Douglas, Baxter, Jonathan
Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the floating-point SIMD (Single Instruction Multiple Data) architecture of the Intel Pentium III. A description of the issues and our solution is presented, paying attention to all levels of the memory hierarchy. Our results demonstrate performance that is, on average, 2.09 times faster than the leading public domain matrix-matrix multiply routines.
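One common way to pay attention to the memory hierarchy in matrix multiplication is blocking (tiling), where the product is computed on small sub-blocks that fit in fast memory. The Python sketch below illustrates that general idea only; it is not the authors' implementation, which targets the Pentium III SIMD instructions, and the function name `matmul_blocked` and the default block size are assumptions made for illustration.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows.

    The loops walk over block x block tiles so that each tile of a, b and c
    is reused while it is still "hot". The block size is an illustrative
    assumption; real code would tune it to the cache sizes.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):              # tile rows of a and c
        for p0 in range(0, k, block):          # tile the shared dimension
            for j0 in range(0, m, block):      # tile columns of b and c
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product = matmul_blocked(a, b)  # [[58.0, 64.0], [139.0, 154.0]]
```

In pure Python the tiling brings no speed-up; the point is the loop structure, which in a compiled SIMD kernel lets the innermost loop operate on several adjacent elements of a row at once.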
A Full Hardware Guide to Deep Learning -- Tim Dettmers
Deep Learning is very computationally intensive, so you will need a fast CPU with many cores, right? Or is it maybe wasteful to buy a fast CPU? One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high-performance system. Over the years, I have built a total of 7 different deep learning workstations and, despite careful research and reasoning, I made my fair share of mistakes in selecting hardware parts. In this guide, I want to share the experience I gained over the years so that you do not make the same mistakes I did. The blog post is ordered by mistake severity. This means the mistakes where people usually waste the most money come first.
Artificial Intelligence in education--imagining and building tomorrow's cyber learning platform today
"Advanced cyberlearning environments that involve Virtual Reality and Artificial Intelligence innovations are becoming powerful tools that can facilitate the explorations and conversations needed to solve society's 'wicked challenges,'" said Winslow Burleson, PhD, MSE, an engineer by training and currently associate professor at New York University Rory Meyers College of Nursing. The researchers posit that the use of technology, specifically a bundled and ever-evolving fluid set of integrated cyber tools, will connect disparate groups and individuals, converging them in both a real and an imagined cyber-social-physical environment called the Holodeck. Burleson's NYU-X Lab is currently advancing the Holodeck in prototype form, in close collaboration with colleagues at NYU Courant, Tandon, Steinhardt, and Tisch. "The 'Holodeck' will support a broad range of transdisciplinary collaborations, integrated education, research, and innovation by providing a networked software/hardware infrastructure that can synthesize visual, audio, physical, social, and societal components," said Burleson. NYU-X Lab's Holodeck prototype harnesses the collective power of shared computation, integrated distributed data, immersive visualization, and social interaction to make possible a large-scale synthesis of learning, research, and innovation that will dramatically accelerate the Rittel and Webber iterative mode of problem solving. The goal is to create a networked infrastructure and communication environment where "wicked challenges" can be iteratively explored and re-solved, utilizing visual, acoustic, and physical sensory feedback together with human dynamics and social collaboration.
Artificial Intelligence in education--imagining and building tomorrow's cyber learning platform today
In the late 1960s, urban planners Horst Rittel and Melvin Webber began formulating the concept of "wicked problems" or "wicked challenges": problems so vexing in the realm of social and organizational planning that they could not be successfully ameliorated with traditional linear, analytical, systems-engineering types of approaches. These "wicked challenges" are poorly defined, abstruse, and connected to strong moral, political and professional issues. Some examples might include: "How should we deal with crime and violence in our schools?", "How should we wage the 'War on Terror'?", or "What is good national immigration policy?" "Wicked problems," by their very nature, are strongly stakeholder dependent; there is often little consensus even about what the problem is, let alone how to deal with it.