Processor power prediction
Researchers from Duke University, Arm Research, and Texas A&M University developed an AI method for predicting the power consumption of a processor, returning results billions of times per second while consuming very little power itself. The approach, called APOLLO, uses an AI algorithm to identify and select just 100 of a processor's millions of signals that correlate most closely with its power consumption. It then builds a power consumption model from those 100 signals and monitors them to predict the entire chip's power consumption in real time.
The approach is detailed in a paper published at MICRO-54, the 54th Annual IEEE/ACM International Symposium on Microarchitecture, one of the top-tier conferences in computer architecture, where it was selected as the conference's best paper. "This is an intensively studied problem that has traditionally relied on extra circuitry to address," said Zhiyao Xie, first author of the paper and a PhD candidate in the laboratory of Yiran Chen, professor of electrical and computer engineering at Duke. "But our approach runs directly on the microprocessor in the background, which opens many new opportunities. I think that's why people are excited about it."
In modern computer processors, clock cycles occur on the order of 3 billion times per second (3 GHz). Keeping track of the power consumed by such rapid transitions is important to maintaining the entire chip's performance and efficiency.
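The text says only that APOLLO uses "an AI algorithm" to pick about 100 power-proxy signals and then builds a model on them. As a minimal sketch of that two-step idea, the code below substitutes deliberately simple stand-ins: it ranks signals by the magnitude of their Pearson correlation with measured power, then fits an ordinary least-squares linear model on the top k. This is not APOLLO's actual selection method; the function names (`select_proxies`, `fit_power_model`, `predict`), the toy sizes, and the synthetic data are all hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_proxies(signals, power, k):
    """Indices of the k signal traces most strongly correlated (in magnitude) with power."""
    scores = [abs(pearson(trace, power)) for trace in signals]
    ranked = sorted(range(len(signals)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_power_model(signals, power, proxies):
    """Ordinary least squares: power ~ bias + sum(w_j * signal_j) over the proxy signals."""
    rows = [[1.0] + [signals[i][t] for i in proxies] for t in range(len(power))]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(p)]
    coef = solve(ata, atb)  # normal equations: (A^T A) coef = A^T y
    return coef[0], coef[1:]


def predict(bias, weights, proxies, cycle_values):
    """Per-cycle power estimate that reads only the proxy signals' values for one cycle."""
    return bias + sum(w * cycle_values[i] for w, i in zip(weights, proxies))


# Synthetic, hypothetical data: 20 one-bit signal traces over 500 cycles,
# with "power" driven by signals 3 and 7 only.
rng = random.Random(0)
n_signals, n_cycles = 20, 500
signals = [[rng.randint(0, 1) for _ in range(n_cycles)] for _ in range(n_signals)]
power = [1.0 + 2.0 * signals[3][t] + 3.0 * signals[7][t] for t in range(n_cycles)]

proxies = select_proxies(signals, power, k=2)
bias, weights = fit_power_model(signals, power, proxies)
cycle0 = [signals[i][0] for i in range(n_signals)]
estimate = predict(bias, weights, proxies, cycle0)
```

A linear model suits the goal of running "directly on the microprocessor": at run time the estimate is just a weighted sum of k signal values, which is cheap to compute. Plain correlation ranking can, however, pick redundant signals that carry the same information, which is one reason a production selector would be more sophisticated than this one.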
We began our Turing Lecture on June 4, 2018 [11] with a review of computer architecture since the 1960s. In addition to that review, here we highlight current challenges and identify future opportunities, projecting another golden age for the field of computer architecture in the next decade that, much like the 1980s when we did the research that led to our award, will deliver gains in cost, energy, and security, as well as performance.
"Those who cannot remember the past are condemned to repeat it." (George Santayana)
Software talks to hardware through a vocabulary called an instruction set architecture (ISA). By the early 1960s, IBM had four incompatible lines of computers, each with its own ISA, software stack, I/O system, and market niche, targeting small business, large business, scientific, and real-time computing, respectively. IBM engineers, including ACM A.M. Turing Award laureate Fred Brooks, Jr., thought they could create a single ISA that would efficiently unify all four of these ISA bases. They needed a technical solution for how computers as inexpensive as those with 8-bit data paths and as fast as those with 64-bit data paths could share a single ISA. The data paths are the "brawn" of the processor in that they perform the arithmetic, but they are relatively easy to "widen" or "narrow." The greatest challenge for computer designers then and now is the "brains" of the processor: the control hardware.
Inspired by software programming, computing pioneer and Turing laureate Maurice Wilkes proposed how to simplify control. Control was specified as a two-dimensional array he called a "control store." Each column of the array corresponded to one control line, each row was a microinstruction, and writing microinstructions was called microprogramming [39]. A control store contains an ISA interpreter written using microinstructions, so execution of a conventional instruction takes several microinstructions. The control store was implemented through memory, which was much less costly than logic gates.
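To make Wilkes's idea concrete, here is a minimal sketch of a microprogrammed interpreter: the control store is a two-dimensional table whose columns are control lines and whose rows are microinstructions, and each ISA instruction is executed by stepping through a short microroutine. The datapath (registers `ACC`, `B`, `MDR`), the control-line names, and the three-instruction ISA (`LOAD`, `ADD`, `SUB`) are hypothetical illustrations, not the design of any historical machine; a Python list stands in for the memory that held the control store.

```python
# Hypothetical toy datapath: registers ACC, B, and MDR plus a small memory.
# One control-store column per control line.
CONTROL_LINES = ("read_mem", "mdr_to_b", "alu_add", "alu_sub", "b_to_acc")

# The control store: each row is one microinstruction (1 = assert that control line).
CONTROL_STORE = [
    (1, 0, 0, 0, 0),  # 0  LOAD: MDR <- MEM[operand]
    (0, 1, 0, 0, 0),  # 1  LOAD: B <- MDR
    (0, 0, 0, 0, 1),  # 2  LOAD: ACC <- B
    (1, 0, 0, 0, 0),  # 3  ADD:  MDR <- MEM[operand]
    (0, 1, 0, 0, 0),  # 4  ADD:  B <- MDR
    (0, 0, 1, 0, 0),  # 5  ADD:  ACC <- ACC + B
    (1, 0, 0, 0, 0),  # 6  SUB:  MDR <- MEM[operand]
    (0, 1, 0, 0, 0),  # 7  SUB:  B <- MDR
    (0, 0, 0, 1, 0),  # 8  SUB:  ACC <- ACC - B
]

# Start row and length of each ISA instruction's microroutine in the store.
MICROROUTINES = {"LOAD": (0, 3), "ADD": (3, 3), "SUB": (6, 3)}


def run(program, memory):
    """Interpret (opcode, operand) ISA instructions via the control store.

    Returns the final accumulator value and the number of microinstructions executed.
    """
    acc = b = mdr = 0
    steps = 0
    for opcode, operand in program:
        start, length = MICROROUTINES[opcode]
        for row in CONTROL_STORE[start:start + length]:
            lines = dict(zip(CONTROL_LINES, row))
            # Each row in this toy asserts exactly one line, so the order of checks does not matter.
            if lines["read_mem"]:
                mdr = memory[operand]
            if lines["mdr_to_b"]:
                b = mdr
            if lines["alu_add"]:
                acc += b
            if lines["alu_sub"]:
                acc -= b
            if lines["b_to_acc"]:
                acc = b
            steps += 1
    return acc, steps


acc, steps = run([("LOAD", 0), ("ADD", 1), ("SUB", 2)], memory=[10, 32, 5])
print(acc, steps)  # 37 9
```

The three-instruction program takes nine microinstructions, three per ISA instruction, which is the "several microinstructions per conventional instruction" cost described above. Changing or extending the ISA in this scheme means editing rows of the table rather than redesigning control logic.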
The table here lists four models of the new System/360 ISA IBM announced April 7, 1964. The data paths vary by a factor of 8, memory capacity by a factor of 16, clock rate by nearly 4, performance by 50, and cost by nearly 6.
Architecting Artificial Intelligence (AI) applications requires taking into account compute, storage, memory, the pipeline, communication interfaces, software, and control. Further, AI processing can be distributed across multiple cores within a processor, multiple processor boards on a PCIe backbone, computers on an Ethernet network, a high-performance computer, or systems across a data center. In addition, AI processors face massive memory-capacity requirements, access-time limits, a split of processing between analog and digital domains, and the partitioning of work between hardware and software. Architecture exploration of AI applications is therefore complex and involves multiple studies: one can start by targeting a single problem, such as memory access, or look at the full processor or system.
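A single-problem study of memory access can begin with a back-of-the-envelope model. The sketch below is a first-order, roofline-style estimate (a substitution chosen for illustration, not a method from the text): a matrix-vector product performs two operations per weight, so its time is bounded by whichever is slower, arithmetic at peak throughput or streaming the weights from memory. The `Accelerator` class, the 100 TOPS peak, the layer size, and the bandwidth values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_ops_per_s: float   # hypothetical peak arithmetic throughput (ops/s)
    mem_bytes_per_s: float  # hypothetical off-chip memory bandwidth (bytes/s)


def matvec_time(acc, rows, cols, bytes_per_weight=1):
    """First-order time for a rows x cols matrix-vector product.

    Assumes every weight is streamed from memory exactly once and that compute
    and memory transfers overlap perfectly, so the slower of the two dominates.
    """
    ops = 2 * rows * cols                         # one multiply + one add per weight
    bytes_moved = rows * cols * bytes_per_weight  # weight traffic only; activations ignored
    compute_s = ops / acc.peak_ops_per_s
    memory_s = bytes_moved / acc.mem_bytes_per_s
    bound = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), bound


# Sweep memory bandwidth for a fixed (hypothetical) 100 TOPS accelerator.
results = {}
for bw in (100e9, 1e12, 100e12):
    results[bw] = matvec_time(Accelerator(100e12, bw), rows=4096, cols=4096)
    seconds, bound = results[bw]
    print(f"{bw / 1e9:>8.0f} GB/s -> {seconds * 1e6:.3f} us ({bound}-bound)")
```

With one byte per weight, this kernel performs two operations per byte moved, so the hypothetical accelerator stays memory-bound until bandwidth exceeds half its peak operation rate (50 TB/s here). Sweeping one parameter like this is the kind of narrow study that can precede modeling the full processor or system.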
(Image: NVIDIA's Titan X graphics card, featuring the company's Pascal-powered graphics processing unit driven by 3,584 CUDA cores running at 1.5 GHz.)
As researchers continue to push the boundaries of neural networks and deep learning, particularly in speech recognition and natural language processing, image and pattern recognition, text and data analytics, and other complex areas, they are constantly on the lookout for new and better ways to extend and expand computing capabilities. For decades, the gold standard has been high-performance computing (HPC) clusters, which toss huge amounts of processing power at problems, albeit at a prohibitively high cost. This approach has helped fuel advances across a wide swath of fields, including weather forecasting, financial services, and energy exploration. However, in 2012, a new method emerged.