Collaborating Authors


Domain-Specific Hardware Accelerators

Communications of the ACM

From the simple embedded processor in your washing machine to powerful processors in data center servers, most computing today takes place on general-purpose programmable processors or CPUs. CPUs are attractive because they are easy to program and because large code bases exist for them. The programmability of CPUs stems from their execution of sequences of simple instructions, such as ADD or BRANCH; however, the energy required to fetch and interpret an instruction is 10x to 4000x more than that required to perform a simple operation such as ADD. This high overhead was acceptable when processor performance and efficiency were scaling according to Moore's Law [32]. One could simply wait and an existing application would run faster and more efficiently. Our economy has become dependent on these increases in computing performance and efficiency to enable new features and new applications. Today, Moore's Law has largely ended [12], and we must look to alternative architectures with lower overhead, such as domain-specific accelerators, to continue scaling of performance and efficiency. There are several ways to realize domain-specific accelerators as discussed in the sidebar on accelerator options. A domain-specific accelerator is a hardware computing engine that is specialized for a particular domain of applications. Accelerators have been designed for graphics [26], deep learning [16], simulation [2], bioinformatics [49], image processing [38], and many other tasks. Accelerators can offer orders of magnitude improvements in performance/cost and performance/W compared to general-purpose computers. For example, our bioinformatics accelerator, Darwin [49], is up to 15,000x faster than a CPU at reference-based, long-read assembly. The performance and efficiency of accelerators are due to a combination of specialized operations, parallelism, efficient memory systems, and reduction of overhead.
Domain-specific accelerators [7] are becoming more pervasive and more visible, because they are one of the few remaining ways to continue to improve performance and efficiency now that Moore's Law has ended [22]. Most applications require modifications to achieve high speedup on domain-specific accelerators. These applications are highly tuned to balance the performance of conventional processors with their memory systems.
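
As a rough illustration of the overhead figures quoted in this excerpt, the sketch below assumes a deliberately simple model: the energy of one instruction is the energy of the operation itself plus a fetch-and-interpret overhead expressed as a multiple of that operation energy. The function name and the model are illustrative assumptions, not something taken from the article.

```python
def useful_energy_fraction(overhead_ratio: float) -> float:
    """Fraction of per-instruction energy spent on the operation itself.

    Illustrative model (an assumption, not from the article):
    total energy = operation energy + overhead,
    where overhead = overhead_ratio * operation energy.
    """
    if overhead_ratio < 0:
        raise ValueError("overhead_ratio must be non-negative")
    return 1.0 / (1.0 + overhead_ratio)


# The excerpt quotes an overhead of 10x to 4000x the energy of a simple ADD.
for ratio in (10, 4000):
    fraction = useful_energy_fraction(ratio)
    print(f"overhead {ratio}x: {fraction:.4%} of the energy does useful work")
```

Under this model, only about 9% of the energy goes to the operation at the low end of the quoted range and roughly 0.025% at the high end, which is the overhead that accelerators aim to reduce.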


Communications of the ACM

Moritz Lipp is a Ph.D. candidate at Graz University of Technology, Graz, Austria. Michael Schwarz is a postdoctoral researcher at Graz University of Technology, Graz, Austria. Daniel Gruss is an assistant professor at Graz University of Technology, Graz, Austria. Thomas Prescher is a chief architect at Cyberus Technology GmbH, Dresden, Germany. Werner Haas is the chief technology officer at Cyberus Technology GmbH, Dresden, Germany.

Oracle BrandVoice: GPU Chips Are Poised To Rewrite (Again) What's Possible In Cloud Computing


At Altair, chief technology officer Sam Mahalingam is heads-down testing the company's newest software for designing cars, buildings, windmills, and other complex systems. The engineering and design software company, whose customers include BMW, Daimler, Airbus, and General Electric, is developing software that combines computer models of wind and fluid flows with machine design in the same process, so an engineer could design a turbine blade while simultaneously seeing its draft's effect on neighboring mills in a wind farm. What Altair needs for a job as hard as this, though, is a particular kind of computing power, provided by graphics processing units (GPUs) made by Silicon Valley's Nvidia and others. "When solving complex design challenges like the interaction between wind structures in windmills, GPUs help expedite computing so faster business decisions can be made," Mahalingam says. (Image caption: An aerodynamics simulation performed with Altair ultraFluidX on the Altair CX-1 concept design, modeled in Altair Inspire Studio.)

To Tune Up Your Quantum Computer, Better Call an AI Mechanic


A high-end race car engine needs all its components tuned and working together precisely to deliver top-quality performance. The same can be said about the processor inside a quantum computer, whose delicate bits must be adjusted in just the right way before it can perform a calculation. According to a team that includes scientists at JQI and the National Institute of Standards and Technology (NIST), the right mechanic for this tuning job is an artificial intelligence. The team's paper in the journal Physical Review Applied outlines a way to teach an AI to make an interconnected set of adjustments to tiny quantum dots, which are among the many promising devices for creating the quantum bits, or "qubits," that would form the switches in a quantum computer's processor. Precisely tweaking the dots is crucial for transforming them into properly functioning qubits, and until now the job had to be done painstakingly by human operators, requiring hours of work to create even a small handful of qubits for a single calculation.

Google launches TensorFlow Quantum


Quantum computers have been quite the rage recently, with different tech companies vying for the top spot when it comes to building the most powerful quantum machine. While IBM and Google were in the headlines last year for achieving quantum supremacy, other companies like the industrial giant Honeywell have been quietly working on their own quantum tech. The company plans to make its quantum machine available to clients via the internet in the next three months. However, Honeywell's approach is a little different from that of traditional quantum computers, which use superconducting qubits to operate. Honeywell's quantum computer uses a different technology, called ion traps, which hold ions in place with electromagnetic fields.

D-Wave: Quantum computing and machine learning are 'extremely well matched'


Following D-Wave's announcement of Leap 2, a new version of its quantum cloud service for building and deploying quantum computing applications, VentureBeat had the opportunity to sit down with Murray Thom, D-Wave's VP of software and cloud services. We naturally talked about Leap 2, including the improvements the company hopes it will bring for businesses and developers. But we also discussed the business applications D-Wave has already seen to date. Quantum computing leverages qubits to perform computations that would be much more difficult, or simply not feasible, for a classical computer. Based in Burnaby, Canada, D-Wave was the first company to sell commercial quantum computers, which are built to use quantum annealing.

Memory-efficient Learning for Large-scale Computational Imaging

Machine Learning

Critical aspects of computational imaging systems, such as experimental design and image priors, can be optimized through deep networks formed by the unrolled iterations of classical model-based reconstructions (termed physics-based networks). However, for real-world large-scale inverse problems, computing gradients via backpropagation is infeasible due to memory limitations of graphics processing units. In this work, we propose a memory-efficient learning procedure that exploits the reversibility of the network's layers to enable data-driven design for large-scale computational imaging systems. We demonstrate our method on a small-scale compressed sensing example, as well as two large-scale real-world systems: multi-channel magnetic resonance imaging and super-resolution optical microscopy.
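
The abstract does not spell out how reversibility is exploited, so the following is only a minimal sketch of the general idea, using additive coupling (a well-known reversible construction) as a stand-in rather than the authors' method. When a layer can be inverted, its input can be recomputed from its output during the backward pass instead of being stored, which is where the memory saving comes from. All function names and the example sub-functions here are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Fn = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def forward(x1: Vector, x2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Additive-coupling layer: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def inverse(y1: Vector, y2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Recover the layer input from its output, so it need not be stored."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


# Arbitrary example sub-functions; they do not need to be invertible themselves.
def f_example(v: Vector) -> Vector:
    return [math.tanh(t) for t in v]


def g_example(v: Vector) -> Vector:
    return [0.5 * t * t for t in v]


x1, x2 = [0.5, -1.0], [2.0, 0.25]
y1, y2 = forward(x1, x2, f_example, g_example)
r1, r2 = inverse(y1, y2, f_example, g_example)
print("recovered input matches (up to rounding):",
      all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(x1 + x2, r1 + r2)))
```

The recovery is exact only up to floating-point rounding, which is why the comparison uses a tolerance.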

Could quantum computing help beat the next coronavirus?

USATODAY - Tech Top Stories

Quantum computing isn't yet far enough along that it could have helped curb the spread of this coronavirus outbreak. But this emerging field of computing will almost certainly help scientists and researchers confront future crises. "Can we compress the rate at which we discover, for example, a treatment or an approach to this?" asks Dario Gil, the director of IBM Research. "The goal is to do everything that we are doing today in terms of discovery of materials, chemistry, things like that, (in) factors of 10 times better, 100 times better." And that, he says, "could be game-changing." Quantum computing is the next big thing in computing, and it promises exponential advances in artificial intelligence and machine learning through the next decade and beyond, leading to potential breakthroughs in healthcare and pharmaceuticals, fertilizers, battery power, and financial services.

FLAME: A Self-Adaptive Auto-labeling System for Heterogeneous Mobile Processors

Machine Learning

Labeling data accurately and efficiently on a mobile device is critical for successfully training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging because data is usually generated incrementally and may carry unknown labels. Furthermore, the rich hardware heterogeneity of mobile devices makes it challenging to execute auto-labeling workloads efficiently. In this paper, we introduce Flame, an auto-labeling system that can label non-stationary data with unknown labels. Flame includes a runtime system that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with eight datasets on a smartphone, we demonstrate that Flame enables auto-labeling with high labeling accuracy and high performance.

Planning for Compilation of a Quantum Algorithm for Graph Coloring

Artificial Intelligence

The problem of compiling general quantum algorithms for implementation on near-term quantum processors has been introduced to the AI community. Previous work demonstrated that temporal planning is an attractive approach for part of this compilation task, specifically, the routing of circuits that implement the Quantum Alternating Operator Ansatz (QAOA) applied to the MaxCut problem on a quantum processor architecture. In this paper, we extend the earlier work to route circuits that implement QAOA for Graph Coloring problems. QAOA for coloring requires execution of more, and more complex, operations on the chip, which makes routing a more challenging problem. We evaluate the approach on state-of-the-art hardware architectures from leading quantum computing companies. Additionally, we apply a planning approach to qubit initialization. Our empirical evaluation shows that temporal planning compares well to reasonable analytic upper bounds, and that solving qubit initialization with a classical planner generally helps temporal planners in finding shorter-makespan compilations for QAOA for Graph Coloring. These advances suggest that temporal planning can be an effective approach for more complex quantum computing algorithms and architectures.