Ithemal: Collaborating Authors


COMET: X86 Cost Model Explanation Framework

Chaudhary, Isha, Renda, Alex, Mendis, Charith, Singh, Gagandeep

arXiv.org Artificial Intelligence

ML-based program cost models have been shown to yield fairly accurate program cost predictions. They can replace heavily-engineered analytical program cost models in mainstream compilers, but their black-box nature discourages their adoption. In this work, we propose COMET, the first framework for generating faithful, generalizable, and intuitive explanations for x86 cost models. COMET brings interpretability specifically to ML-based cost models, such as Ithemal. We generate and compare COMET's explanations for Ithemal against its explanations for uiCA, an accurate hand-crafted analytical model. Our empirical findings show an inverse correlation between a cost model's prediction error on a given x86 basic block and the prominence of semantically richer features in COMET's explanations for that model.
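To make the idea of "explaining" a cost model concrete, the toy sketch below prunes a basic block down to the instructions that dominate a model's prediction, keeping the predicted cost within a tolerance of the original. The additive per-opcode model and the greedy pruning loop are invented stand-ins for illustration; they are not COMET's actual algorithm.

```python
# Invented per-opcode costs acting as a stand-in for a learned model.
TOY_COSTS = {"mov": 1.0, "add": 1.0, "imul": 3.0, "div": 20.0}

def predict_cost(block):
    """Stand-in cost model: sum of made-up per-opcode costs."""
    return sum(TOY_COSTS.get(op, 1.0) for op in block)

def explain(block, tol=0.1):
    """Greedily drop instructions whose removal changes the predicted
    cost by at most `tol` (relative); the survivors are the features
    that dominate the prediction."""
    target = predict_cost(block)
    kept = list(block)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if trial and abs(predict_cost(trial) - target) / target <= tol:
            kept = trial        # this instruction barely matters
        else:
            i += 1              # this instruction carries the cost
    return kept

print(explain(["mov", "mov", "div"]))  # -> ['div']: the divide dominates
```

The surviving instructions play the role of an intuitive explanation: under this toy model, the block's cost is "about the `div`".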


GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation

Sykora, Ondrej, Phothilimthana, Phitchaya Mangpo, Mendis, Charith, Yazdanbakhsh, Amir

arXiv.org Artificial Intelligence

Analytical hardware performance models yield swift estimates of desired hardware performance metrics. However, developing these analytical models for modern processors with sophisticated microarchitectures is an extremely laborious task that requires a firm understanding of the target microarchitecture's internal structure. In this paper, we introduce GRANITE, a new machine learning model that estimates the throughput of basic blocks across different microarchitectures. GRANITE uses a graph representation of basic blocks that captures both structural and data dependencies between instructions. This representation is processed by a graph neural network that takes advantage of the relational information captured in the graph and learns a rich neural representation of the basic block, allowing more precise throughput estimation. Our results establish a new state of the art for basic block performance estimation, with an average test error of 6.9% across a wide range of basic blocks and microarchitectures for the x86-64 target. Compared to recent work, this reduces the error by 1.7% while improving training and inference throughput by approximately 3.0x. In addition, we propose the use of multi-task learning with independent multi-layer feed-forward decoder networks. Our results show that this technique further improves the precision of all learned models while significantly reducing per-microarchitecture training costs. We perform an extensive set of ablation studies and comparisons with prior work, and conclude with a set of methods to achieve high accuracy for basic block performance estimation.
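A minimal sketch of the kind of graph the GRANITE abstract describes: instructions are nodes, and edges capture data dependencies through registers. The `(opcode, dests, srcs)` instruction format is an assumption made here for illustration; the paper's real graph also encodes structural information and is consumed by a graph neural network rather than inspected directly.

```python
def build_dep_graph(block):
    """block: list of (opcode, dest_regs, src_regs) tuples.
    Returns read-after-write edges (producer_idx, consumer_idx)."""
    edges = []
    last_writer = {}                  # register -> index of its last writer
    for i, (_, dests, srcs) in enumerate(block):
        for reg in srcs:
            if reg in last_writer:
                edges.append((last_writer[reg], i))
        for reg in dests:
            last_writer[reg] = i
    return edges

block = [
    ("mov",  ["rax"], []),              # rax <- constant
    ("add",  ["rax"], ["rax", "rbx"]),  # rax <- rax + rbx (depends on mov)
    ("imul", ["rcx"], ["rax", "rcx"]),  # rcx <- rax * rcx (depends on add)
]
print(build_dep_graph(block))  # -> [(0, 1), (1, 2)]
```

The edge list is exactly the dependence chain a scheduler must respect, which is why encoding it explicitly helps a learned throughput model.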


Tool predicts how fast code will run on a chip

#artificialintelligence

MIT researchers have invented a machine-learning tool that predicts how fast computer chips will execute code from various applications. To get code to run as fast as possible, developers and compilers -- programs that translate programming language into machine-readable code -- typically use performance models that run the code through a simulation of given chip architectures. Compilers use that information to automatically optimize code, and developers use it to tackle performance bottlenecks on the microprocessors that will run it. But performance models for machine code are handwritten by a relatively small group of experts and are not properly validated. In a series of conference papers, the researchers describe a novel machine-learning pipeline that automates this process, making it easier, faster, and more accurate.


Finally, a good use for AI: Machine-learning tool guesstimates how well your code will run on a CPU core

#artificialintelligence

MIT boffins have devised a software-based tool for predicting how processors will perform when executing code for specific applications. In three papers released over the past seven months, ten computer scientists describe Ithemal (Instruction THroughput Estimator using MAchine Learning), a tool for predicting the number of processor clock cycles needed to execute an instruction sequence when looped in steady state, along with a supporting benchmark and algorithm. Throughput statistics matter to compiler designers and performance engineers, but it isn't practical to make such measurements on demand, according to MIT computer scientists Saman Amarasinghe, Eric Atkinson, Ajay Brahmakshatriya, Michael Carbin, Yishen Chen, Charith Mendis, Yewen Pu, Alex Renda, Ondřej Sykora, and Cambridge Yang. So most systems rely on analytical models for their predictions. LLVM offers a command-line tool called llvm-mca that presents a model for throughput estimation, and Intel offers a closed-source machine code analyzer called IACA (Intel Architecture Code Analyzer), which takes advantage of the company's internal knowledge about its processors.
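Analytical tools like llvm-mca and IACA reason about which execution ports each instruction's micro-ops occupy, with steady-state throughput bounded by the busiest port. The sketch below is a toy bottleneck analysis in that spirit; the port assignments are invented for illustration and are not real Intel data.

```python
from collections import defaultdict

def steady_state_cycles(block, port_usage):
    """block: list of opcodes; port_usage maps opcode -> list of
    execution ports, one entry per micro-op. Each micro-op occupies
    its port for one cycle; the busiest port bounds throughput."""
    pressure = defaultdict(int)
    for op in block:
        for port in port_usage[op]:
            pressure[port] += 1
    return max(pressure.values())

# Invented port assignments for illustration only, NOT real CPU data.
PORTS = {"add": ["p0"], "imul": ["p1"], "load": ["p2"], "store": ["p4"]}

# Two adds both need p0, so the loop body takes ~2 cycles per iteration.
print(steady_state_cycles(["load", "add", "add", "store"], PORTS))  # -> 2
```

Real analyzers additionally model latencies, dependence chains, and uops that can issue on several alternative ports, which is where hand-written models get laborious.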


MIT Develops Machine-Learning Tool to Make Code Run Faster

#artificialintelligence

MIT researchers have built a new benchmark tool that can accurately predict how long given code takes to execute on a computer chip, helping programmers tweak the code for better performance. MIT researchers have invented a machine-learning tool that predicts how fast computer chips will execute code from various applications. To get code to run as fast as possible, developers and compilers -- programs that translate programming language into machine-readable code -- typically use performance models that run the code through a simulation of given chip architectures. Compilers use that information to automatically optimize code, and developers use it to tackle performance bottlenecks on the microprocessors that will run it. But performance models for machine code are handwritten by a relatively small group of experts and are not properly validated.



Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

Mendis, Charith, Amarasinghe, Saman, Carbin, Michael

arXiv.org Machine Learning

Statically estimating the number of processor clock cycles it takes to execute a basic block of assembly instructions in steady state (its throughput) is important for compiler backend optimizations such as register allocation, instruction selection, and instruction scheduling. This is especially complicated in modern x86-64 Complex Instruction Set Computer (CISC) machines with sophisticated processor microarchitectures. Traditionally, compiler writers invest time experimenting and referring to processor manuals to analytically model modern processors with incomplete specifications. This is tedious, error-prone, and must be redone for each processor generation. We present Ithemal, the first automatically learned estimator to statically predict the throughput of a set of basic block instructions using machine learning. Ithemal uses a novel Directed Acyclic Graph-Recurrent Neural Network (DAG-RNN) based data-driven approach for throughput estimation. We show that Ithemal is more accurate than state-of-the-art hand-written tools used in compiler backends and static machine code analyzers. In particular, our model has a worst-case average error of 10.53% on actual throughput values, compared to best-case average errors of 19.57% for the LLVM scheduler (llvm-mca) and 22.51% for IACA, Intel's machine code analyzer, across three different microarchitectures, while predicting throughput values faster than the aforementioned tools. We also show that Ithemal is portable, learning throughput estimation for the Intel Nehalem, Haswell, and Skylake microarchitectures without requiring changes to its structure.
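The DAG-RNN idea can be sketched in scalar form: walk the block in dependency order, fold each instruction's embedding together with the hidden states of its data-dependence parents, then map the accumulated states to a cycle count. The one-dimensional state and the fixed weights here are assumptions for illustration; the actual model uses learned LSTM cells over instruction-token embeddings.

```python
import math

def dag_rnn_predict(embeddings, edges, w_in=0.5, w_rec=0.5, w_out=4.0):
    """embeddings: one scalar feature per instruction, listed in
    topological order; edges: (parent, child) dependence pairs with
    parent < child. Returns a toy throughput prediction."""
    n = len(embeddings)
    parents = {i: [] for i in range(n)}
    for p, c in edges:
        parents[c].append(p)
    hidden = [0.0] * n
    for i, x in enumerate(embeddings):
        # Combine the instruction's own features with its parents' states.
        acc = w_in * x + w_rec * sum(hidden[p] for p in parents[i])
        hidden[i] = math.tanh(acc)
    return w_out * sum(hidden)   # toy linear readout of predicted cycles

# Two instructions where the second depends on the first.
print(round(dag_rnn_predict([1.0, 1.0], [(0, 1)]), 3))
```

Because the recurrence follows the dependence DAG rather than raw program order, chained instructions influence each other's states while independent ones do not, mirroring how dependencies constrain real throughput.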