Towards a high-performance AI compiler with upstream MLIR

Golin, Renato, Chelini, Lorenzo, Siemieniuk, Adam, Madhu, Kavitha, Hasabnis, Niranjan, Pabst, Hans, Georganas, Evangelos, Heinecke, Alexander

Apr-15-2024–arXiv.org Artificial Intelligence

This work proposes a compilation flow that uses open-source compiler passes to build a framework that achieves ninja performance from a generic, high-level linear algebra abstraction. We demonstrate this flow with a proof-of-concept MLIR project that takes input IR in Linalg-on-Tensor from TensorFlow and PyTorch, performs cache-level optimizations, and lowers to micro-kernels for efficient vectorization, achieving over 90% of the performance of ninja-written equivalent programs. The contributions of this work include: (1) packing primitives on the tensor dialect and passes for cache-aware distribution of tensors (single and multi-core) and type-aware instructions (VNNI, BFDOT, BFMMLA), including propagation of shapes across the entire function; (2) a linear algebra pipeline, including tiling, fusion and bufferization strategies, to turn model-level IR into hardware-friendly tile calls; (3) a mechanism for micro-kernel lowering to an open-source library that supports various CPUs.
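To make the packing and tiling idea concrete, here is a minimal pure-Python sketch of a matrix multiplication on packed (blocked) operands, where each tile-level update is delegated to a small kernel. It is an illustration only, not the paper's implementation: the function names `pack`, `unpack`, `micro_kernel` and `packed_matmul` are hypothetical, the tile sizes are arbitrary, and the sketch assumes dimensions divide evenly by the tile sizes (no padding).

```python
def pack(matrix, tile_rows, tile_cols):
    """Reorder a row-major 2-D list into a grid of contiguous tiles."""
    rows, cols = len(matrix), len(matrix[0])
    # Assumption of this sketch: no padding, so sizes must divide evenly.
    assert rows % tile_rows == 0 and cols % tile_cols == 0
    return [
        [
            [matrix[bi * tile_rows + i][bj * tile_cols:(bj + 1) * tile_cols]
             for i in range(tile_rows)]
            for bj in range(cols // tile_cols)
        ]
        for bi in range(rows // tile_rows)
    ]


def unpack(packed):
    """Inverse of pack: restore the row-major 2-D list."""
    out = []
    for block_row in packed:
        for i in range(len(block_row[0])):
            row = []
            for tile in block_row:
                row.extend(tile[i])
            out.append(row)
    return out


def micro_kernel(a_tile, b_tile, c_tile):
    """Stand-in for a library micro-kernel: c_tile += a_tile @ b_tile."""
    for i, a_row in enumerate(a_tile):
        for k, a_val in enumerate(a_row):
            b_row = b_tile[k]
            for j in range(len(b_row)):
                c_tile[i][j] += a_val * b_row[j]


def packed_matmul(a, b, tm, tn, tk):
    """Multiply a (M x K) by b (K x N), one tile-level kernel call at a time."""
    a_packed = pack(a, tm, tk)  # grid of M/tm by K/tk tiles, each tm x tk
    b_packed = pack(b, tk, tn)  # grid of K/tk by N/tn tiles, each tk x tn
    m_blocks, k_blocks, n_blocks = len(a_packed), len(b_packed), len(b_packed[0])
    c_packed = [[[[0] * tn for _ in range(tm)] for _ in range(n_blocks)]
                for _ in range(m_blocks)]
    for bi in range(m_blocks):
        for bj in range(n_blocks):
            for bk in range(k_blocks):  # reduction over K tiles
                micro_kernel(a_packed[bi][bk], b_packed[bk][bj], c_packed[bi][bj])
    return unpack(c_packed)
```

In this sketch each tile is stored as its own contiguous unit, which is the property a packed layout is meant to give a kernel that operates on one tile at a time; a real compiler flow would also choose tile sizes from cache and register constraints, which this example does not attempt.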



