A Appendix

Neural Information Processing Systems 

All CPU experiments are conducted on AWS C5.9xlarge instances with Intel Xeon Platinum 8124M.

Take TensorCore GPUs as an example.

MetaSchedule makes an orthogonal contribution: it is a probabilistic language for composable search space construction rather than a technique for speeding up tuning.

The tensor program to be optimized is generated from the computational graph of a frontend framework, for example TensorFlow, PyTorch, or JAX.

A.7 Available Transformation Primitives

Transformation    Explanation
split             Split a loop into a sequence of consecutive loops
fuse              Fuse a sequence of consecutive loops into one
reorder           Reorder a sequence of loops
parallel          Parallelize a loop across CPU cores
vectorize         Vectorize a loop with SIMD
unroll            Unroll a loop
bind              Bind a loop to a GPU thread
cache-read        Create a block that reads a buffer region into a read cache
cache-write       Create a block that writes a buffer region into a write cache
compute-at        Move a producer block under the specific loop
compute-inline    Inline a block into its consumer(s)
rfactor           Factorize an associative reduction block by the specified loop
storage-align     Set alignment requirement for specific dimension of a buffer
set-scope         Set the storage scope of a buffer
add-unit-loop     Create a new unit loop on top of the specific block
re-index
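
To make the loop primitives above concrete, the following is a minimal sketch in plain Python of what split, fuse, and reorder do to the iteration space of a loop nest. It is a toy model and not the MetaSchedule or TVM API: the function names `split`, `fuse`, and `reorder`, their signatures, and the bounds guard used for non-divisible split factors are all assumptions made for illustration. Each function returns the indices in the order the transformed loop nest would visit them, so the result can be compared against the untransformed loop.

```python
def split(n, factor):
    """Toy model of `split`: loop i (extent n) becomes (i_outer, i_inner)."""
    outer_extent = (n + factor - 1) // factor  # ceiling division
    visited = []
    for i_outer in range(outer_extent):
        for i_inner in range(factor):
            i = i_outer * factor + i_inner  # recover the original index
            if i < n:  # assumed guard for when factor does not divide n
                visited.append(i)
    return visited


def fuse(n, m):
    """Toy model of `fuse`: loops i (extent n) and j (extent m) become one loop."""
    visited = []
    for ij in range(n * m):
        i, j = divmod(ij, m)  # recover the original pair of indices
        visited.append((i, j))
    return visited


def reorder(n, m):
    """Toy model of `reorder`: the (i, j) nest becomes a (j, i) nest."""
    visited = []
    for j in range(m):
        for i in range(n):
            visited.append((i, j))
    return visited


if __name__ == "__main__":
    print(split(10, 4))
    print(fuse(2, 3))
    print(reorder(2, 3))
```

In this sketch, split and fuse visit the same indices in the same order as the original loops, while reorder visits the same set of indices in a different order. Whether a reordering is legal for a given program depends on its data dependencies, which this toy model does not check.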
