Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures

Siracusa, Marco, Hsu, Olivia, Soria-Pardos, Victor, Randall, Joshua, Grasset, Arnaud, Biscondi, Eric, Joseph, Doug, Allen, Randy, Kjolstad, Fredrik, Planas, Miquel Moretó, Armejach, Adrià

Apr-15-2025–arXiv.org Artificial Intelligence

Irregular embedding lookups are a critical bottleneck in recommender models, sparse large language models, and graph learning models. In this paper, we first demonstrate that, by offloading these lookups to specialized access units, Decoupled Access-Execute (DAE) processors achieve 2.6$\times$ higher performance and 6.4$\times$ higher performance/watt than GPUs on end-to-end models. Then, we propose the Ember compiler for automatically generating optimized DAE code from PyTorch and TensorFlow. Unlike other DAE compilers, Ember features multiple intermediate representations specifically designed for different optimization levels. In this way, Ember can implement all optimizations to match the performance of hand-written code, unlocking the full potential of DAE architectures at scale.
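
To make the term "irregular embedding lookup" concrete, the following is a minimal sketch in plain Python of a sum-pooled embedding-bag operation, the kind of gather-and-reduce pattern found in recommender models. It is not Ember's generated code or any of its intermediate representations; the function name `embedding_bag_sum`, the `indices`/`offsets` layout, and the comments marking an "access" part and an "execute" part are illustrative assumptions only.

```python
from typing import List, Sequence


def embedding_bag_sum(
    table: Sequence[Sequence[float]],
    indices: Sequence[int],
    offsets: Sequence[int],
) -> List[List[float]]:
    """Sum-pool rows of `table` for each bag (hypothetical helper).

    Bag b covers indices[offsets[b]:offsets[b + 1]]; the last bag runs
    to the end of `indices`. Empty bags pool to a zero vector.
    """
    dim = len(table[0])
    pooled = []
    for b, start in enumerate(offsets):
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        acc = [0.0] * dim
        for idx in indices[start:end]:
            # "Access" part: a data-dependent row fetch. Which row is read
            # is only known at run time, which makes the memory pattern
            # irregular.
            row = table[idx]
            # "Execute" part: a regular, dense accumulation over the row.
            for d in range(dim):
                acc[d] += row[d]
        pooled.append(acc)
    return pooled


# Toy table with 6 rows of dimension 2: row r is [r, 10 * r].
table = [[float(r), float(r) * 10.0] for r in range(6)]
indices = [4, 0, 5, 2, 2]  # rows to gather, in arbitrary order
offsets = [0, 3, 3]        # three bags; the second one is empty

result = embedding_bag_sum(table, indices, offsets)
print(result)
```

In this sketch the row fetch depends on run-time index values while the accumulation is a fixed dense loop, which is the conceptual split that a decoupled access-execute design assigns to separate units.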

artificial intelligence, machine learning, natural language


Country:
- Europe (1.00)
- North America > United States
  - California (0.93)
  - New York > New York County
    - New York City (0.15)

Genre:
- Research Report (0.50)

Industry:
- Information Technology (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.89)
