SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors
Rakka, Mariam, Li, Jinhao, Dai, Guohao, Eltawil, Ahmed, Fouda, Mohammed E., Kurdahi, Fadi
arXiv.org Artificial Intelligence
Abstract: Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators like Softmax and Layernorm remain bottlenecks due to their sensitivity to quantization. We propose SoftmAP, a software-hardware co-design methodology that implements an integer-only low-precision Softmax using In-Memory Compute (IMC) hardware. Our method achieves up to three orders of magnitude improvement in the energy-delay product compared to A100 and RTX3090 GPUs, making LLMs more deployable without compromising performance. Softmax contributes up to 38% of the run time for longer sequence lengths.
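To make the phrase "integer-only Softmax" concrete, here is a minimal Python sketch of one generic way to compute it. It is not SoftmAP's algorithm or its hardware mapping, which the abstract does not describe. It assumes logits are quantized as integers `q` with real value `q * scale`, and it approximates the exponential with the second-order polynomial 0.3585(p + 1.353)^2 + 0.344 on (-ln 2, 0], the fit published for I-BERT's integer exponential. The function name `int_softmax` and the `out_bits` parameter are hypothetical.

```python
import math


def int_softmax(q, scale, out_bits=8):
    """Integer-only softmax sketch (illustrative, not the paper's method).

    q        : list of integer logits; the real logit is q[i] * scale
    scale    : quantization step (float, used only to derive constants)
    out_bits : fractional bits of the fixed-point output
    Returns integers v such that v / 2**out_bits approximates softmax.
    """
    # Constants derived once, offline. Floats appear only here.
    a, b, c = 0.3585, 1.353, 0.344           # exp(p) ~ a*(p + b)**2 + c
    q_ln2 = math.floor(math.log(2) / scale)  # ln 2 in units of `scale`
    if q_ln2 == 0:
        raise ValueError("scale must be smaller than ln 2")
    q_b = math.floor(b / scale)
    q_c = math.floor(c / (a * scale * scale))

    # Runtime path: integer subtract, divide, multiply, add, and shift only.
    q_max = max(q)
    exps = []
    for v in q:
        d = v - q_max                # d <= 0 keeps exp() in (0, 1]
        z = (-d) // q_ln2            # exp(x) = 2**(-z) * exp(p)
        p = d + z * q_ln2            # remainder, in (-q_ln2, 0]
        poly = (p + q_b) ** 2 + q_c  # polynomial, common factor a*scale**2
        exps.append(poly >> z)       # apply 2**(-z) as a right shift

    # The common factor cancels in the normalization below.
    total = sum(exps)
    return [(e << out_bits) // total for e in exps]
```

For example, `int_softmax([100, 200, 300, 400], 0.01)` corresponds to real logits 1.0 through 4.0; dividing each returned integer by 256 gives an approximation of the floating-point softmax. Subtracting the maximum first keeps every exponent non-positive, so the power-of-two part of the exponential reduces to a right shift.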
Nov-26-2024
- Country:
- North America > United States > California
- Orange County > Irvine (0.14)
- San Francisco County > San Francisco (0.14)
- Genre:
- Research Report (0.50)
- Technology: