SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform

May-27-2025, 01:52:58 GMT–Neural Information Processing Systems

Tensor multiplication with learned weight matrices is the fundamental building block in deep learning models. These matrices can often be sparsified, decomposed, quantized, or subjected to random parameter sharing without losing accuracy, suggesting the possibility of more efficient transforms. Although many variants of weight matrices exist, unstructured ones are incompatible with modern hardware, slowing inference and training. On the other hand, structured variants often limit expressivity or fail to deliver the promised latency benefits. We present Sketch Structured Transform (SS1), an expressive and GPU-friendly operator that accelerates inference.

artificial intelligence, machine learning, sketch structured transform, (8 more...)

Neural Information Processing Systems

May-27-2025, 01:52:58 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)