Appendix for: Data-Aware Low-Rank Compression for Large NLP Models

A Proof of Theorem 1

Theorem 1


In addition, a pre-defined search grid is also necessary. With these input parameters, we first distribute the total allowed loss across the individual modules.
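A minimal sketch of what this budget-then-search step could look like, assuming a uniform split of the loss budget across modules and a plain truncated-SVD error in place of DRONE's data-aware objective; the function name `choose_ranks`, the uniform split, and the Frobenius-tail error measure are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def choose_ranks(weight_mats, total_loss_budget, rank_grid):
    """For each module, pick the smallest rank on the grid whose
    approximation error fits that module's share of the loss budget.
    Illustrative sketch only, not DRONE's data-aware selection rule."""
    per_module_budget = total_loss_budget / len(weight_mats)  # assumed uniform split
    chosen = []
    for W in weight_mats:
        s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
        for k in sorted(rank_grid):
            # Frobenius error of the rank-k truncation is the tail of the spectrum.
            err = np.sqrt(np.sum(s[k:] ** 2))
            if err <= per_module_budget:
                chosen.append(k)
                break
        else:
            chosen.append(min(W.shape))  # no grid point fits; keep full rank
    return chosen

# Example: three random "modules" and a coarse pre-defined search grid.
mats = [np.random.randn(256, 256) for _ in range(3)]
print(choose_ranks(mats, total_loss_budget=30.0, rank_grid=[16, 32, 64, 128, 256]))
```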
First, there is indeed a trade-off between efficiency and efficacy: the speedup ratio goes higher at the cost of lower accuracy. Thus, in real applications, users need to decide which operating point best suits their needs. We could have chosen another cutoff to report, such as a 1% accuracy drop with a lower speedup ratio, but this would not help much when comparing the different baseline methods.

D.1 LSTM result

A 2-layer LSTM model is composed of two layers built from large weight matrices and one large softmax layer. Thus, even though each LSTM weight matrix is much smaller and is well approximated by DRONE, the overall acceleration on GPU is smaller.
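One way to see why well-approximated but relatively small LSTM matrices translate into modest end-to-end gains is to compare per-step multiply counts against the softmax layer. The hidden and vocabulary sizes below are hypothetical values chosen only to illustrate the proportions, not figures taken from the paper's experiments.

```python
# Rough per-timestep multiply counts for a 2-layer LSTM language model.
# hidden_size and vocab_size are hypothetical, for illustration only.
hidden_size = 1024
vocab_size = 50000

# Each LSTM layer computes 4 gates from [input; hidden], i.e. a
# (4*hidden) x (2*hidden) matrix-vector product per step; two layers total.
lstm_flops = 2 * (4 * hidden_size) * (2 * hidden_size)
softmax_flops = hidden_size * vocab_size  # output projection onto the vocabulary

total = lstm_flops + softmax_flops
print(f"LSTM layers:   {lstm_flops / total:.1%} of per-step multiplies")
print(f"Softmax layer: {softmax_flops / total:.1%} of per-step multiplies")
```

Under these assumed sizes the softmax projection dominates the per-step cost, so even a strong low-rank approximation of the LSTM matrices alone caps the achievable end-to-end GPU speedup.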