From a Lossless (~1.5:1) Compression Algorithm for Llama2 7B Weights to Variable Precision, Variable Range, Compressed Numeric Data Types for CNNs and LLMs

Apr-16-2024–arXiv.org Artificial Intelligence

This paper attempts to address and reconcile two different issues: the existence of multiple numerical data formats (such as int8, bfloat16 and fp8, often non-optimal for the application and not directly compatible with one another) and the need to reduce their bandwidth requirements, especially in the case of power-hungry and slow DRAM. In other words, we would like to be able to support multiple numerical data formats and use a minimal number of bits to represent them while, at the same time, not being penalised by the outliers and forced to use a worst-case number of bits to represent them all. This is particularly important for LLMs, which have a huge number of weights that can come in a variety of formats. This is also true, to a lesser extent, for CNNs. Activations are also likely to benefit from such an approach.
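
The cost of sizing every value for the worst-case outlier can be illustrated with a small sketch. This is not the paper's algorithm: the toy weight list, the function names `fixed_width_bits` and `entropy_bits`, and the choice of order-0 Shannon entropy as the yardstick are all assumptions made here for illustration. The sketch compares the bits per value needed when the largest outlier dictates a fixed width with the entropy of the same data, which is the lower bound for a lossless code that treats values as independent symbols.

```python
import math
from collections import Counter


def fixed_width_bits(values):
    """Bits per value when every value gets the width the largest magnitude needs."""
    max_mag = max(abs(v) for v in values)
    return max_mag.bit_length() + 1  # +1 for a sign bit


def entropy_bits(values):
    """Order-0 Shannon entropy in bits per value (lossless lower bound for independent symbols)."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical toy data: mostly small integers plus a few large outliers.
weights = [0] * 500 + [1] * 200 + [-1] * 200 + [2] * 45 + [-2] * 45 + [100] * 5 + [-100] * 5

print("fixed width:", fixed_width_bits(weights), "bits/value")
print("entropy    :", round(entropy_bits(weights), 2), "bits/value")
```

In this toy data ten outliers out of a thousand values force the fixed-width representation to spend several times more bits per value than the entropy bound, which is the gap a variable-length representation aims to close.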

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.84)
