Lossless Compression of English Short Messages

Jul-11-2020, 17:06:04 GMT–#artificialintelligence

This lossless compressor achieves a much better compression ratio on English text than general-purpose compressors. Its typical compression ratio is 15% (the number of output bits divided by the number of input bits). Compression relies on the probability of the next word as computed by the GPT-2 language model released by OpenAI. The model used is a neural network with 345 million parameters based on the Transformer architecture (the largest GPT-2 model, with 1.5 billion parameters, brings only marginal improvement when compressing short messages). An arithmetic coder generates the bit stream.
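To make the model-plus-arithmetic-coder idea concrete, here is a minimal sketch in Python. It is not the compressor's actual implementation. As assumptions for illustration, a hypothetical `ToyModel` (adaptive character counts) stands in for GPT-2, exact fractions replace the finite-precision arithmetic a practical coder would use, and the message length is passed to the decoder separately instead of being encoded in the stream.

```python
from fractions import Fraction
import math

ALPHABET = [chr(c) for c in range(32, 127)]  # printable ASCII only (assumption)


class ToyModel:
    """Hypothetical stand-in for GPT-2: adaptive character counts."""

    def __init__(self):
        self.counts = {s: 1 for s in ALPHABET}
        self.total = len(ALPHABET)

    def interval(self, symbol):
        """Return (cumulative_low, probability) of symbol as exact fractions."""
        cum = 0
        for s in ALPHABET:
            if s == symbol:
                return Fraction(cum, self.total), Fraction(self.counts[s], self.total)
            cum += self.counts[s]
        raise ValueError(f"symbol {symbol!r} not in alphabet")

    def find(self, point):
        """Return the symbol whose sub-interval of [0, 1) contains point."""
        cum = 0
        for s in ALPHABET:
            cum += self.counts[s]
            if point < Fraction(cum, self.total):
                return s
        raise ValueError("point outside [0, 1)")

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1


def encode(text):
    """Narrow [low, low + width) by each symbol's probability, then emit bits."""
    model = ToyModel()
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        cum_low, prob = model.interval(ch)
        low += width * cum_low
        width *= prob
        model.update(ch)
    # Shortest binary fraction n / 2**k that falls inside the final interval.
    k = 0
    while True:
        n = math.ceil(low * 2**k)
        if Fraction(n, 2**k) < low + width:
            return format(n, "b").zfill(k) if k else ""
        k += 1


def decode(bits, length):
    """Replay the same model updates to recover `length` symbols."""
    model = ToyModel()
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        ch = model.find((value - low) / width)
        cum_low, prob = model.interval(ch)
        low += width * cum_low
        width *= prob
        model.update(ch)
        out.append(ch)
    return "".join(out)


message = "the cat sat on the mat and the rat sat on the hat"
bits = encode(message)
ratio = len(bits) / (8 * len(message))  # output bits / input bits
print(f"{len(bits)} bits, ratio {ratio:.0%}")
```

Each symbol costs roughly the negative base-2 logarithm of its predicted probability in bits, so the output shrinks as the model's predictions improve. That is why a strong language model can reach a far lower ratio than this character-count stand-in.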
