Lossless Compression of English Short Messages

This lossless compressor shrinks English text far more than general-purpose compressors do. Its typical compression ratio is 15% (the number of output bits divided by the number of input bits). The compression works by using the probability of the next word as computed by the GPT-2 language model released by OpenAI. The model used is a neural network of 345 million parameters based on the Transformer architecture (the largest GPT-2 model, with 1.5 billion parameters, brings only a marginal improvement when compressing short messages). An arithmetic coder then turns these probabilities into the output bit stream.
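To make the pairing of a predictive model with an arithmetic coder concrete, here is a minimal sketch in Python. It is not the compressor's actual implementation. A toy adaptive character-frequency model (the hypothetical `next_symbol_probs`) stands in for GPT-2, exact rational arithmetic replaces the fixed-precision integer arithmetic a practical coder would use, and the decoder is assumed to be told the message length instead of reading an end-of-message symbol.

```python
from fractions import Fraction
from math import ceil

# Assumption: a tiny fixed alphabet; the real system works on GPT-2's vocabulary.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ."

def next_symbol_probs(context):
    """Toy stand-in for the language model: smoothed counts of the symbols
    seen so far. Returns (symbol, probability) pairs that sum to exactly 1."""
    counts = {s: 1 for s in ALPHABET}
    for ch in context:
        counts[ch] += 1
    total = sum(counts.values())
    return [(s, Fraction(c, total)) for s, c in counts.items()]

def encode(message):
    """Narrow the interval [low, low + width) once per symbol, then emit the
    shortest binary fraction that lies inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for i, ch in enumerate(message):
        cum = Fraction(0)
        for sym, p in next_symbol_probs(message[:i]):
            if sym == ch:
                low += width * cum   # skip the sub-intervals of earlier symbols
                width *= p           # likely symbols shrink the interval less
                break
            cum += p
    n = 0
    while True:
        k = ceil(low * 2**n)         # smallest multiple of 2^-n that is >= low
        if Fraction(k, 2**n) < low + width:
            return format(k, "b").zfill(n) if n else ""
        n += 1

def decode(bits, length):
    """Replay the same model and pick, at each step, the sub-interval that
    contains the encoded value."""
    value = Fraction(int(bits, 2), 2**len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(length):
        cum = Fraction(0)
        for sym, p in next_symbol_probs(out):
            if low + width * (cum + p) > value:
                low += width * cum
                width *= p
                out += sym
                break
            cum += p
    return out

if __name__ == "__main__":
    text = "the cat sat on the mat."
    bits = encode(text)
    assert decode(bits, len(text)) == text
    print(f"{len(bits)} output bits for {8 * len(text)} input bits")
```

The output length is close to the sum of -log2(p) over the symbols, so the better the model predicts each next symbol, the fewer bits are produced. That is why swapping the toy frequency model for a strong language model improves the ratio so much on English text.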