Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows