Benchmarking Large Language Models with Integer Sequence Generation Tasks