Retentive Network: A Successor to Transformer for Large Language Models