Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training
