Building a Pre-training LLM Dataset for Indic Languages: A Case Study on Hindi
