Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task
