Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs