DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism
