Efficient Long-context Language Model Training by Core Attention Disaggregation
