StableMask: Refining Causal Masking in Decoder-only Transformer
