Chain-of-Model Learning for Language Model

Neural Information Processing Systems 

In this paper, we propose a novel learning paradigm, termed Chain-of-Model (CoM), which incorporates a causal relationship into the hidden states of each layer in a chain style.