Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers