Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study