Co-training and Co-distillation for Quality Improvement and Compression of Language Models