Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLM Pre-Training and Inference