Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel