Statistical Mechanics of the Mixture of Experts

Kang, Kukjin; Oh, Jong-Hoon

Neural Information Processing Systems 

Kukjin Kang and Jong-Hoon Oh
Department of Physics, Pohang University of Science and Technology
Hyoja San 31, Pohang, Kyongbuk 790-784, Korea
Email: kkj.jhoh@galaxy.postech.ac.kr

Abstract

We study the generalization capability of the mixture of experts learning from examples generated by another network with the same architecture. When the number of examples is smaller than a critical value, the network shows a symmetric phase in which the roles of the experts are not specialized. Upon crossing the critical point, the system undergoes a continuous phase transition to a symmetry-breaking phase in which the gating network partitions the input space effectively and each expert is assigned to an appropriate subspace. We also find that a mixture of experts with multiple levels of hierarchy shows multiple phase transitions.

1 Introduction

Recently there has been considerable interest in the neural network community in techniques that integrate the collective predictions of a set of networks [1, 2, 3, 4]. The mixture of experts [1, 2] is a well-known example which implements the divide-and-conquer philosophy elegantly.
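
As a concrete, purely illustrative picture of the architecture discussed above, the sketch below shows a mixture-of-experts forward pass in which a softmax gating network weights the outputs of simple linear experts. The parameter names (expert_w, gate_w), the choice of linear experts, and the NumPy implementation are assumptions made for illustration; they are not taken from the paper, which analyzes the model with statistical-mechanics methods rather than code.

import numpy as np

rng = np.random.default_rng(0)

N = 10  # input dimension (hypothetical)
K = 2   # number of experts (hypothetical)

# Hypothetical parameters: one weight vector per expert plus gating weights.
expert_w = rng.standard_normal((K, N)) / np.sqrt(N)
gate_w   = rng.standard_normal((K, N)) / np.sqrt(N)

def moe_output(x):
    """Gated combination of expert outputs for a single input x."""
    gate_scores = gate_w @ x                 # (K,) gating pre-activations
    gate_probs = np.exp(gate_scores)
    gate_probs /= gate_probs.sum()           # softmax gate: soft partition of the input space
    expert_out = expert_w @ x                # (K,) linear expert outputs
    return float(gate_probs @ expert_out)    # weighted sum of expert predictions

x = rng.standard_normal(N)
print(moe_output(x))

In the specialized (symmetry-broken) phase described in the abstract, the gate probabilities would concentrate on different experts in different regions of input space, so that each expert effectively handles its own subspace.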
