A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts