Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers