Rethinking Optimization and Architecture for Tiny Language Models