Deconstructing Attention: Investigating Design Principles for Effective Language Modeling