Learning in Compact Spaces with Approximately Normalized Transformers