On the token distance modeling ability of higher RoPE attention dimension
