Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models