What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling