What do self-supervised speech models know about words?