Toward a realistic model of speech processing in the brain with self-supervised learning