Interpreting the linear structure of vision-language model embedding spaces