Interpreting the linear structure of vision-language model embedding spaces

Open in new window