SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?