Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning