Microsoft's AI learns to answer questions about scenes from image-text pairs