What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?