Describe Anything Model for Visual Question Answering on Text-rich Images