Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding