Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?