VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models