VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation