Towards Statistical Factuality Guarantee for Large Vision-Language Models