Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing