Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming