Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization