Tutoring LLM into a Better CUDA Optimizer