Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers