Differentiable Reward Optimization for LLM-Based TTS Systems