LLM Cache Bandit Revisited: Addressing Query Heterogeneity for Cost-Effective LLM Inference