Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding