When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search

Xuan Chen¹