Improving Retrospective Language Agents via Joint Policy Gradient Optimization