RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline