Characterizing Deep Research: A Benchmark and Formal Definition