TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models
