ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search
Dan Zhang
Neural Information Processing Systems
Recent LLM self-training methods mostly rely on the LLM generating responses and keeping those whose final answers are correct as training data. This approach often yields a low-quality fine-tuning set, for example one that contains incorrect plans or intermediate reasoning.
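
To make the weakness concrete, the following is a minimal sketch of answer-only filtering, not the paper's code. The `Sample` class, the `filter_by_final_answer` helper, and the toy arithmetic data are all hypothetical names and values introduced for illustration. It assumes each response has already been split into reasoning text and a final answer string.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    reasoning: str  # intermediate steps produced by the model (hypothetical field)
    answer: str     # final answer extracted from the response (hypothetical field)


def filter_by_final_answer(samples, gold_answers):
    """Keep samples whose final answer matches the reference answer.

    Only the final answer is compared; the reasoning is never inspected.
    """
    return [
        s for s in samples
        if s.answer.strip() == gold_answers[s.question].strip()
    ]


question = "What is 2 + 3 * 4?"
gold = {question: "14"}

samples = [
    Sample(question, "3 * 4 = 12, then 2 + 12 = 14", "14"),  # sound steps
    Sample(question, "2 + 3 = 5, then 5 * 4 = 14", "14"),    # flawed steps, right answer
    Sample(question, "2 + 3 = 5, then 5 * 4 = 20", "20"),    # flawed steps, wrong answer
]

kept = filter_by_final_answer(samples, gold)
for s in kept:
    print(s.reasoning)
```

In this toy case the filter keeps the second sample even though its steps are wrong, because only the final answer is compared. This is the kind of incorrect intermediate reasoning that can end up in the fine-tuning set.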
Nov-19-2025, 17:29:50 GMT
- Country:
  - Asia > China (0.04)
  - Europe > Estonia (0.04)
  - North America > United States > California (0.04)
- Genre:
  - Research Report > Experimental Study (1.00)
  - Research Report > New Finding (0.67)
- Industry:
  - Information Technology (0.46)
  - Leisure & Entertainment > Games > Go (0.45)
- Technology: