ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search
Dan Zhang – Neural Information Processing Systems
Recent LLM self-training methods mostly rely on the LLM generating responses and keeping only those with correct final answers as training data. This approach often yields a low-quality fine-tuning set, because a response can reach the correct answer through incorrect plans or flawed intermediate reasoning.
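The weakness of answer-only filtering can be illustrated with a minimal sketch. This is not the paper's code: the `Sample` class, the `filter_by_final_answer` helper, and the toy arithmetic data are all hypothetical, chosen only to show how a flawed reasoning trace survives the filter when its final answer happens to be correct.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One generated response (hypothetical structure for illustration)."""
    question: str
    reasoning: str
    answer: str


def filter_by_final_answer(samples, gold_answers):
    """Keep samples whose final answer matches the reference.

    The reasoning text is never inspected, which is the weakness
    described above.
    """
    return [
        s for s in samples
        if s.answer.strip() == gold_answers[s.question].strip()
    ]


# Toy data (made up for this sketch).
samples = [
    # Correct reasoning, correct answer.
    Sample("2 + 2 * 3", "2 * 3 = 6, then 2 + 6 = 8", "8"),
    # Wrong reasoning, wrong answer: filtered out.
    Sample("2 + 2 * 3", "2 + 2 = 4, then 4 * 3 = 12", "12"),
    # Wrong reasoning (adds instead of multiplying), but the answer is right.
    Sample("2 * 2 + 0", "2 + 2 = 4, then 4 + 0 = 4", "4"),
]
gold = {"2 + 2 * 3": "8", "2 * 2 + 0": "4"}

kept = filter_by_final_answer(samples, gold)
for s in kept:
    print(s.question, "|", s.reasoning)
```

In this toy run the third sample is kept even though its first step is wrong, so it would enter the fine-tuning set unless each intermediate step were scored as well.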
Mar-22-2025, 19:15:34 GMT
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.67)
- Industry:
  - Information Technology (0.45)
  - Leisure & Entertainment > Games
    - Go (0.45)
- Technology: