MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks

Jun-16-2026, 03:06:30 GMT–Neural Information Processing Systems

Optimization plays a central role in Operations Research (OR) and numerous industrial applications, yet automating the end-to-end process of translating natural language descriptions into executable optimization programs remains a formidable challenge. While recent efforts have applied Large Language Models (LLMs) to this task, existing approaches are hindered by high inference costs, limited robustness across domains, and weak verification mechanisms. In this work, we propose MURKA, a reinforcement learning and knowledge distillationbased framework that enhances LLM-driven optimization modeling via collaborative agent alignment. MURKA orchestrates three specialized agents--Extractor, Solver, and Checker--to achieve accurate problem understanding, robust formulation, and verifiable execution. The Extractor is trained using group relative policy optimization with a composite reward function that incorporates semantic correctness and execution fidelity.

benchmark, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Jun-16-2026, 03:06:30 GMT

Conferences PDF

Add feedback

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning
    - Optimization (1.00)
    - Agents > Agent Societies (0.34)
  - Machine Learning > Neural Networks
    - Deep Learning (0.31)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found