Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

Neural Information Processing Systems 

In the paper, we adopt the Proximal Policy Optimization (PPO) algorithm [36] to train our agent. As in the original PPO [36], we use N actors, each solving one JSSP instance drawn from a distribution D. The difference from [36] is that, instead of sampling a batch of data, we use all data collected by the N actors to perform the update.

In Table S.3, we report results of training and testing on 4 groups of instances with sizes up to 30 × 20, where our method outperforms baselines on 87.5% (35 out of 40) of these instances. The "UB" column is the best solution from the literature, and "*" means the solution is optimal.
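Below is a minimal sketch (not the authors' released code) of this full-batch PPO update: every transition gathered by the N actors is pooled into one flat batch and used in each gradient step, rather than being sampled into mini-batches. The toy MLP policy, tensor shapes, and hyperparameters are illustrative assumptions standing in for the paper's actual dispatching policy.

```python
import torch
import torch.nn as nn

class DispatchPolicy(nn.Module):
    """Toy categorical policy; a stand-in for the paper's dispatching policy."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        # Distribution over dispatching actions for a batch of observations.
        return torch.distributions.Categorical(logits=self.net(obs))

def ppo_update(policy, optimizer, obs, actions, old_log_probs, advantages,
               clip_eps=0.2, epochs=4):
    """Clipped PPO update using ALL pooled transitions (no mini-batch sampling)."""
    for _ in range(epochs):
        dist = policy(obs)
        # Importance ratio between current and data-collecting policy.
        ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # Standard PPO clipped surrogate objective (maximized, so negated).
        loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Example: one update over a flat batch of transitions pooled from N actors.
policy = DispatchPolicy(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(policy.parameters(), lr=2e-4)
obs = torch.randn(256, 8)                      # 256 pooled transitions
actions = torch.randint(0, 4, (256,))
with torch.no_grad():
    old_log_probs = policy(obs).log_prob(actions)
advantages = torch.randn(256)                  # placeholder advantage estimates
ppo_update(policy, opt, obs, actions, old_log_probs, advantages)
```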
