Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning
–Neural Information Processing Systems
In the paper, we adopt the Proximal Policy Optimization (PPO) algorithm [36] to train our agent. Similar to the original PPO in [36], we also use N actors, each solving one JSSP instance drawn from a distribution D. The difference from [36] is that, instead of sampling a minibatch of data, we use all data collected by the N actors to perform the update. In Table S.3, we report results of training and testing on 4 groups of instances with sizes up to 30×20, where our method outperforms the baselines on 87.5% (35 out of 40) of these instances. The "UB" column is the best solution from the literature, and "*" means the solution is optimal.
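The full-batch variant described above can be sketched as follows: each of the N actors contributes its collected transitions, and one PPO clipped-surrogate update is computed over the pooled data. This is a minimal illustrative sketch, not the authors' implementation; the function name `ppo_clip_objective`, the clip range `eps=0.2`, and the synthetic per-actor batches are all assumptions for the example.

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """PPO clipped surrogate objective [36].

    ratios: probability ratios pi_new(a|s) / pi_old(a|s) per sample.
    advantages: estimated advantages for the same samples.
    eps: clip range (0.2 is the common default, assumed here).
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Take the pessimistic (minimum) term per sample, then average.
    return np.mean(np.minimum(unclipped, clipped))

# Pool ALL transitions from the N actors (each solving one JSSP
# instance) rather than sampling a minibatch -- the full-batch
# update described in the text. Data here is synthetic.
N = 4
rng = np.random.default_rng(0)
actor_batches = [
    (rng.uniform(0.8, 1.2, size=16), rng.normal(size=16))
    for _ in range(N)
]
ratios = np.concatenate([r for r, _ in actor_batches])
advs = np.concatenate([a for _, a in actor_batches])
loss = -ppo_clip_objective(ratios, advs)  # minimize negative objective
```

When the new and old policies coincide (all ratios equal 1), the objective reduces to the mean advantage; clipping only takes effect once a ratio leaves the `[1 - eps, 1 + eps]` band, which caps how far a single update can move the policy.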