Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

Neural Information Processing Systems 

In the paper, we adopt the Proximal Policy Optimization (PPO) algorithm [36] to train our agent. As in the original PPO [36], we use N actors, each solving one JSSP instance drawn from a distribution D. The difference from [36] is that, instead of sampling a batch of data, we use all data collected by the N actors to perform the update.

In Table S.3, we report results of training and testing on 4 groups of instances with sizes up to 30 × 20, where our method outperforms baselines on 87.5% (35 out of 40) of these instances. The "UB" column is the best solution from the literature, and "*" means the solution is optimal.
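Below is a minimal sketch (not the authors' released code) of this full-batch PPO update: every transition gathered by the N actors is pooled into one flat batch and used in each gradient step, rather than being sampled into mini-batches. The toy MLP policy, tensor shapes, and hyperparameters are illustrative assumptions standing in for the paper's actual dispatching policy.

```python
import torch
import torch.nn as nn

class DispatchPolicy(nn.Module):
    """Toy categorical policy; a stand-in for the paper's dispatching policy."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        # Distribution over dispatching actions for a batch of observations.
        return torch.distributions.Categorical(logits=self.net(obs))

def ppo_update(policy, optimizer, obs, actions, old_log_probs, advantages,
               clip_eps=0.2, epochs=4):
    """Clipped PPO update using ALL pooled transitions (no mini-batch sampling)."""
    for _ in range(epochs):
        dist = policy(obs)
        # Importance ratio between current and data-collecting policy.
        ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # Standard PPO clipped surrogate objective (maximized, so negated).
        loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Example: one update over a flat batch of transitions pooled from N actors.
policy = DispatchPolicy(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(policy.parameters(), lr=2e-4)
obs = torch.randn(256, 8)                      # 256 pooled transitions
actions = torch.randint(0, 4, (256,))
with torch.no_grad():
    old_log_probs = policy(obs).log_prob(actions)
advantages = torch.randn(256)                  # placeholder advantage estimates
ppo_update(policy, opt, obs, actions, old_log_probs, advantages)
```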
