AITopics | jssp

Priority dispatching rule (PDR) is widely used for solving real-world Job-shop scheduling problem (JSSP). However, the design of effective PDRs is a tedious task, requiring a myriad of specialized knowledge and often delivering limited performance. In this paper, we propose to automatically learn PDRs via an end-to-end deep reinforcement learning agent. We exploit the disjunctive graph representation of JSSP, and propose a Graph Neural Network based scheme to embed the states encountered during solving. The resulting policy network is size-agnostic, effectively enabling generalization on large-scale instances. Experiments show that the agent can learn high-quality PDRs from scratch with elementary raw features, and demonstrates strong performance against the best existing PDRs. The learned policies also perform well on much larger instances that are unseen in training.

deep reinforcement learning, job shop scheduling, learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Unraveling the Rainbow: can value-based methods schedule?

Corrêa, Arthur, Jesus, Alexandre, Nascimento, Paulo, Silva, Cristóvão, Moniz, Samuel

arXiv.org Artificial IntelligenceDec-1-2025

In this work, we conduct an extensive empirical study of several deep reinforcement learning algorithms on two challenging combinatorial optimization problems: the job-shop and flexible job-shop scheduling problems, both fundamental challenges with multiple industrial applications. Broadly, deep reinforcement learning algorithms fall into two categories: policy-gradient and value-based. While value-based algorithms have achieved notable success in domains such as the Arcade Learning Environment, the combinatorial optimization community has predominantly favored policy-gradient algorithms, often overlooking the potential of value-based alternatives. From our results, value-based algorithms demonstrated a lower variance and a more stable convergence profile compared to policy-gradient ones. Moreover, they achieved superior cross-size and cross-distribution generalization, that is, effectively solving instances that are substantially larger or structurally distinct from those seen during training. Finally, our analysis also suggests that the relative performance of each category of algorithms may be dependent on structural properties of the problem, such as problem flexibility and instance size. Overall, our findings challenge the prevailing assumption that policy-gradient algorithms are inherently superior for combinatorial optimization. We show instead that value-based algorithms can match or even surpass the performance of policy-gradient algorithms, suggesting that they deserve greater attention from the combinatorial optimization community. Our code is openly available at: https://github.com/AJ-Correa/Unraveling-the-Rainbow

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2505.03323

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

Neural Information Processing SystemsOct-9-2025, 02:08:12 GMT

To this end, we introduce Poppy, a simple training procedure for populations.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Combinatorial Optimization with Policy Adaptation using Latent Space Search

Neural Information Processing SystemsOct-8-2025, 05:10:43 GMT

Combinatorial Optimization (CO) has a wide range of real-world applications, from transportation (Contardo et al., 2012) and logistics (Laterre et al., 2018), to energy (Froger et al., 2016). Concretely, leading RL methods typically train a policy to incrementally construct a solution one element at a time.

artificial intelligence, comp ass, machine learning, (12 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Genre:

Research Report (0.67)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

11958dfee29b6709f48a9ba0387a2431-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 02:53:05 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States (0.04)
North America > Canada (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Robots (0.68)

Add feedback

A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints

Hoss, Jonathan, Schelling, Felix, Klarmann, Noah

arXiv.org Artificial IntelligenceJun-18-2025

The classical Job Shop Scheduling Problem (JSSP) focuses on optimizing makespan under deterministic constraints. Real-world production environments introduce additional complexities that cause traditional scheduling approaches to be less effective. Reinforcement learning (RL) holds potential in addressing these challenges, as it allows agents to learn adaptive scheduling strategies. However, there is a lack of a comprehensive, general-purpose frameworks for effectively training and evaluating RL agents under real-world constraints. To address this gap, we propose a modular framework that extends classical JSSP formulations by incorporating key real-world constraints inherent to the shopfloor, including transport logistics, buffer management, machine breakdowns, setup times, and stochastic processing conditions, while also supporting multi-objective optimization. The framework is a customizable solution that offers flexibility in defining problem instances and configuring simulation parameters, enabling adaptation to diverse production scenarios. A standardized interface ensures compatibility with various RL approaches, providing a robust environment for training RL agents and facilitating the standardized comparison of different scheduling methods under dynamic and uncertain conditions. We release JobShopLab as an open-source tool for both research and industrial applications, accessible at: https://github.com/proto-lab-ro/jobshoplab

constraint, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2506.13566

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment

Fernández, Pablo Ariño

arXiv.org Artificial IntelligenceJun-18-2025

The job shop scheduling problem is an NP-hard combinatorial optimization problem relevant to manufacturing and timetabling. Traditional approaches use priority dispatching rules based on simple heuristics. Recent work has attempted to replace these with deep learning models, particularly graph neural networks (GNNs), that learn to assign priorities from data. However, training such models requires customizing numerous factors: graph representation, node features, action space, and reward functions. The lack of modular libraries for experimentation makes this research time-consuming. This work introduces JobShopLib, a modular library that allows customizing these factors and creating new components with its reinforcement learning environment. We trained several dispatchers through imitation learning to demonstrate the environment's utility. One model outperformed various graph-based dispatchers using only individual operation features, highlighting the importance of feature customization. Our GNN model achieved near state-of-the-art results on large-scale problems. These results suggest significant room for improvement in developing such models. JobShopLib provides the necessary tools for future experimentation.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2506.13781

Country: North America > United States (0.92)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology: