operational research
Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty
Liu, Kailiang, Chen, Ying, Borndörfer, Ralf, Koch, Thorsten
Intraday surgical scheduling is a multi-objective decision problem under uncertainty-balancing elective throughput, urgent and emergency demand, delays, sequence-dependent setups, and overtime. We formulate the problem as a cooperative Markov game and propose a multi-agent reinforcement learning (MARL) framework in which each operating room (OR) is an agent trained with centralized training and decentralized execution. All agents share a policy trained via Proximal Policy Optimization (PPO), which maps rich system states to actions, while a within-epoch sequential assignment protocol constructs conflict-free joint schedules across ORs. A mixed-integer pre-schedule provides reference starting times for electives; we impose type-specific quadratic delay penalties relative to these references and a terminal overtime penalty, yielding a single reward that captures throughput, timeliness, and staff workload. In simulations reflecting a realistic hospital mix (six ORs, eight surgery types, random urgent and emergency arrivals), the learned policy outperforms six rule-based heuristics across seven metrics and three evaluation subsets, and, relative to an ex post MIP oracle, quantifies optimality gaps. Policy analytics reveal interpretable behavior-prioritizing emergencies, batching similar cases to reduce setups, and deferring lower-value electives. We also derive a suboptimality bound for the sequential decomposition under simplifying assumptions. We discuss limitations-including OR homogeneity and the omission of explicit staffing constraints-and outline extensions. Overall, the approach offers a practical, interpretable, and tunable data-driven complement to optimization for real-time OR scheduling.
A Fair OR-ML Framework for Resource Substitution in Large-Scale Networks
Mohan, Ved, Raqabi, El Mehdi Er, Van Hentenryck, Pascal
Ensuring that the right resource is available at the right location and time remains a major challenge for organizations operating large-scale logistics networks. The challenge comes from uneven demand patterns and the resulting asymmetric flow of resources across the arcs, which create persistent imbalances at the network nodes. Resource substitution among multiple, potentially composite and interchangeable, resource types is a cost-effective way to mitigate these imbalances. This leads to the resource substitution problem, which aims at determining the minimum number of resource substitutions from an initial assignment to minimize the overall network imbalance. In decentralized settings, achieving globally coordinated solutions becomes even more difficult. When substitution entails costs, effective prescriptions must also incorporate fairness and account for the individual preferences of schedulers. This paper presents a generic framework that combines operations research (OR) and machine learning (ML) to enable fair resource substitution in large networks. The OR component models and solves the resource substitution problem under a fairness lens. The ML component leverages historical data to learn schedulers' preferences, guide intelligent exploration of the decision space, and enhance computational efficiency by dynamically selecting the top-$κ$ resources for each arc in the network. The framework produces a portfolio of high-quality solutions from which schedulers can select satisfactory trade-offs. The proposed framework is applied to the network of one of the largest package delivery companies in the world, which serves as the primary motivation for this research. Computational results demonstrate substantial improvements over state-of-the-art methods, including an 80% reduction in model size and a 90% decrease in execution time while preserving optimality.
An approach of deep reinforcement learning for maximizing the net present value of stochastic projects
Xu, Wei, Yang, Fan, Cui, Qinyuan, Chen, Zhi
This paper investigates a project with stochastic activity durations and cash flows under discrete scenarios, where activities must satisfy precedence constraints generating cash inflows and outflows. The objective is to maximize expected net present value (NPV) by accelerating inflows and deferring outflows. We formulate the problem as a discrete-time Markov Decision Process (MDP) and propose a Double Deep Q-Network (DDQN) approach. Comparative experiments demonstrate that DDQN outperforms traditional rigid and dynamic strategies, particularly in large-scale or highly uncertain environments, exhibiting superior computational capability, policy reliability, and adaptability. Ablation studies further reveal that the dual-network architecture mitigates overestimation of action values, while the target network substantially improves training convergence and robustness. These results indicate that DDQN not only achieves higher expected NPV in complex project optimization but also provides a reliable framework for stable and effective policy implementation.
A Multimodal Approach to SME Credit Scoring Integrating Transaction and Ownership Networks
Zandi, Sahab, Korangi, Kamesh, Moreno-Paredes, Juan C., Óskarsdóttir, María, Mues, Christophe, Bravo, Cristián
Small and Medium-sized Enterprises (SMEs) are known to play a vital role in economic growth, employment, and innovation. However, they tend to face significant challenges in accessing credit due to limited financial histories, collateral constraints, and exposure to macroeconomic shocks. These challenges make an accurate credit risk assessment by lenders crucial, particularly since SMEs frequently operate within interconnected firm networks through which default risk can propagate. This paper presents and tests a novel approach for modelling the risk of SME credit, using a unique large data set of SME loans provided by a prominent financial institution. Specifically, our approach employs Graph Neural Networks to predict SME default using multilayer network data derived from common ownership and financial transactions between firms. We show that combining this information with traditional structured data not only improves application scoring performance, but also explicitly models contagion risk between companies. Further analysis shows how the directionality and intensity of these connections influence financial risk contagion, offering a deeper understanding of the underlying processes. Our findings highlight the predictive power of network data, as well as the role of supply chain networks in exposing SMEs to correlated default risk.
An experimental approach: The graph of graphs
Szádoczki, Zsombor, Bozóki, Sándor, Sipos, László, Galambosi, Zsófia
One of the essential issues in decision problems and preference modeling is the number of comparisons and their pattern to ask from the decision maker. We focus on the optimal patterns of pairwise comparisons and the sequence including the most (close to) optimal cases based on the results of a color selection experiment. In the test, six colors (red, green, blue, magenta, turquoise, yellow) were evaluated with pairwise comparisons as well as in a direct manner, on color-calibrated tablets in ISO standardized sensory test booths of a sensory laboratory. All the possible patterns of comparisons resulting in a connected representing graph were evaluated against the complete data based on 301 individual's pairwise comparison matrices (PCMs) using the logarithmic least squares weight calculation technique. It is shown that the empirical results, i.e., the empirical distributions of the elements of PCMs, are quite similar to the former simulated outcomes from the literature. The obtained empirically optimal patterns of comparisons were the best or the second best in the former simulations as well, while the sequence of comparisons that contains the most (close to) optimal patterns is exactly the same. In order to enhance the applicability of the results, besides the presentation of graph of graphs, and the representing graphs of the patterns that describe the proposed sequence of comparisons themselves, the recommendations are also detailed in a table format as well as in a Java application.
EvoCut: Strengthening Integer Programs via Evolution-Guided Language Models
Yazdani, Milad, Mostajabdaveh, Mahdi, Aref, Samin, Zhou, Zirui
Integer programming lies at the heart of crucial combinatorial optimization tasks but remains challenging due to its NP-hard nature. An effective approach for practically solving integer programs is the manual design of acceleration cuts, i.e. inequalities that improve solver performance. However, this creative process demands deep expertise and is yet to be automated. Our proposed framework, EvoCut, automates the generation of acceleration cuts by combining large language models (LLMs) with an evolutionary search. EvoCut (i) initializes a diverse population of candidate cuts via an LLM-based initializer agent; (ii) for each cut empirically evaluates both preservation of the optimal solution and its ability to cut off fractional solutions across a verification set; and (iii) iteratively refines the population through evolutionary crossover and mutation agents. We quantify each cut's utility by its relative reduction in the solver's optimality gap. Our comparisons against standard integer programming practice show that EvoCut reduces optimality gap by 17-57% within a fixed time. It obtains the same solutions up to 4 times as fast, and obtains higher-quality solutions within the same time limit. Requiring no human expert input, EvoCut reliably generates, improves, and empirically verifies cuts that generalize to unseen instances. The code is available at https://github.com/milad1378yz/EvoCut.
Classifying Inconsistency in AHP Pairwise Comparison Matrices Using Machine Learning
Assessing consistency in Pairwise Comparison Matrices (PCMs) within the Analytical Hierarchy Process (AHP) poses significant challenges when using the traditional Consistency Ratio (CR) method. This study introduces a novel alternative that leverages triadic preference reversals (PR) to provide a more robust and interpretable assessment of consistency. Triadic preference reversals capture inconsistencies between a pair of elements by comparing the direction of preference derived from the global eigenvector with that from a 3x3 submatrix (triad) containing the same pair, highlighting local-global preference conflicts. This method detects a reversal when one eigen ratio exceeds one while another falls below one, signaling inconsistency. We identify two key features: the proportion of preference reversals and the maximum reversal, which mediate the impact of a PCM's order on its consistency. Using these features simulated PCMs are clustered into consistent and inconsistent classes through k-means clustering, followed by training a logistic classifier for consistency evaluation. The PR method achieves 97\% accuracy, significantly surpassing the Consistency Ratio (CR) method's 50%, with a false negative rate of only 2.6\% compared to 5.5\%. These findings demonstrate the PR method's superior accuracy in assessing AHP consistency, thereby enabling more reliable decision-making. The proposed triadic preference reversal (PR) approach is implemented in the R package AHPtools publicly available on the Comprehensive R Archive Network (CRAN).
Pickup & Delivery with Time Windows and Transfers: combining decomposition with metaheuristics
Avgerinos, Ioannis, Mourtos, Ioannis, Tsompanidis, Nikolaos, Zois, Georgios
This paper examines the generalisation of the Pickup and Delivery Problem that allows mid-route load exchanges among vehicles and obeys strict time-windows at all locations. We propose a novel Logic-Based Benders Decomposition (LBBD) that improves optimality gaps for all benchmarks in the literature and scales up to handle larger ones. To tackle even larger instances, we introduce a refined Large Neighborhood Search (LNS) algorithm that improves the adaptability of LNS beyond case-specific configurations appearing in related literature. To bridge the gap in benchmark availability, we develop an instance generator that allows for extensive experimentation. For moderate datasets (25 and 50 requests), we evaluate the performance of both LBBD and LNS, the former being able to close the gap and the latter capable of providing near-optimal solutions. For larger instances (75 and 100 requests), we recreate indicative state-of-the-art metaheuristics to highlight the improvements introduced by our LNS refinements, while establishing its scalability.
Integrating Response Time and Attention Duration in Bayesian Preference Learning for Multiple Criteria Decision Aiding
Jiang, Jiaxuan, Liu, Jiapeng, Kadziński, Miłosz, Liao, Xiuwu, Dong, Jingyu
We introduce a multiple criteria Bayesian preference learning framework incorporating behavioral cues for decision aiding. The framework integrates pairwise comparisons, response time, and attention duration to deepen insights into decision-making processes. The approach employs an additive value function model and utilizes a Bayesian framework to derive the posterior distribution of potential ranking models by defining the likelihood of observed preference data and specifying a prior on the preference structure. This distribution highlights each model's ability to reconstruct Decision-Makers' holistic pairwise comparisons. By leveraging both response time as a proxy for cognitive effort and alternative discriminability as well as attention duration as an indicator of criterion importance, the proposed model surpasses traditional methods by uncovering richer behavioral patterns. We report the results of a laboratory experiment on mobile phone contract selection involving 30 real subjects using a dedicated application with time-, eye-, and mouse-tracking components. We validate the novel method's ability to reconstruct complete preferences. The detailed ablation studies reveal time- and attention-related behavioral patterns, confirming that integrating comprehensive data leads to developing models that better align with the DM's actual preferences.
Strong bounds for large-scale Minimum Sum-of-Squares Clustering
Croella, Anna Livia, Piccialli, Veronica, Sudoso, Antonio M.
Clustering is a fundamental technique in data analysis and machine learning, used to group similar data points together. Among various clustering methods, the Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used. MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids. Due to the unsupervised nature of clustering, achieving global optimality is crucial, yet computationally challenging. The complexity of finding the global solution increases exponentially with the number of data points, making exact methods impractical for large-scale datasets. Even obtaining strong lower bounds on the optimal MSSC objective value is computationally prohibitive, making it difficult to assess the quality of heuristic solutions. We address this challenge by introducing a novel method to validate heuristic MSSC solutions through optimality gaps. Our approach employs a divide-and-conquer strategy, decomposing the problem into smaller instances that can be handled by an exact solver. The decomposition is guided by an auxiliary optimization problem, the "anticlustering problem", for which we design an efficient heuristic. Computational experiments demonstrate the effectiveness of the method for large-scale instances, achieving optimality gaps below 3% in most cases while maintaining reasonable computational times. These results highlight the practicality of our approach in assessing feasible clustering solutions for large datasets, bridging a critical gap in MSSC evaluation.