Khan, Md. Mosaddek
A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning
Rizvee, Redwan Ahmed, Hassan, Raheeb, Khan, Md. Mosaddek
Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique for modeling various NP-hard Combinatorial Optimization (CO) problems in the form of binary variables. The Ising Hamiltonian models the energy function of a system, and mapping a QUBO to an Ising Hamiltonian is a standard route for solving canonical optimization problems with quantum optimization algorithms. Recently, PI-GNN, a generic framework based on a Graph Neural Network (GNN) architecture, has been proposed to address CO problems over graphs. Its key contribution is a generic QUBO-formulated Hamiltonian-inspired loss function that is optimized directly by the GNN. PI-GNN is highly scalable, but it satisfies noticeably fewer constraints than problem-specific algorithms, and this gap widens as graph density increases. Here, we identify a behavioral pattern behind this degradation and devise strategies to improve performance. A separate line of work uses Reinforcement Learning (RL) to solve these NP-hard problems with problem-specific reward functions. In this work, we also build a bridge between RL-based solutions and the QUBO-formulated Hamiltonian: we formulate, and empirically evaluate, the QUBO-formulated Hamiltonian as a generic reward function in the RL paradigm. Furthermore, we introduce a novel Monte Carlo Tree Search-based strategy with GNN, applying a guided search through manual perturbation of node labels during training. Our empirical evaluation shows up to a 44% reduction in constraint violations compared to PI-GNN.
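To make the loss concrete, here is a minimal sketch of a QUBO-formulated Hamiltonian used as a differentiable loss over soft node assignments, in the spirit of PI-GNN; the toy Q matrix, the sigmoid-output convention, and all variable names are illustrative assumptions rather than the paper's code.

```python
import torch

# Toy upper-triangular QUBO matrix Q for a 3-node instance (illustrative values).
# The QUBO objective is H(x) = x^T Q x with x in {0, 1}^n.
Q = torch.tensor([[0., 1., 1.],
                  [0., 0., 1.],
                  [0., 0., 0.]])

def hamiltonian_loss(p: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    """Relaxed Hamiltonian H(p) = p^T Q p, where p holds soft assignments in
    [0, 1] (e.g., sigmoid outputs of a GNN). Because it is differentiable, it
    can be minimized by gradient descent; rounding p afterwards gives labels."""
    return p @ Q @ p

# Pretend these probabilities came from a GNN's final sigmoid layer.
p = torch.tensor([0.9, 0.2, 0.7], requires_grad=True)
loss = hamiltonian_loss(p, Q)
loss.backward()  # gradients flow through the QUBO objective
print(loss.item(), p.grad)
```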
DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints
Hassan, Raheeb, Wadith, K. M. Shadman, Mamun-or-Rashid, Md., Khan, Md. Mosaddek
The field of safe multi-agent reinforcement learning, despite its potential applications in various domains such as drone delivery and vehicle automation, remains relatively unexplored. Training agents to learn optimal policies that maximize rewards while satisfying specific constraints can be challenging, particularly in scenarios where a central controller to coordinate the agents during training is not feasible. In this paper, we address the problem of multi-agent policy optimization in a decentralized setting, where agents communicate with their neighbors to maximize the sum of their cumulative rewards while also satisfying each agent's safety constraints. We consider both peak and average constraints. In this scenario, there is no central controller coordinating the agents, and both the rewards and the constraints are known only to each agent locally/privately. We formulate the problem as a decentralized constrained multi-agent Markov Decision Process and propose a momentum-based decentralized policy gradient method, DePAint, to solve it. To the best of our knowledge, this is the first privacy-preserving, fully decentralized multi-agent reinforcement learning algorithm that considers both peak and average constraints. We also provide a theoretical analysis and an empirical evaluation of our algorithm in various scenarios, comparing its performance to centralized algorithms that consider similar constraints.
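To make the setting concrete, the optimization problem can be stated in a standard constrained-MDP form; the notation below (J_i for agent i's expected cumulative reward, c_i for its local cost signal, d_i and h_i for the average and peak thresholds) is an assumption of this sketch, not the paper's exact formulation.

```latex
\max_{\theta}\ \frac{1}{N}\sum_{i=1}^{N} J_i(\theta)
\quad \text{s.t.} \quad
\underbrace{\mathbb{E}\Big[\sum_{t}\gamma^{t} c_i(s_t, a_t)\Big] \le d_i}_{\text{average constraint}},
\qquad
\underbrace{c_i(s_t, a_t) \le h_i \ \ \forall t}_{\text{peak constraint}},
\qquad i = 1, \dots, N.
```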
A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning
Rizvee, Redwan Ahmed, Khan, Md. Mosaddek
Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique for modeling various NP-hard combinatorial optimization problems in the form of binary variables. The Hamiltonian function is often used to formulate QUBO problems, serving as the objective function to be optimized. Recently, PI-GNN, a generic and scalable framework based on a simple Graph Neural Network (GNN) architecture, has been proposed to address Combinatorial Optimization (CO) problems over graphs. Its novel contribution is a generic QUBO-formulated Hamiltonian-inspired loss function optimized directly with a GNN. In this study, we address a crucial issue with this setup that is especially pronounced in denser graphs. The reinforcement learning (RL)-based paradigm has also been widely used to address numerous CO problems. Here, we formulate and empirically evaluate the QUBO-formulated Hamiltonian as a generic reward function in the RL paradigm, directly incorporating the actual node projection status during training in the form of rewards. In our experiments, the RL-based setup achieved up to a 44% improvement over the PI-GNN algorithm. Our implementation is available at https://github.com/rizveeredwan/learning-graph-structure.
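As an illustration of the reward side, the sketch below uses the drop in QUBO energy caused by projecting a node to a label as the per-step reward; the paper's exact reward shape may differ, so treat the function and toy episode below as assumptions.

```python
import numpy as np

def qubo_energy(x: np.ndarray, Q: np.ndarray) -> float:
    """H(x) = x^T Q x for a (possibly partial) binary assignment x."""
    return float(x @ Q @ x)

def step_reward(x: np.ndarray, node: int, label: int, Q: np.ndarray) -> float:
    """Reward for projecting `node` to `label`: the negative change in the
    QUBO Hamiltonian, so lowering the energy yields a positive reward."""
    before = qubo_energy(x, Q)
    x[node] = label
    return before - qubo_energy(x, Q)

# Toy episode: assign nodes one by one, accumulating Hamiltonian-based rewards.
Q = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [0., 0., 0.]])
x = np.zeros(3)
total = sum(step_reward(x, node, label, Q)
            for node, label in [(0, 1), (1, 0), (2, 1)])
print(total, x)
```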
Improving Causal Effect Estimation of Weighted Regression-Based Estimator using Neural Networks
Shaha, Plabon, Zadid, Talha Islam, Rahman, Ismat, Khan, Md. Mosaddek
Estimating causal effects from observational data informs us about which factors are important in an autonomous system, and enables us to make better decisions. This is important because it has applications in selecting treatments in medical systems, devising better strategies in industry, and making better policies for government or even society. Unavailability of complete data, coupled with the high cardinality of the data, makes this estimation task computationally intractable. The do-calculus is a set of inference directives that helps transform such interventions into more interpretable probabilistic sentences and, as such, enables a user to derive or confirm causal claims about interventions [14]. Results inferred from the do-calculus are well understood on the whole, but its application is still questionable [10]. This is because the do-calculus assumes that the distributions being used are error-free, whereas in practice we do not have sufficient samples to confirm that. In the case of limited samples, a popular criterion, namely the back-door criterion, is employed to …
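For reference, the back-door adjustment that the abstract alludes to recovers an interventional distribution from purely observational quantities: if a set of variables Z satisfies the back-door criterion relative to (X, Y), then

```latex
P(y \mid \mathrm{do}(x)) \;=\; \sum_{z} P(y \mid x, z)\, P(z).
```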
Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network
Mahmud, Rafid Ameer, Faisal, Fahim, Mahmud, Saaduddin, Khan, Md. Mosaddek
Multi-agent Markov Decision Processes (MMDPs) have been an effective way of modelling sequential decision making in multi-agent cooperative environments. A number of algorithms based on centralized and decentralized planning have been developed in this domain. However, dynamically changing environments, coupled with the exponential size of the state and joint action spaces, make it difficult for these algorithms to provide both efficiency and scalability. Recently, the centralized planning algorithm FV-MCTS-MP and the decentralized planning algorithm Alternate Maximization with Behavioural Cloning (ABC) have achieved notable performance in solving MMDPs. However, they are not capable of adapting to dynamically changing environments and of accounting for the lack of communication among agents, respectively. Against this background, we introduce a simulation-based online planning algorithm, which we call SiCLOP, for multi-agent cooperative environments. Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses a Coordination Graph (CG) and a Graph Convolutional Network (GCN) to learn cooperation, providing real-time solutions to MMDP problems. It also improves scalability through an effective pruning of the action space. Additionally, unlike FV-MCTS-MP and ABC, SiCLOP supports transfer learning, which enables learned agents to operate in different environments. We also provide a theoretical discussion of the convergence property of our algorithm in multi-agent settings. Finally, our extensive empirical results show that SiCLOP significantly outperforms the state-of-the-art online planning algorithms.
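For readers unfamiliar with the MCTS machinery that SiCLOP builds on, node selection typically follows the UCT rule sketched below; the Child structure and parameters are illustrative, not SiCLOP's data structures.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    value_sum: float = 0.0  # sum of simulation returns backed up through this child
    visits: int = 0         # times this child has been selected

def uct_select(children, total_visits, c=1.4):
    """UCT: mean value (exploitation) plus an exploration bonus that shrinks
    as a child is visited more often; unvisited children are tried first."""
    def score(ch):
        if ch.visits == 0:
            return math.inf
        return ch.value_sum / ch.visits + c * math.sqrt(math.log(total_visits) / ch.visits)
    return max(children, key=score)

# Example: three children after 10 simulations at the parent.
kids = [Child(4.0, 5), Child(3.5, 4), Child(0.0, 0)]
print(uct_select(kids, total_visits=10))  # the unvisited child is picked first
```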
Accelerating Recursive Partition-Based Causal Structure Learning
Rahman, Md. Musfiqur, Rasheed, Ayman, Khan, Md. Mosaddek, Javidian, Mohammad Ali, Jamshidi, Pooyan, Mamun-Or-Rashid, Md.
Causal structure discovery from observational data is fundamental to the causal understanding of autonomous systems such as medical decision support systems, advertising campaigns and self-driving cars. It is essential for solving the well-known causal decision making and prediction problems associated with these real-world applications. Recently, recursive causal discovery algorithms have gained particular attention in the research community due to their ability to provide good results by using Conditional Independence (CI) tests on smaller sub-problems. However, each such algorithm needs a refinement function to remove undesired causal relations from the discovered graphs. Notably, as the problem size increases, the computational cost (i.e., the number of CI tests) of the refinement function makes an algorithm expensive to deploy in practice. This paper proposes a generic causal structure refinement strategy that can locate the undesired relations with a small number of CI tests, thus speeding up the algorithm for large and complex problems. We theoretically prove the correctness of our algorithm, and then empirically evaluate its performance against the state-of-the-art algorithms in terms of solution quality and completion time on synthetic and real datasets.
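To give a sense of the unit of cost being counted, here is a generic partial-correlation CI test (Fisher's z), a common choice for Gaussian data; the paper's algorithms may use a different test, so this is only an illustrative building block.

```python
import numpy as np
from scipy import stats

def ci_test(data: np.ndarray, i: int, j: int, cond: list, alpha: float = 0.05) -> bool:
    """Fisher's z test of X_i independent of X_j given X_cond, via partial
    correlation. Returns True if independence is NOT rejected at level alpha."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx].T))   # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    n, k = data.shape[0], len(cond)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - k - 3)
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return p > alpha

# X0, X1 independent; X2 is (roughly) their sum, i.e., a collider.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[:, 2] = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)
print(ci_test(X, 0, 1, []))   # expected True: marginally independent
print(ci_test(X, 0, 1, [2]))  # expected False: conditioning on a collider couples them
```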
Improving Solution Quality of Bounded Max-Sum Algorithm to Solve DCOPs involving Hard and Soft Constraints
Rahman, Md. Musfiqur, Rashik, Mashrur, Mamun-or-Rashid, Md., Khan, Md. Mosaddek
Bounded Max-Sum (BMS) is a message-passing algorithm that provides approximate solutions to a specific form of decentralized coordination problems, namely Distributed Constraint Optimization Problems (DCOPs). In particular, the BMS algorithm is able to solve problems of this type with large search spaces at low computational cost. Notably, the traditional DCOP formulation does not consider constraints that must be satisfied (also known as hard constraints); rather, it concentrates only on soft constraints. Hence, although both types of constraints appear in a number of real-world applications, the BMS algorithm does not actively capitalize on hard constraints. To address this issue, we tailor BMS so that it can deal with DCOPs having both types of constraints, thereby improving the solution quality of the algorithm. The empirical results exhibit a marked improvement in the quality of the solutions of large DCOPs.
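A standard way to let max-sum-style algorithms respect hard constraints is to map violations to utilities of negative infinity, so that any satisfying assignment dominates in the maximization; the sketch below illustrates the idea and is not the paper's exact tailoring of BMS.

```python
import itertools

NEG_INF = float("-inf")

def constraint_utility(assignment, soft_fn, hard_fn):
    """Combined utility: -inf if the hard constraint is violated,
    otherwise the soft constraint's utility."""
    return soft_fn(assignment) if hard_fn(assignment) else NEG_INF

# Toy example: two variables over {0, 1, 2}; hard: x != y, soft: prefer larger sum.
best = max(itertools.product(range(3), repeat=2),
           key=lambda a: constraint_utility(a, soft_fn=sum,
                                            hard_fn=lambda a: a[0] != a[1]))
print(best)  # a maximal-sum assignment satisfying x != y, e.g. (1, 2)
```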
A Particle Swarm Inspired Approach for Continuous Distributed Constraint Optimization Problems
Choudhury, Moumita, Sarker, Amit, Khan, Md. Mosaddek, Yeoh, William
Distributed Constraint Optimization Problems (DCOPs) are a widely studied framework for coordinating interactions in cooperative multi-agent systems. In classical DCOPs, variables owned by agents are assumed to be discrete. However, in many applications, such as target tracking or sleep scheduling in sensor networks, continuous-valued variables are more suitable than discrete ones. To better model such applications, researchers have proposed Continuous DCOPs (C-DCOPs), an extension of DCOPs that can explicitly model problems with continuous variables. The state-of-the-art approaches for solving C-DCOPs experience either onerous memory or computation overhead, and are unsuitable for non-differentiable optimization problems. To address this issue, we propose a new C-DCOP algorithm, namely Particle Swarm Optimization Based C-DCOP (PCD), which is inspired by Particle Swarm Optimization (PSO), a well-known centralized population-based approach for solving continuous optimization problems. In recent years, population-based algorithms have gained significant attention in classical DCOPs due to their ability to produce high-quality solutions. Nonetheless, to the best of our knowledge, this class of algorithms has not been utilized to solve C-DCOPs, and no prior work has evaluated the potential of PSO for solving classical DCOPs or C-DCOPs. In light of this observation, we adapt PSO, a centralized algorithm, to solve C-DCOPs in a decentralized manner. The resulting PCD algorithm not only produces good-quality solutions but also finds them without requiring any derivative calculations. Moreover, we design a crossover operator that PCD can use to further improve the quality of the solutions found. Finally, we theoretically prove that PCD is an anytime algorithm and empirically evaluate it against the state-of-the-art C-DCOP algorithms in a wide variety of benchmarks.
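For context, the classical PSO update that PCD adapts to the decentralized setting is sketched below; the inertia and acceleration coefficients are typical textbook values, assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update per particle:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v.
    Shapes: x, v, pbest are (particles, dims); gbest is (dims,)."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# Toy run: 5 particles minimizing f(x) = ||x||^2 in 2-D.
f = lambda x: (x ** 2).sum(axis=-1)
x = rng.uniform(-1, 1, size=(5, 2))
v = np.zeros_like(x)
pbest, gbest = x.copy(), x[f(x).argmin()].copy()
for _ in range(50):
    x, v = pso_step(x, v, pbest, gbest)
    improved = f(x) < f(pbest)
    pbest[improved] = x[improved]
    gbest = pbest[f(pbest).argmin()].copy()
print(gbest, f(gbest[None])[0])  # converges toward the origin
```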
On Population-Based Algorithms for Distributed Constraint Optimization Problems
Mahmud, Saaduddin, Khan, Md. Mosaddek, Jennings, Nicholas R.
Distributed Constraint Optimization Problems (DCOPs) are a widely studied class of optimization problems in which the interactions between a set of cooperative agents are modeled as a set of constraints. DCOPs are NP-hard, and significant effort has been devoted to developing methods for finding incomplete solutions. In this paper, we study an emerging class of such incomplete algorithms broadly termed population-based algorithms. Their main characteristic is that they maintain a population of candidate solutions to a given problem and use it to cover a large area of the search space and to avoid local optima. In recent years, this class of algorithms has gained significant attention due to its ability to produce high-quality incomplete solutions. With the primary goal of further improving solution quality over the state-of-the-art incomplete DCOP algorithms, we present two new population-based algorithms in this paper. Our first approach, Anytime Evolutionary DCOP (AED), exploits evolutionary optimization meta-heuristics to solve DCOPs, together with a novel anytime update mechanism that gives AED its anytime property. In our second contribution, we show that population-based approaches can be combined with local search: specifically, we develop an algorithm called DPSA based on the Simulated Annealing meta-heuristic. We empirically evaluate these two algorithms against the state-of-the-art incomplete DCOP algorithms, including all existing population-based algorithms, in a wide variety of benchmarks. Our evaluation shows that AED and DPSA markedly outperform the state-of-the-art, producing up to 75% improved solutions.
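The simulated-annealing core that DPSA-style local search builds on reduces to the Metropolis acceptance rule plus a cooling schedule; the sketch below is generic, and its parameters are illustrative assumptions rather than the paper's settings.

```python
import math
import random

def accept(delta: float, temperature: float) -> bool:
    """Metropolis criterion for maximization: always accept an improving move
    (delta >= 0); accept a worsening move with probability exp(delta / T),
    which shrinks as the temperature cools."""
    if delta >= 0:
        return True
    return random.random() < math.exp(delta / temperature)

def temperature(step: int, t0: float = 10.0, rate: float = 0.95) -> float:
    """Geometric cooling schedule, a common (illustrative) choice."""
    return t0 * rate ** step

print(accept(-1.0, temperature(0)))   # early: worsening moves often accepted
print(accept(-1.0, temperature(100))) # late: almost never accepted
```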
Speeding Up Distributed Pseudo-tree Optimization Procedure with Cross Edge Consistency to Solve DCOPs
Rashik, Mashrur, Rahman, Md. Musfiqur, Mamun-or-Rashid, Md., Khan, Md. Mosaddek
Distributed Pseudo-tree Optimization Procedure (DPOP) is a well-known message-passing algorithm that provides optimal solutions to Distributed Constraint Optimization Problems (DCOPs) -- a framework designed to optimize constraints in cooperative multi-agent systems. The traditional DCOP formulation does not consider constraints that must be satisfied (also known as hard constraints); rather, it concentrates only on soft constraints. However, both types of constraints appear in a number of applications, such as Distributed Radio Link Frequency Assignment and Distributed Event Scheduling. Although the combination of these constraint types has recently been incorporated into DPOP, scalability remains an issue since finding an optimal solution is NP-hard. Additionally, in DPOP the agents are arranged as a DFS pseudo-tree, and it has recently been observed that pseudo-trees constructed in this way often turn out to be chain-like, greatly impairing the algorithm's performance. To address these issues, we develop an algorithm that speeds up DPOP by reducing the size of the messages exchanged and by increasing parallelism in the pseudo-tree. Our empirical evidence suggests that our approach outperforms the state-of-the-art algorithms by a significant margin.
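To see why chain-like pseudo-trees hurt, recall that in DPOP a UTIL message grows as d^{|sep|}, where d is the domain size and sep is a node's separator; deep chains inflate separators and leave little room for parallel propagation. The helper below builds a DFS pseudo-tree over a constraint graph and reports its depth as a rough proxy for parallelism; it is an illustrative sketch, not the paper's construction.

```python
from collections import defaultdict

def dfs_pseudo_tree(edges, root=0):
    """Build DFS parent pointers for a constraint graph and return the tree
    depth; shallower, bushier trees allow more parallel UTIL propagation."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w], depth[w] = u, depth[u] + 1
                stack.append(w)
    return parent, max(depth.values())

# A triangle (0-1-2) plus a tail node 3, rooted at 0.
parent, d = dfs_pseudo_tree([(0, 1), (1, 2), (0, 2), (2, 3)])
print(parent, d)
```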