Muthirayan, Deepan
Long-term Fairness For Real-time Decision Making: A Constrained Online Optimization Approach
Du, Ruijie, Muthirayan, Deepan, Khargonekar, Pramod P., Shen, Yanning
Machine learning (ML) has demonstrated remarkable capabilities across many real-world systems, from predictive modeling to intelligent automation. However, the widespread integration of machine learning also makes it necessary to ensure that ML-driven decision-making systems do not violate the ethical principles and values of the societies in which they operate. As ML-driven decisions proliferate, particularly in cases involving sensitive attributes such as gender, race, and age, the need for equity and impartiality has emerged as a fundamental concern. In situations demanding real-time decision-making, fairness objectives become more nuanced and complex: instantaneous fairness, which ensures equity in every time slot, and long-term fairness, which ensures equity over a period of time. There is growing awareness that real-world systems operate over long periods and require fairness over different timelines. However, existing approaches mainly address dynamic costs with time-invariant fairness constraints, often disregarding the challenges posed by time-varying fairness constraints. To bridge this gap, this work introduces a framework for ensuring long-term fairness in dynamic decision-making systems characterized by time-varying fairness constraints. We formulate the decision problem with fairness constraints over a period as a constrained online optimization problem. A novel online algorithm, named LoTFair, is presented that solves the problem 'on the fly'. We prove that LoTFair makes overall fairness violations negligible while maintaining performance over the long run.
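The abstract does not spell out LoTFair's update rule, but constrained online optimization of this kind is often handled with a primal-dual (drift-plus-penalty) structure: take a gradient step on the instantaneous cost weighted against a virtual queue that tracks accumulated constraint violation. The sketch below illustrates only that generic structure; the function names, the scalar constraint, the projection onto $[0,1]^d$, and the parameters `eta` and `V` are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Two-sided finite-difference gradient (for illustration only)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def online_primal_dual(costs, constraints, dim, T, eta=0.1, V=1.0):
    """Generic drift-plus-penalty sketch for online optimization with
    time-varying (scalar) fairness constraints g_t(x) <= 0.
    NOT the paper's LoTFair algorithm; only an illustration of the structure.
    costs[t](x)       -> scalar loss at round t
    constraints[t](x) -> scalar fairness violation at round t (<= 0 is feasible)
    """
    x = np.zeros(dim)          # decision variable, kept in [0, 1]^dim
    q = 0.0                    # virtual queue tracking cumulative violation
    decisions = []
    for t in range(T):
        # gradient of the Lagrangian-like term V * f_t(x) + q * g_t(x)
        grad = numerical_grad(lambda z: V * costs[t](z) + q * constraints[t](z), x)
        x = np.clip(x - eta * grad, 0.0, 1.0)    # projected gradient step
        q = max(q + constraints[t](x), 0.0)      # queue update: accumulate violation
        decisions.append(x.copy())
    return decisions
```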
Online Learning for Incentive-Based Demand Response
Muthirayan, Deepan, Khargonekar, Pramod P.
In this paper, we consider the problem of learning online to manage Demand Response (DR) resources. A typical DR mechanism requires the DR manager to assign a baseline to the participating consumer, where the baseline is an estimate of the counterfactual consumption of the consumer had it not been called to provide the DR service. A challenge in estimating the baseline is the incentive the consumer has to inflate the baseline estimate. We consider the problem of learning online to estimate the baseline and to optimize the operating costs over a period of time under such incentives. We propose an online learning scheme that employs least squares for estimation, with a perturbation to the reward price (for the DR services or load curtailment) designed to balance the exploration-exploitation trade-off that arises in online learning. We show that our proposed scheme achieves a very low regret of $\mathcal{O}\left((\log{T})^2\right)$ over $T$ days of the DR program, measured with respect to the optimal operating cost under full knowledge of the baseline, and that it is individually rational for the consumers to participate. Our scheme is significantly better than the averaging-type approach, which only achieves $\mathcal{O}(T^{1/3})$ regret.
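As a rough illustration of the scheme's two ingredients, the sketch below combines a least-squares baseline estimate (which reduces to a sample mean for a constant-baseline model) with a decaying perturbation added to the reward price for exploration. The consumer-response model, the settlement rule, and all names (`run_dr_program`, `consumer_response`, etc.) are simplified assumptions for illustration, not the paper's mechanism.

```python
import numpy as np

def consumer_response(price, baseline, cost_coeff=1.0):
    """Stylized consumer model: curtailment increases with the offered price."""
    return min(price / (2 * cost_coeff), baseline)

def run_dr_program(T, true_baseline=10.0, marginal_value=2.0, seed=0):
    """Illustrative online DR loop: least-squares baseline estimation plus a
    decaying price perturbation for exploration (not the paper's mechanism)."""
    rng = np.random.default_rng(seed)
    non_dr_obs = []                                   # consumption observed on non-DR days
    for t in range(1, T + 1):
        # least-squares estimate of a constant baseline = sample mean of observations
        baseline_hat = np.mean(non_dr_obs) if non_dr_obs else 0.0
        price = marginal_value + 1.0 / np.sqrt(t)     # reward price with decaying perturbation
        curtailment = consumer_response(price, true_baseline)
        consumption = true_baseline - curtailment + rng.normal(scale=0.5)
        payment = price * max(baseline_hat - consumption, 0.0)  # settle against estimated baseline
        if t % 2 == 0:                                # non-DR days feed the baseline estimator
            non_dr_obs.append(true_baseline + rng.normal(scale=0.5))
    return baseline_hat, payment
```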
Competing Bandits in Time Varying Matching Markets
Muthirayan, Deepan, Maheshwari, Chinmay, Khargonekar, Pramod P., Sastry, Shankar
We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed and known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time-varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of $\widetilde{\mathcal{O}}(L_T^{1/2}T^{1/2})$, where $L_T$ is the number of changes in the underlying preferences of the agents. Therefore, we show that the optimal rates for single-agent learning can be achieved in spite of the competition, up to a constant factor. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.
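A minimal sketch of the kind of decentralized procedure this setting suggests: each player runs a UCB index over the arms, conflicts are resolved by the arms' fixed and known preferences, and the statistics are periodically restarted to track the time-varying player preferences. The restart window, reward model, and function names below are assumptions for illustration; this is not the paper's algorithm.

```python
import numpy as np

def competing_ucb(T, means, arm_prefs, window=500, seed=0):
    """Decentralized UCB with periodic restarts in a two-sided market.
    means:     (num_players, num_arms) expected rewards of players for arms
    arm_prefs: (num_arms, num_players) ranking; earlier position = more preferred
    Illustrative sketch only, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    P, K = means.shape
    counts = np.zeros((P, K))
    sums = np.zeros((P, K))
    for t in range(1, T + 1):
        if t % window == 0:                      # restart to track changing preferences
            counts[:] = 0
            sums[:] = 0
        safe = np.maximum(counts, 1)
        ucb = np.where(counts > 0,
                       sums / safe + np.sqrt(2 * np.log(t) / safe),
                       np.inf)
        pulls = ucb.argmax(axis=1)               # each player pulls its UCB-best arm
        for k in range(K):
            contenders = np.where(pulls == k)[0]
            if len(contenders) == 0:
                continue
            # the arm accepts its most preferred contender; others get no reward
            winner = min(contenders, key=lambda p: arm_prefs[k].tolist().index(p))
            r = rng.normal(means[winner, k], 1.0)
            counts[winner, k] += 1
            sums[winner, k] += r
    return counts
```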
Graph Learning for Cognitive Digital Twins in Manufacturing Systems
Mortlock, Trier, Muthirayan, Deepan, Yu, Shih-Yuan, Khargonekar, Pramod P., Faruque, Mohammad A. Al
Future manufacturing requires complex systems that connect simulation platforms and virtualization with physical data from industrial processes. A digital twin system incorporates a physical twin, a digital twin, and the connection between the two. The benefits of using digital twins, especially in manufacturing, are abundant, as they can increase efficiency across the entire manufacturing life-cycle. The digital twin concept has become increasingly sophisticated and capable over time, enabled by advances in many technologies. In this paper, we detail the cognitive digital twin as the next stage of advancement of the digital twin that will help realize the vision of Industry 4.0. Cognitive digital twins will allow enterprises to creatively, effectively, and efficiently exploit implicit knowledge drawn from the experience of existing manufacturing systems. They will also enable more autonomous decisions and control, while improving performance across the enterprise at scale. This paper presents graph learning as one potential pathway towards enabling cognitive functionalities in manufacturing digital twins. A novel approach that utilizes graph learning to realize cognitive digital twins in the product design stage of manufacturing is presented.
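To make "graph learning" concrete, the sketch below shows a single graph-convolution layer applied to a toy product-design graph, producing node embeddings that downstream design tasks could consume. The normalization follows the standard GCN formulation; the toy graph, features, and names are hypothetical and not drawn from the paper.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer with symmetric normalization:
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W).
    Minimal illustration of graph learning, not the paper's model."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # degree normalization
    H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W
    return np.maximum(H, 0.0)                   # ReLU

# toy product-design graph: 4 components, each with 2 features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 2))
W = np.random.default_rng(1).normal(size=(2, 3))
embeddings = gcn_layer(A, X, W)                 # node embeddings for downstream design tasks
```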
A Meta-Learning Control Algorithm with Provable Finite-Time Guarantees
Muthirayan, Deepan, Khargonekar, Pramod
In this work, we provide provable regret guarantees for an online meta-learning control algorithm in an iterative control setting. In each iteration, the system to be controlled is a different and unknown linear deterministic system, the cost for the controller is a general additive cost function, and the control input is required to satisfy constraints whose violation incurs an additional cost. We prove (i) that the algorithm achieves regret for the controller cost and constraint violation that is $O(T^{3/4})$ for an episode of duration $T$, with respect to the best policy that satisfies the control input constraints, and (ii) that the average of the regret for the controller cost and constraint violation with respect to the same policy varies as $O((1+\log(N)/N)T^{3/4})$ with the number of iterations $N$, showing that the worst-case regret for the learning within an iteration continuously improves with the experience of more iterations.
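The structure described here, online learning within each episode plus meta-learning across episodes, can be illustrated with a simple sketch: within an iteration, a linear feedback gain is updated by online gradient descent on the stage cost plus a penalty for violating the input constraint, and across iterations the learned gain warm-starts the next system. The dynamics, cost, penalty weight, and meta-update below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def meta_iterative_control(systems, N, T, eta=0.05, u_max=1.0, penalty=10.0):
    """Illustrative iterative control with meta-initialization (not the paper's method).
    systems: list of (A, B) pairs, one unknown linear deterministic system per iteration."""
    n, m = systems[0][1].shape
    K_meta = np.zeros((m, n))                      # meta-initialization carried across iterations
    for A, B in systems[:N]:
        K = K_meta.copy()                          # warm-start from the meta-initialization
        x = np.ones(n)
        for t in range(T):
            u = K @ x
            # gradient of the control-effort part u'u of the stage cost w.r.t. K
            cost_grad = 2 * np.outer(u, x)
            # penalty gradient for violating the input constraint |u| <= u_max
            viol = np.maximum(np.abs(u) - u_max, 0.0)
            viol_grad = penalty * np.outer(np.sign(u) * (viol > 0), x)
            K = K - eta * (cost_grad + viol_grad)  # online gradient step within the episode
            x = A @ x + B @ u                      # true (unknown) dynamics roll forward
        K_meta = 0.5 * (K_meta + K)                # simple averaging as the meta-update
    return K_meta
```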