Aswani, Anil
Understanding the Impact of Coalitions between EV Charging Stations
Kudva, Sukanya, Kulkarni, Kshitij, Maheshwari, Chinmay, Aswani, Anil, Sastry, Shankar
The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. This expansion, however, places significant charging demand on the electricity grid, impacting grid operations and electricity pricing. While coordination among all charging stations is beneficial, it may not always be feasible. Nevertheless, a subset of charging stations, for instance those jointly operated by a single company, could coordinate their charging profiles. In this paper we investigate whether such coalitions between charging stations are better than no coordination. We model EV charging as a non-cooperative aggregative game, where each station's cost is determined both by monetary payments tied to reactive electricity prices on the grid and by its sensitivity to deviations from a nominal charging profile. We consider a solution concept that we call $\mathcal{C}$-Nash equilibrium, which is tied to a coalition $\mathcal{C}$ of charging stations coordinating to reduce their cumulative costs. We provide sufficient conditions, in terms of the demand and sensitivity of charging stations, under which independent (uncoordinated) operation of charging stations can result in lower overall costs to the charging stations, the coalition, and the charging stations outside the coalition. Somewhat counter to intuition, we demonstrate scenarios where allowing charging stations to operate independently is better than coordinating as a coalition. Together, these results give operators of charging stations insight into how to coordinate their charging behavior, and they open several research directions.
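To make the solution concept concrete, the following is a minimal notational sketch; the profile variables $x_i$, the price map $p(\cdot)$, the quadratic deviation penalty, and the weights $\lambda_i$ are assumptions for exposition, not the paper's exact definitions.

```latex
% Illustrative notation only; the cost form and symbols are assumptions, not the paper's definitions.
% Station $i$ chooses a charging profile $x_i$; the aggregate load is $X=\sum_j x_j$ with price $p(X)$.
\[
  J_i(x_i, x_{-i}) \;=\; p(X)^\top x_i \;+\; \lambda_i \,\lVert x_i - \bar{x}_i \rVert_2^2 ,
\]
% A profile $x^\ast$ is a $\mathcal{C}$-Nash equilibrium if the coalition $\mathcal{C}$ cannot jointly
% lower its cumulative cost and every station outside $\mathcal{C}$ is at a unilateral best response:
\[
  \sum_{i\in\mathcal{C}} J_i(x^\ast_{\mathcal{C}}, x^\ast_{-\mathcal{C}})
  \;\le\; \sum_{i\in\mathcal{C}} J_i(x_{\mathcal{C}}, x^\ast_{-\mathcal{C}})
  \;\;\forall\, x_{\mathcal{C}},
  \qquad
  J_i(x^\ast_i, x^\ast_{-i}) \;\le\; J_i(x_i, x^\ast_{-i})
  \;\;\forall\, x_i,\; i\notin\mathcal{C}.
\]
```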
Methodology for Interpretable Reinforcement Learning for Optimizing Mechanical Ventilation
Lee, Joo Seung, Mahendra, Malini, Aswani, Anil
Mechanical ventilation is a critical life-support intervention that uses a machine to deliver controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and agreement with general domain knowledge. This paper proposes a methodology for interpretable reinforcement learning (RL) using decision trees for mechanical ventilation control. Using causal, nonparametric model-based off-policy evaluation, we assess each policy's ability to increase SpO2 while avoiding aggressive ventilator settings, which are known to cause ventilator-induced lung injury and other complications. Numerical experiments using MIMIC-III data from real patients' intensive care unit stays demonstrate that the decision tree policy outperforms the behavior cloning policy and is comparable to a state-of-the-art RL policy. Future work concerns better aligning the cost function with medical objectives to generate deeper clinical insights.
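As one way to picture the interpretable-policy idea, here is a hedged Python sketch of distilling a hypothetical teacher policy into a shallow decision tree; the feature names, the teacher rule, the tree depth, and the synthetic data are stand-ins for the MIMIC-III pipeline, not the paper's implementation.

```python
# Minimal sketch (not the paper's implementation): distill a teacher policy into a
# shallow, interpretable decision tree over hypothetical ventilator-state features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Hypothetical state features [SpO2, FiO2, PEEP, tidal_volume]; synthetic stand-ins for MIMIC-III data.
X = rng.uniform(low=[85, 0.21, 0, 4], high=[100, 1.0, 15, 10], size=(5000, 4))

# Hypothetical "teacher" policy (e.g., an RL policy being distilled): raise support when SpO2 is low.
teacher_action = (X[:, 0] < 92).astype(int)  # 1 = increase ventilator support, 0 = hold/wean

# A shallow tree keeps the resulting policy auditable by clinicians.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, teacher_action)
print(export_text(tree, feature_names=["SpO2", "FiO2", "PEEP", "tidal_volume"]))
```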
Tensor Completion via Integer Optimization
Chen, Xin, Kudva, Sukanya, Dai, Yongzheng, Aswani, Anil, Chen, Chen
The main challenge with the tensor completion problem is a fundamental tension between computation power and the information-theoretic sample complexity rate. Past approaches either achieve the information-theoretic rate but lack practical algorithms to compute the corresponding solution, or have polynomial-time algorithms that require an exponentially-larger number of samples for low estimation error. This paper develops a novel tensor completion algorithm that resolves this tension by achieving both provable convergence (in numerical tolerance) in a linear number of oracle steps and the information-theoretic rate. Our approach formulates tensor completion as a convex optimization problem constrained using a gauge-based tensor norm, which is defined in a way that allows the use of integer linear optimization to solve linear separation problems over the unit ball in this new norm. Adaptations based on this insight are incorporated into a Frank-Wolfe variant to build our algorithm. We show our algorithm scales well using numerical experiments on tensors with up to ten million entries.
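The following Python sketch illustrates the general pattern of a Frank-Wolfe loop whose linear-minimization oracle searches a discrete atom set, which is the role the integer linear program plays over the gauge-norm unit ball; the toy rank-one atoms, the brute-force oracle, and the plain (non-blended) Frank-Wolfe updates are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch: Frank-Wolfe for tensor completion with a discrete-atom LMO (assumptions only).
import itertools
import numpy as np

rng = np.random.default_rng(0)
T_true = np.einsum('i,j,k->ijk', *(rng.standard_normal(3) for _ in range(3)))  # 3x3x3 target
mask = rng.random(T_true.shape) < 0.7                                          # observed entries

def lmo(grad, radius=3.0):
    """Brute-force LMO over atoms radius * (u outer v outer w) with u, v, w in {-1, +1}^3."""
    best_val, best_atom = np.inf, None
    for u, v, w in itertools.product(itertools.product([-1, 1], repeat=3), repeat=3):
        atom = radius * np.einsum('i,j,k->ijk', np.array(u), np.array(v), np.array(w))
        val = np.sum(grad * atom)
        if val < best_val:
            best_val, best_atom = val, atom
    return best_atom

X = np.zeros_like(T_true)
for t in range(50):                      # plain Frank-Wolfe; the paper uses a blended variant
    grad = mask * (X - T_true)           # gradient of 0.5 * ||mask * (X - T_true)||^2
    step = 2.0 / (t + 2.0)
    X = (1 - step) * X + step * lmo(grad)

print("observed-entry RMSE:", np.sqrt(np.mean((mask * (X - T_true)) ** 2)))
```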
Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards
Dogan, Ilgin, Shen, Zuo-Jun Max, Aswani, Anil
Repeated principal-agent theory is a well-established paradigm that studies sequential interactions between two self-interested decision-makers. In particular, it offers a framework to analyze the problem of a primary party (i.e., principal) in a system who seeks to optimize the ultimate performance of the system by repeatedly delegating some operational control to another strategic party (i.e., agent) with a private decision-making process. This privacy imposes an information asymmetry between the principal and the agent that can appear as either an adverse selection setting, in which information about the agent's true preferences or rewards is hidden from the principal, or a moral hazard setting, in which the actions chosen by the agent are hidden from the principal (Bolton and Dewatripont 2004). In either case, the principal's problem can be defined along two main dimensions: i) learning some private information about the agent by training a consistent estimator, and ii) designing an incentive mechanism to steer the agent's algorithm in the principal's favor. In this paper, we study these two research problems for an unexplored adverse selection setting by marrying classical principal-agent theory with statistics and reinforcement learning. In a repeated principal-agent game, the main theoretical challenge stems from the dynamic and sequential interactions taking place between the two strategic decision-makers. In each play of the game, first the principal offers a menu of incentives to the agent, and then the agent makes a choice from a finite set of actions, which in turn determines the rewards collected by both players. In other words, there is a two-sided sequential externality in this setting, whereby the agent's imperfect knowledge imposes additional costs on the principal and the principal's incentives impose a more challenging decision-making environment for the agent with imperfect knowledge. This paper considers a setting in which both the principal and the agent observe stochastic rewards with expectations unknown to both, and in which both parties aim to maximize their own cumulative expected rewards.
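A hedged sketch of the repeated interaction protocol described above follows; the incentive menu, the agent's epsilon-greedy learning rule, and the reward distributions are illustrative assumptions, not the paper's mechanism or estimator.

```python
# Illustrative sketch of the repeated game; the menu, the agent's epsilon-greedy rule,
# and the reward models are assumptions, not the paper's mechanism or estimator.
import numpy as np

rng = np.random.default_rng(1)
n_actions, T = 3, 2000
agent_mu = np.array([0.2, 0.5, 0.3])       # agent's mean rewards, unknown to both players
preferred = 0                              # action the principal would like to induce
choice_counts = np.zeros(n_actions)        # all the principal observes are the chosen actions

agent_est = np.zeros(n_actions)            # the imperfect-knowledge agent learns its own rewards
pulls = np.zeros(n_actions)

for t in range(T):
    incentives = np.zeros(n_actions)
    incentives[preferred] = 0.4            # hypothetical menu: a bonus on the preferred action

    # Agent: epsilon-greedy on its estimated reward plus the offered incentive.
    if rng.random() < 0.1:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(agent_est + incentives))

    r_agent = rng.normal(agent_mu[a], 0.1)  # reward hidden from the principal
    pulls[a] += 1
    agent_est[a] += (r_agent - agent_est[a]) / pulls[a]
    choice_counts[a] += 1                   # principal's data: observed choices under the menu

print("observed choice frequencies:", np.round(choice_counts / T, 3))
```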
Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents
Dogan, Ilgin, Shen, Zuo-Jun Max, Aswani, Anil
System designers frequently use incentives as a powerful means of steering stakeholders toward behavior that serves the designers' objectives. Operations management includes many such examples, such as offering performance-based bonuses to ride-hailing drivers, providing monetary incentives to patients for medical adherence, quality-contingent bonus payments for workers in crowdsourcing platforms, and vertical collaboration between shippers and carriers in transportation planning. In many real-world settings, the problem of designing efficient incentives can be posed as a repeated principal-agent problem where a principal (i.e., system designer) designs sequential incentive policies to motivate an agent (i.e., stakeholder) to adopt behaviors that ultimately serve the goal of maximizing the principal's cumulative net reward. Typically, there is an element of information asymmetry in these systems, arising between the principal and the agent in the form of either adverse selection (i.e., hidden information) or moral hazard (i.e., hidden actions) (Bolton and Dewatripont 2004). For instance, in the context of employment incentives designed by an employer, the hidden information in an adverse selection setting could be the level of productivity of an employee, whereas a hidden action in the moral hazard setting could be the total effort level of the employee. More generally, the hidden information in the adverse selection setting can be seen as an unknown "type" or "preferences" of the agent that directly affects the action chosen by the agent, which in turn determines both the agent's utility and the principal's reward. These situations require specification of the agent's private information and the distributional knowledge that the principal has concerning that information. Existing literature on repeated principal-agent models mostly studies the moral hazard setting, with a more recent focus on the problem of estimating the agent's unknown model parameters under hidden actions (e.g., Ho et al. 2016, Kaynar and Siddiq 2022). On the other hand, the adverse selection setting is mostly studied either for single-period static games (Navabi and Nayyar 2018, Chade and Swinkels 2019, Gottlieb and Moreira 2022) or else for repeated dynamic games in which restrictive assumptions are made on, for example, the dimension of the agent's action space or the agent's utility.
Accelerated Nonnegative Tensor Completion via Integer Programming
Pan, Wenhao, Aswani, Anil, Chen, Chen
The problem of tensor completion has applications in healthcare, computer vision, and other domains. However, past approaches to tensor completion have faced a tension in that they either have polynomial-time computation but require exponentially more samples than the information-theoretic rate, or they use fewer samples but require solving NP-hard problems for which there are no known practical algorithms. A recent approach, based on integer programming, resolves this tension for nonnegative tensor completion. It achieves the information-theoretic sample complexity rate and deploys the Blended Conditional Gradients algorithm, which requires a linear (in numerical tolerance) number of oracle steps to converge to the global optimum. The tradeoff in this approach is that, in the worst case, the oracle step requires solving an integer linear program. Despite this theoretical limitation, numerical experiments show that this algorithm can, on certain instances, scale up to 100 million entries while running on a personal computer. The goal of this paper is to further enhance this algorithm, with the intention to expand both the breadth and scale of instances that can be solved. We explore several variants that can maintain the same theoretical guarantees as the algorithm, but offer potentially faster computation. We consider different data structures, acceleration of gradient descent steps, and the use of the Blended Pairwise Conditional Gradients algorithm. We describe the original approach and these variants, and conduct numerical experiments in order to explore various tradeoffs in these algorithmic design choices.
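To give a feel for the pairwise step that distinguishes Blended Pairwise Conditional Gradients from a classic Frank-Wolfe update, here is a toy Python sketch on the probability simplex; the simplex feasible set, the quadratic objective, and the coordinate-argmin oracle are assumptions for illustration and are unrelated to the paper's tensor completion oracle.

```python
# Toy sketch of a pairwise conditional-gradient step (assumptions only, not the paper's solver):
# mass is shifted from the worst active vertex ("away") to the best LMO vertex ("fw").
import numpy as np

rng = np.random.default_rng(0)
n = 20
target = rng.dirichlet(np.ones(n))
grad = lambda x: x - target                    # gradient of 0.5 * ||x - target||^2

x = np.zeros(n); x[0] = 1.0                    # start at a vertex; active set = {0}
weights = {0: 1.0}                             # convex-combination weights over active vertices

for t in range(200):
    g = grad(x)
    fw = int(np.argmin(g))                     # LMO over the simplex: best vertex
    away = max(weights, key=lambda i: g[i])    # worst vertex currently in the active set
    d = np.zeros(n); d[fw] += 1.0; d[away] -= 1.0
    # Exact line search for this quadratic, capped by the away vertex's weight.
    gamma = float(np.clip((g[away] - g[fw]) / 2.0, 0.0, weights[away]))
    x = x + gamma * d
    weights[fw] = weights.get(fw, 0.0) + gamma
    weights[away] -= gamma
    if weights[away] <= 1e-12:
        del weights[away]

print("distance to target:", np.linalg.norm(x - target))
```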
Regret Analysis of Learning-Based MPC with Partially-Unknown Cost Function
Dogan, Ilgin, Shen, Zuo-Jun Max, Aswani, Anil
The exploration/exploitation trade-off is an inherent challenge in data-driven adaptive control. Though this trade-off has been studied for multi-armed bandits (MABs) and for reinforcement learning in linear systems, it is less well-studied for learning-based control of nonlinear systems. A significant theoretical challenge in the nonlinear setting is that there is no explicit characterization of an optimal controller for a given set of cost and system parameters. We propose the use of a finite-horizon oracle controller with full knowledge of parameters as a reasonable surrogate for the optimal controller. This allows us to develop policies in the context of learning-based MPC and MABs, and to conduct a control-theoretic analysis using techniques from MPC and optimization theory to show these policies achieve low regret with respect to this finite-horizon oracle. Our simulations demonstrate the low regret of our policy on a heating, ventilation, and air-conditioning model with a partially-unknown cost function.
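As a rough illustration of measuring regret against a parameter-aware oracle, here is a Python sketch on a scalar linear system with a partially-unknown state-cost weight; the one-step surrogate controller, the estimator, and all constants are assumptions and are far simpler than the paper's MPC policies and HVAC model.

```python
# Crude sketch (assumptions only): cumulative cost gap between a certainty-equivalence
# controller that learns the cost weight theta and an oracle that knows theta.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0                     # unknown weight on the state cost
a, b, T = 0.9, 1.0, 300

def stage_cost(x, u, theta):
    return theta * x**2 + u**2

def greedy_input(x, theta):
    # One-step-ahead minimizer of theta*(a*x + b*u)^2 + u^2 (a crude stand-in for MPC).
    return -theta * a * b * x / (theta * b**2 + 1.0)

x_learn, x_oracle, theta_hat, regret = 1.0, 1.0, 0.5, 0.0
for t in range(T):
    u_learn = greedy_input(x_learn, theta_hat)
    u_oracle = greedy_input(x_oracle, theta_true)
    cost_learn = stage_cost(x_learn, u_learn, theta_true)
    regret += cost_learn - stage_cost(x_oracle, u_oracle, theta_true)
    # Noisy cost observations let the controller refine its estimate of theta.
    if abs(x_learn) > 0.05:
        residual = (cost_learn + rng.normal(0, 0.001)) - stage_cost(x_learn, u_learn, theta_hat)
        theta_hat += 0.1 * residual / x_learn**2
    w = rng.normal(0, 0.05)          # same disturbance drives both trajectories
    x_learn = a * x_learn + b * u_learn + w
    x_oracle = a * x_oracle + b * u_oracle + w

print("cumulative cost gap vs oracle:", round(regret, 3))
```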
Learning from learning machines: a new generation of AI technology to meet the needs of science
Pion-Tonachini, Luca, Bouchard, Kristofer, Martin, Hector Garcia, Peisert, Sean, Holtz, W. Bradley, Aswani, Anil, Dwivedi, Dipankar, Wainwright, Haruko, Pilania, Ghanshyam, Nachman, Benjamin, Marrone, Babetta L., Falco, Nicola, Prabhat, Arnold, Daniel, Wolf-Yadlin, Alejandro, Powers, Sarah, Climer, Sharlee, Jackson, Quinn, Carlson, Ty, Sohn, Michael, Zwart, Petrus, Kumar, Neeraj, Justice, Amy, Tomlin, Claire, Jacobson, Daniel, Micklem, Gos, Gkoutos, Georgios V., Bickel, Peter J., Cazier, Jean-Baptiste, Müller, Juliane, Webb-Robertson, Bobbie-Jo, Stevens, Rick, Anderson, Mark, Kreutz-Delgado, Ken, Mahoney, Michael W., Brown, James B.
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.
Average Margin Regularization for Classifiers
Olfat, Matt, Aswani, Anil
Adversarial robustness has become an important research topic given empirical demonstrations of the lack of robustness of deep neural networks. Unfortunately, recent theoretical results suggest that adversarial training induces a strict tradeoff between classification accuracy and adversarial robustness. In this paper, we propose and then study a new regularization for any margin classifier or deep neural network. We motivate this regularization by a novel generalization bound that shows a tradeoff in classifier accuracy between maximizing the margin and maximizing the average margin. We thus call our approach average margin (AM) regularization, and it consists of a linear term added to the objective. We theoretically show that for certain distributions AM regularization can improve both classifier accuracy and robustness to adversarial attacks. We conclude by using both synthetic and real data to empirically show that AM regularization can strictly improve both accuracy and robustness for support vector machines (SVMs) and deep neural networks, relative to unregularized classifiers and adversarially trained classifiers.
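A hedged sketch of adding a linear average-margin term to a standard hinge-loss linear classifier follows; the synthetic data, the weights lam and mu, and the sign and placement of the term are assumptions based only on the abstract's description of "a linear term added to the objective", not the paper's exact regularizer.

```python
# Illustrative sketch (assumptions only): hinge-loss linear classifier with an added
# average-margin term, solved as a convex program.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(n))   # synthetic labels

w, b = cp.Variable(d), cp.Variable()
margins = cp.multiply(y, X @ w + b)
hinge = cp.sum(cp.pos(1 - margins)) / n
avg_margin = cp.sum(margins) / n                       # average (signed) margin

lam, mu = 0.1, 0.5                                     # hypothetical regularization weights
objective = cp.Minimize(hinge + lam * cp.sum_squares(w) - mu * avg_margin)
cp.Problem(objective).solve()
print("train accuracy:", np.mean(np.sign(X @ w.value + b.value) == y))
```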
Convex Formulations for Fair Principal Component Analysis
Olfat, Matt, Aswani, Anil
Though there is a growing body of literature on fairness for supervised learning, the problem of incorporating fairness into unsupervised learning has been less well-studied. This paper studies fairness in the context of principal component analysis (PCA). We first present a definition of fairness for dimensionality reduction, and our definition can be interpreted as saying that a reduction is fair if information about a protected class (e.g., race or gender) cannot be inferred from the dimensionality-reduced data points. Next, we develop convex optimization formulations that can improve the fairness (with respect to our definition) of PCA and kernel PCA. These formulations are semidefinite programs (SDPs), and we demonstrate the effectiveness of our formulations using several datasets. We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates.
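A hedged sketch of a fairness-constrained PCA relaxation posed as a semidefinite program follows; the relaxed projection variable, the group-mean constraint, and the synthetic data are illustrative assumptions inspired by the abstract's fairness definition, not the paper's formulations.

```python
# Illustrative SDP sketch (assumptions only): maximize explained variance with a relaxed
# projection while suppressing the mean separation between protected groups.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 300, 6, 2
Z = rng.integers(0, 2, size=n)                                    # binary protected attribute
X = rng.standard_normal((n, d)) + 0.8 * np.outer(Z, np.ones(d))   # group-dependent shift
X = X - X.mean(axis=0)

Sigma = X.T @ X / n
dmu = X[Z == 1].mean(axis=0) - X[Z == 0].mean(axis=0)             # gap between group means

P = cp.Variable((d, d), symmetric=True)                           # relaxed rank-k projection
constraints = [P >> 0, np.eye(d) - P >> 0, cp.trace(P) == k,
               cp.abs(dmu @ P @ dmu) <= 1e-3]                     # fairness: suppress the gap
problem = cp.Problem(cp.Maximize(cp.trace(Sigma @ P)), constraints)
problem.solve(solver=cp.SCS)
print("explained variance under the fairness constraint:", round(problem.value, 3))
```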