Optimization
Hyper-local sustainable assortment planning
Aggarwal, Nupur, Bansal, Abhishek, Manglik, Kushagra, Kulkarni, Kedar, Raykar, Vikas
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store. While existing approaches only maximize the expected revenue, we propose also including the environmental impact, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimization approach that yields a Pareto front of optimal assortments for merchandisers to choose from. Applying the proposed approach to a few product categories of a leading fashion retailer shows that it is possible to choose assortments with lower environmental impact at a minimal cost in revenue.
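As an illustration of what a Pareto front of assortments means, the following minimal sketch filters a list of candidate assortments down to the non-dominated ones. The function name `pareto_front` and the (revenue, impact) numbers are hypothetical and are not taken from the paper; the paper's own optimization method is not reproduced here.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of non-dominated (revenue, impact) pairs.

    Higher revenue is better; lower environmental impact is better.
    """
    front = []
    for i, (rev_i, imp_i) in enumerate(points):
        # Candidate i is dominated if some other candidate is at least as
        # good on both objectives and strictly better on at least one.
        dominated = any(
            rev_j >= rev_i and imp_j <= imp_i and (rev_j > rev_i or imp_j < imp_i)
            for j, (rev_j, imp_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical candidate assortments: (expected revenue, impact score).
candidates = [(100.0, 50.0), (98.0, 30.0), (90.0, 35.0), (80.0, 20.0), (100.0, 60.0)]
front = pareto_front(candidates)
print(front)
```

A merchandiser would then choose among the surviving candidates, for example accepting a small revenue loss for a large drop in impact.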
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Bennett, Andrew, Kallus, Nathan, Li, Lihong, Mousavi, Ali
A fundamental question in offline reinforcement learning (RL) is how to estimate the value of some target evaluation policy, defined as the long-run average reward obtained by following the policy, using data logged by running a different behavior policy. This question, known as off-policy evaluation (OPE), often arises in applications such as healthcare, education, or robotics, where experimenting with running the target policy can be expensive or even impossible, but we have data logged following business as usual or current standards of care. A central concern with using such passively observed data is that observed actions, rewards, and transitions may be confounded by unobserved variables, which can bias standard OPE methods that assume no unobserved confounders, or equivalently that a standard Markov decision process (MDP) model holds with fully observed state. Consider, for example, evaluating a new smartphone app to help people living with type-1 diabetes time their insulin injections by monitoring their blood glucose level using some wearable device. Rather than risking giving bad advice that may harm individuals, we may consider first evaluating our injection-timing policy using existing longitudinal observations of individuals' blood glucose levels over time and the timing of insulin injections.
A Review on Computational Intelligence Techniques in Cloud and Edge Computing
Asim, Muhammad, Wang, Yong, Wang, Kezhi, Huang, Pei-Qiu
Cloud computing (CC) is a centralized computing paradigm that accumulates resources centrally and provides these resources to users through the Internet. Although CC holds a large number of resources, it may not be acceptable for real-time mobile applications, as it is usually geographically far away from users. On the other hand, edge computing (EC), which distributes resources to the network edge, enjoys increasing popularity in applications with low-latency and high-reliability requirements. EC provides resources in a decentralized manner and can respond to users' requirements faster than conventional CC, but with limited computing capacity. As both CC and EC are resource-sensitive, several major issues arise, such as how to conduct job scheduling, resource allocation, and task offloading, which significantly influence the performance of the whole system. To tackle these issues, many optimization problems have been formulated. These optimization problems usually have complex properties, such as non-convexity and NP-hardness, which may not be addressed by traditional convex optimization-based solutions. Computational intelligence (CI), consisting of a set of nature-inspired computational approaches, has recently exhibited great potential in addressing these optimization problems in CC and EC. This paper provides an overview of research problems in CC and EC and of recent progress in addressing them with the help of CI techniques. Informative discussions and future research trends are also presented, with the aim of offering insights to readers and motivating new research directions.
Intelligent Optimization of Diversified Community Prevention of COVID-19 using Traditional Chinese Medicine
Zheng, Yu-Jun, Yu, Si-Lan, Yang, Jun-Chao, Gan, Tie-Er, Song, Qin, Yang, Jun, Karatas, Mumtaz
Traditional Chinese medicine (TCM) has played an important role in the prevention and control of the novel coronavirus pneumonia (COVID-19), and community prevention has become the most essential part in reducing the spread risk and protecting populations. However, most communities use a uniform TCM prevention program for all residents, which violates the "treatment based on syndrome differentiation" principle of TCM and limits the effectiveness of prevention. In this paper, we propose an intelligent optimization method to develop diversified TCM prevention programs for community residents. First, we use a fuzzy clustering method to divide the population based on both modern medicine and TCM health characteristics; we then use an interactive optimization method, in which TCM experts develop different TCM prevention programs for different clusters, and a heuristic algorithm is used to optimize the programs under the resource constraints. We demonstrate the computational efficiency of the proposed method and report its successful application to TCM-based prevention of COVID-19 in 12 communities in Zhejiang province, China, during the peak of the pandemic.
Additive Tensor Decomposition Considering Structural Data Information
Mou, Shancong, Wang, Andi, Zhang, Chuck, Shi, Jianjun
Tensor data with rich structural information is becoming increasingly important in process modeling, monitoring, and diagnosis. Here structural information refers to structural properties such as sparsity, smoothness, low rank, and piecewise constancy. To reveal useful information from tensor data, we propose to decompose the tensor into the sum of multiple components, each with different structural information. In this paper, we provide a new definition of structural information in tensor data. Based on it, we propose an additive tensor decomposition (ATD) framework to extract useful information from tensor data. This framework specifies a high-dimensional optimization problem to obtain the components with distinct structural information. An alternating direction method of multipliers (ADMM) algorithm is proposed to solve it, which is highly parallelizable and thus suitable for the proposed optimization problem. Two simulation examples and a real case study in medical image analysis illustrate the versatility and effectiveness of the ATD framework.
Sequential design of multi-fidelity computer experiments: maximizing the rate of stepwise uncertainty reduction
Stroh, Rémi, Bect, Julien, Demeyer, Séverine, Fischer, Nicolas, Marquis, Damien, Vazquez, Emmanuel
This article deals with the sequential design of experiments for (deterministic or stochastic) multi-fidelity numerical simulators, that is, simulators that offer control over the accuracy of simulation of the physical phenomenon or system under study. Very often, accurate simulations correspond to high computational efforts whereas coarse simulations can be obtained at a smaller cost. In this setting, simulation results obtained at several levels of fidelity can be combined in order to estimate quantities of interest (the optimal value of the output, the probability that the output exceeds a given threshold, etc.) in an efficient manner. To do so, we propose a new Bayesian sequential strategy called Maximal Rate of Stepwise Uncertainty Reduction (MR-SUR), which selects additional simulations to be performed by maximizing the ratio between the expected reduction of uncertainty and the cost of simulation. This generic strategy unifies several existing methods and provides a principled approach to developing new ones. We assess its performance on several examples, including a computationally intensive problem of fire safety analysis where the quantity of interest is the probability of exceeding a tenability threshold during a building fire.
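The selection rule described in this abstract, maximizing expected uncertainty reduction per unit cost, can be sketched as follows. This is only an illustration of the ratio criterion: the `Candidate` class, its field names, and the numbers are hypothetical, and in practice the expected reductions would come from a Bayesian model of the simulator rather than being hard-coded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A possible next simulation (all values here are hypothetical)."""
    fidelity: str
    expected_reduction: float  # expected reduction of uncertainty
    cost: float                # cost of running the simulation


def select_max_rate(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate with the largest reduction-to-cost ratio."""
    return max(candidates, key=lambda c: c.expected_reduction / c.cost)


candidates = [
    Candidate("low", expected_reduction=0.2, cost=1.0),     # rate 0.20
    Candidate("medium", expected_reduction=0.5, cost=2.0),  # rate 0.25
    Candidate("high", expected_reduction=0.9, cost=10.0),   # rate 0.09
]
chosen = select_max_rate(candidates)
print(chosen.fidelity)
```

Note that the highest-fidelity run offers the largest absolute reduction but is not selected, because its cost makes its rate the lowest of the three.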
Binary Search and First Order Gradient Based Method for Stochastic Optimization
In this paper, we present a novel stochastic optimization method that combines the binary search technique with a first-order gradient-based optimization method, called Binary Search Gradient Optimization (BSG) or BiGrad. In this optimization setup, a non-convex surface is treated as a set of convex surfaces. In BSG, a region is first defined under the assumption that it is convex. If the region is not convex, the algorithm leaves it very quickly and defines a new one; otherwise, it tries to converge to the optimal point of the region. The core purpose of binary search in BSG is to decide, in logarithmic time, whether the region is convex, whereas the first-order gradient-based method is primarily applied to define a new region. In this paper, Adam is used as the first-order gradient-based method; nevertheless, other methods of this class may also be considered. In the deep neural network setup, BSG handles the problem of vanishing and exploding gradients efficiently. We evaluate BSG on the MNIST handwritten digit, IMDB, and CIFAR10 data sets, using logistic regression and deep neural networks. BSG produces more promising results than other first-order gradient-based optimization methods. Furthermore, the proposed algorithm generalizes significantly better on unseen data than other methods.
Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection
Newman, Elizabeth, Ruthotto, Lars, Hart, Joseph, Waanders, Bart van Bloemen
Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to non-quadratic objective functions, most notably, cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In numerical experiments from classification and surrogate modeling, GNvpro not only solves the optimization problem more efficiently but also yields DNNs that generalize better than commonly-used optimization schemes.
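The classical variable projection idea that this abstract builds on can be sketched for a separable nonlinear least-squares problem: for each value of the nonlinear parameter, the linear weights are eliminated by solving a linear least-squares problem, and only the reduced objective is optimized. The example below is not GNvpro; the model (one decaying exponential plus a constant), the function names, and the grid search over the nonlinear parameter are assumptions chosen to keep the sketch short.

```python
import numpy as np


def features(theta: float, t: np.ndarray) -> np.ndarray:
    """Nonlinear feature matrix: a decaying exponential and a constant column."""
    return np.column_stack([np.exp(-theta * t), np.ones_like(t)])


def reduced_objective(theta: float, t: np.ndarray, y: np.ndarray):
    """Eliminate the linear weights w for a fixed theta.

    Returns the least-squares loss at the optimal w, together with that w.
    """
    phi = features(theta, t)
    w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    residual = phi @ w - y
    return 0.5 * float(residual @ residual), w


# Noise-free synthetic data from theta = 1.5 and w = (2.0, 0.5).
t = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.5

# Optimize only over theta (a coarse grid stands in for a real solver).
grid = np.linspace(0.1, 3.0, 30)
losses = [reduced_objective(th, t, y)[0] for th in grid]
best_theta = float(grid[int(np.argmin(losses))])
best_loss, best_w = reduced_objective(best_theta, t, y)
print(best_theta, best_w)
```

The outer search only ever sees the single nonlinear parameter, since the two linear weights are recomputed optimally at every evaluation. In the DNN setting described above, the weights of the final affine layer play the role of `w`.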
Bounded Fuzzy Possibilistic Method of Critical Objects Processing in Machine Learning
Unsatisfactory accuracy of learning methods is mostly caused by omitting the influence of important parameters such as membership assignments, type of data objects, and distance or similarity functions. The proposed method, called the Bounded Fuzzy Possibilistic Method (BFPM), addresses different issues that previous clustering or classification methods have not sufficiently considered in their membership assignments. In fuzzy methods, an object's memberships should sum to 1. Hence, any data object may obtain full membership in at most one cluster or class. Possibilistic methods relax this condition, but the method can be satisfied with the results even if an arbitrary object obtains membership from just one cluster, which prevents the analysis of objects' movements. BFPM differs from previous fuzzy and possibilistic approaches by removing these restrictions. Furthermore, BFPM provides a flexible search space for the analysis of objects' movements. Data objects are also considered fundamental keys in learning methods, and knowing the exact type of objects provides a suitable environment for learning algorithms. The thesis introduces a new type of object, called critical, and categorizes data objects into two different categories: structural-based and behavioural-based. Critical objects are considered causes of misclassification and misassignment in learning procedures. The thesis also proposes new methodologies to study the behaviour of critical objects, with the aim of evaluating objects' movements (mutation) from one cluster or class to another. The thesis also introduces a new type of feature, called dominant, that is considered one of the causes of misclassification and misassignment. The thesis then proposes new sets of similarity functions, called Weighted Feature Distance (WFD) and Prioritized Weighted Feature Distance (PWFD).
Robust Collective Classification against Structural Attacks
Zhou, Kai, Vorobeychik, Yevgeniy
Collective learning methods exploit relations among data points to enhance classification performance. However, such relations, represented as edges in the underlying graphical model, expose an extra attack surface to the adversaries. We study adversarial robustness of an important class of such graphical models, Associative Markov Networks (AMN), to structural attacks, where an attacker can modify the graph structure at test time. We formulate the task of learning a robust AMN classifier as a bi-level program, where the inner problem is a challenging non-linear integer program that computes optimal structural changes to the AMN. To address this technical challenge, we first relax the attacker problem, and then use duality to obtain a convex quadratic upper bound for the robust AMN problem. We then prove a bound on the quality of the resulting approximately optimal solutions, and experimentally demonstrate the efficacy of our approach. Finally, we apply our approach in a transductive learning setting, and show that robust AMN is much more robust than state-of-the-art deep learning methods, while sacrificing little in accuracy on non-adversarial data.