Goto

Collaborating Authors

 Optimization


To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices

arXiv.org Artificial Intelligence

Recent advances in machine learning, wireless communication, and mobile hardware technologies promisingly enable federated learning (FL) over massive mobile edge devices, which opens new horizons for numerous intelligent mobile applications. Despite the potential benefits, FL imposes huge communication and computation burdens on participating devices due to periodical global synchronization and continuous local training, raising great challenges to battery constrained mobile devices. In this work, we target at improving the energy efficiency of FL over mobile edge networks to accommodate heterogeneous participating devices without sacrificing the learning performance. To this end, we develop a convergence-guaranteed FL algorithm enabling flexible communication compression. Guided by the derived convergence bound, we design a compression control scheme to balance the energy consumption of local computing (i.e., "working") and wireless communication (i.e., "talking") from the long-term learning perspective. In particular, the compression parameters are elaborately chosen for FL participants adapting to their computing and communication environments. Extensive simulations are conducted using various datasets to validate our theoretical analysis, and the results also demonstrate the efficacy of the proposed scheme in energy saving.


Off-Policy Optimization of Portfolio Allocation Policies under Constraints

arXiv.org Artificial Intelligence

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a sequential decision making framework and study the effects of: (a) using data collected under previously employed policies, which may be sub-optimal and constraint-violating, and (b) imposing desired constraints while computing near-optimal policies with this data. Our framework relies on solving a minimax objective, where one player evaluates policies via off-policy estimators, and the opponent uses an online learning strategy to control constraint violations. We extensively investigate various choices for off-policy estimation and their corresponding optimization sub-routines, and quantify their impact on computing constraint-aware allocation policies. Our study shows promising results for constructing such policies when back-tested on historical equities data, under various regimes of operation, dimensionality and constraints.


Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization

arXiv.org Machine Learning

Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One of the popular methods is Counterfactual Explanation (CE), which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining one's desired decision result. In practice, however, showing only a perturbation vector is often insufficient for users to execute the action. The reason is that if there is an asymmetric interaction among features, such as causality, the total cost of the action is expected to depend on the order of changing features. Therefore, practical CE methods are required to provide an appropriate order of changing features in addition to a perturbation vector. For this purpose, we propose a new framework called Ordered Counterfactual Explanation (OrdCE). We introduce a new objective function that evaluates a pair of an action and an order based on feature interaction. To extract an optimal pair, we propose a mixed-integer linear optimization approach with our objective function. Numerical experiments on real datasets demonstrated the effectiveness of our OrdCE in comparison with unordered CE methods.


Variational Transport: A Convergent Particle-BasedAlgorithm for Distributional Optimization

arXiv.org Machine Learning

We consider the optimization problem of minimizing a functional defined over a family of probability distributions, where the objective functional is assumed to possess a variational form. Such a distributional optimization problem arises widely in machine learning and statistics, with Monte-Carlo sampling, variational inference, policy optimization, and generative adversarial network as examples. For this problem, we propose a novel particle-based algorithm, dubbed as variational transport, which approximately performs Wasserstein gradient descent over the manifold of probability distributions via iteratively pushing a set of particles. Specifically, we prove that moving along the geodesic in the direction of functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles. Specifically, in each iteration of variational transport, we first solve the variational problem associated with the objective functional using the particles, whose solution yields the Wasserstein gradient direction. Then we update the current distribution by pushing each particle along the direction specified by such a solution. By characterizing both the statistical error incurred in estimating the Wasserstein gradient and the progress of the optimization algorithm, we prove that when the objective function satisfies a functional version of the Polyak-\L{}ojasiewicz (PL) (Polyak, 1963) and smoothness conditions, variational transport converges linearly to the global minimum of the objective functional up to a certain statistical error, which decays to zero sublinearly as the number of particles goes to infinity.


Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework

arXiv.org Machine Learning

In this work, we focus on the study of stochastic zeroth-order (ZO) optimization which does not require first-order gradient information and uses only function evaluations. The problem of ZO optimization has emerged in many recent machine learning applications, where the gradient of the objective function is either unavailable or difficult to compute. In such cases, we can approximate the full gradients or stochastic gradients through function value based gradient estimates. Here, we propose a novel hybrid gradient estimator (HGE), which takes advantage of the query-efficiency of random gradient estimates as well as the variance-reduction of coordinate-wise gradient estimates. We show that with a graceful design in coordinate importance sampling, the proposed HGE-based ZO optimization method is efficient both in terms of iteration complexity as well as function query cost. We provide a thorough theoretical analysis of the convergence of our proposed method for non-convex, convex, and strongly-convex optimization. We show that the convergence rate that we derive generalizes the results for some prominent existing methods in the nonconvex case, and matches the optimal result in the convex case. We also corroborate the theory with a real-world black-box attack generation application to demonstrate the empirical advantage of our method over state-of-the-art ZO optimization approaches.


SPlit: An Optimal Method for Data Splitting

arXiv.org Machine Learning

For developing statistical and machine learning models, it is common to split the dataset into two parts: training and testing (Stone, 1974; Hastie et al., 2009). The training part is used for fitting the model, that is, to estimate the unknown parameters in the model. The model is then evaluated for its accuracy using the testing dataset. The reason for doing this is because if we were to use the entire dataset for fitting, the model would overfit the data and can lead to poor predictions in future scenarios. Therefore, holding out a portion of the dataset and testing the model for its performance before deploying it in the field can protect against unexpected issues that can arise due to overfitting. In this article we consider only datasets where each row is independent, that is, we will exclude cases such as time series data. The simplest and probably the most common strategy to split such a dataset is to randomly sample a fraction of the dataset.


Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

arXiv.org Machine Learning

In order for reinforcement learning techniques to be useful in real-world decision making processes, they must be able to produce robust performance from limited data. Deep policy optimization methods have achieved impressive results on complex tasks, but their real-world adoption remains limited because they often require significant amounts of data to succeed. When combined with small sample sizes, these methods can result in unstable learning due to their reliance on high-dimensional sample-based estimates. In this work, we develop techniques to control the uncertainty introduced by these estimates. We leverage these techniques to propose a deep policy optimization approach designed to produce stable performance even when data is scarce. The resulting algorithm, Uncertainty-Aware Trust Region Policy Optimization, generates robust policy updates that adapt to the level of uncertainty present throughout the learning process.


Influence Maximization Under Generic Threshold-based Non-submodular Model

arXiv.org Artificial Intelligence

As a widely observable social effect, influence diffusion refers to a process where innovations, trends, awareness, etc. spread across the network via the social impact among individuals. Motivated by such social effect, the concept of influence maximization is coined, where the goal is to select a bounded number of the most influential nodes (seed nodes) from a social network so that they can jointly trigger the maximal influence diffusion. A rich body of research in this area is performed under statistical diffusion models with provable submodularity, which essentially simplifies the problem as the optimal result can be approximated by the simple greedy search. When the diffusion models are non-submodular, however, the research community mostly focuses on how to bound/approximate them by tractable submodular functions so as to estimate the optimal result. In other words, there is still a lack of efficient methods that can directly resolve non-submodular influence maximization problems. In this regard, we fill the gap by proposing seed selection strategies using network graphical properties in a generalized threshold-based model, called influence barricade model, which is non-submodular. Specifically, under this model, we first establish theories to reveal graphical conditions that ensure the network generated by node removals has the same optimal seed set as that in the original network. We then exploit these theoretical conditions to develop efficient algorithms by strategically removing less-important nodes and selecting seeds only in the remaining network. To the best of our knowledge, this is the first graph-based approach that directly tackles non-submodular influence maximization.


State Estimation of Power Flows for Smart Grids via Belief Propagation

arXiv.org Artificial Intelligence

Belief propagation is an algorithm that is known from statistical physics and computer science. It provides an efficient way of calculating marginals that involve large sums of products which are efficiently rearranged into nested products of sums to approximate the marginals. It allows a reliable estimation of the state and its variance of power grids that is needed for the control and forecast of power grid management. At prototypical examples of IEEE-grids we show that belief propagation not only scales linearly with the grid size for the state estimation itself, but also facilitates and accelerates the retrieval of missing data and allows an optimized positioning of measurement units. Based on belief propagation, we give a criterion for how to assess whether other algorithms, using only local information, are adequate for state estimation for a given grid. We also demonstrate how belief propagation can be utilized for coarse-graining power grids towards representations that reduce the computational effort when the coarse-grained version is integrated into a larger grid. It provides a criterion for partitioning power grids into areas in order to minimize the error of flow estimates between different areas.


Instance Space Analysis for the Car Sequencing Problem

arXiv.org Artificial Intelligence

In this paper, we investigate an important research question in the car sequencing problem, that is, what characteristics make an instance hard to solve? To do so, we carry out an Instance Space Analysis for the car sequencing problem, by extracting a vector of problem features to characterize an instance and projecting feature vectors onto a two-dimensional space using principal component analysis. The resulting two dimensional visualizations provide insights into both the characteristics of the instances used for testing and to compare how these affect different optimisation algorithms. This guides us in constructing a new set of benchmark instances with a range of instance properties. These are shown to be both more diverse than the previous benchmarks and include many hard to solve instances. We systematically compare the performance of six algorithms for solving the car sequencing problem. The methods tested include three existing algorithms from the literature and three new ones. Importantly, we build machine learning models to identify the niche in the instance space that an algorithm is expected to perform well on. Our results show that the new algorithms are state-of-the-art. This analysis helps to understand problem hardness and select an appropriate algorithm for solving a given car sequencing problem instance.