Goto

Collaborating Authors

 Optimization


D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems

arXiv.org Machine Learning

Decentralized optimization algorithms have attracted intensive interests recently, as it has a balanced communication pattern, especially when solving large-scale machine learning problems. Stochastic Path Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm which achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost-- that is, O ( null 3) for finding an null -approximate first-order stationary point--to its centralized counterpart. To the best of our knowledge, D-SPIDER-SFO achieves the state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of the computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method. Introduction Distributed optimization is a popular technique for solving large scale machine learning problems Li et al. (2014), ranging from visual object recognition Huang et al. (2017); He et al. (2016) to natural language processing V aswani et al. (2017); Devlin et al. (2019). For distributed optimization, a set of workers form a connected computational network, and each worker is assigned a portion of the computing task. The centralized network topology, like parameter server Jianmin et al. (2016); Dean et al. (2012); Li et al. (2014); Zinkevich et al. (2010), consists of a central worker connected with all other workers.


Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

arXiv.org Machine Learning

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, We propose a Policy Optimization method with Model-Based Uncertainty (POMBU)---a novel model-based approach---that can effectively improve the asymptotic performance using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of policy to inaccurate models. Experiments show POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.


Actionable Interpretability through Optimizable Counterfactual Explanations for Tree Ensembles

arXiv.org Artificial Intelligence

Counterfactual explanations help users understand why machine learned models make certain decisions, and more specifically, how these decisions can be changed. In this work, we frame the problem of finding counterfactual explanations -- the minimal perturbation to an input such that the prediction changes -- as an optimization task. Previously, optimization techniques for generating counterfactual examples could only be applied to differentiable models, or alternatively via query access to the model by estimating gradients from randomly sampled perturbations. In order to accommodate non-differentiable models such as tree ensembles, we propose using probabilistic model approximations in the optimization framework. We introduce a novel approximation technique that is effective for finding counterfactual explanations while also closely approximating the original model. Our results show that our method is able to produce counterfactual examples that are closer to the original instance in terms of Euclidean, Cosine, and Manhattan distance compared to other methods specifically designed for tree ensembles.


A Fully Natural Gradient Scheme for Improving Inference of the Heterogeneous Multi-Output Gaussian Process Model

arXiv.org Artificial Intelligence

A recent novel extension of multi-output Gaussian processes handles heterogeneous outputs assuming that each output has its own likelihood function. It uses a vector-valued Gaussian process prior to jointly model all likelihoods' parameters as latent functions drawn from a Gaussian process with a linear model of coregionalisation covariance. By means of an inducing points framework, the model is able to obtain tractable variational bounds amenable to stochastic variational inference. Nonetheless, the strong conditioning between the variational parameters and the hyper-parameters burdens the adaptive gradient optimisation methods used in the original approach. To overcome this issue we borrow ideas from variational optimisation introducing an exploratory distribution over the hyper-parameters, allowing inference together with the variational parameters through a fully natural gradient optimisation scheme. We show that our optimisation scheme can achieve better local optima solution with higher test performance rates than adaptive gradient methods or an hybrid strategy that partially use natural gradients in cooperation with the Adam method. We compare the performance of the different methods over toy and real databases.


Optimization technology and AI in the real world: Operational

#artificialintelligence

Consider this statistic from Enterprise Strategy Group (ESG): "60 percent of respondents indicate that improving operational efficiency is one of the most important objectives their organization expects to achieve from their AI/ML (machine learning) investments." This statistic was a result of looking at the top line data from ESG research that combines responses of enterprise and midmarket organizations. Currently, many companies still have AI and ML firmly in the realm of IT, but the ESG research reveals that AI and ML are becoming more strategic to businesses. Key performance indicators (KPIs) owned by lines of business are cited as one of the top objectives for measuring the effectiveness of AI and ML. It is in this context that we would like to shine the spotlight on the role of optimization technology in supplementing ML to help businesses drive optimal business outcomes through better decisions.


Study of Distributed Robust Beamforming with Low-Rank and Cross-Correlation Techniques

arXiv.org Machine Learning

In this work, we present a novel robust distributed beamforming (RDB) approach based on low-rank and cross-correlation techniques. The proposed RDB approach mitigates the effects of channel errors in wireless networks equipped with relays based on the exploitation of the cross-correlation between the received data from the relays at the destination and the system output and low-rank techniques. The relay nodes are equipped with an amplify-and-forward (AF) protocol and the channel errors are modeled using an additive matrix perturbation, which results in degradation of the system performance. The proposed method, denoted low-rank and cross-correlation RDB (LRCC-RDB), considers a total relay transmit power constraint in the system and the goal of maximizing the output signal-to-interference-plus-noise ratio (SINR). We carry out a performance analysis of the proposed LRCC-RDB technique along with a computational complexity study. The proposed LRCC-RDB does not require any costly online optimization procedure and simulations show an excellent performance as compared to previously reported algorithms.


An Alternative Cross Entropy Loss for Learning-to-Rank

arXiv.org Machine Learning

Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set---as a surrogate to a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically-successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound and is a convex bound on NDCG, a popular ranking metric. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness.


OASIS: ILP-Guided Synthesis of Loop Invariants

arXiv.org Machine Learning

Finding appropriate inductive loop invariants for a program is a key challenge in verifying its functional properties. Although the problem is undecidable in general, several heuristics have been proposed to handle practical programs that tend to have simple control-flow structures. However, these heuristics only work well when the space of invariants is small. On the other hand, machine-learned techniques that use continuous optimization have a high sample complexity, i.e., the number of invariant guesses and the associated counterexamples, since the invariant is required to exactly satisfy a specification. We propose a novel technique that is able to solve complex verification problems involving programs with larger number of variables and non-linear specifications. We formulate an invariant as a piecewise low-degree polynomial, and reduce the problem of synthesizing it to a set of integer linear programming (ILP) problems. This enables the use of state-of-the-art ILP techniques that combine enumerative search with continuous optimization; thus ensuring fast convergence for a large class of verification tasks while still ensuring low sample complexity. We instantiate our technique as the open-source oasis tool using an off-the-shelf ILP solver, and evaluate it on more than 300 benchmark tasks collected from the annual SyGuS competition and recent prior work. Our experiments show that oasis outperforms the state-of-the-art tools, including the winner of last year's SyGuS competition, and is able to solve 9 challenging tasks that existing tools fail on.


Prediction of Horizontal Data Partitioning Through Query Execution Cost Estimation

arXiv.org Machine Learning

The excessively increased volume of data in modern data management systems demands an improved system performance, frequently provided by data distribution, system scalability and performance optimization techniques. Optimized horizontal data partitioning has a significant influence of distributed data management systems. An optimally partitioned schema found in the early phase of logical database design without loading of real data in the system and its adaptation to changes of business environment are very important for a successful implementation, system scalability and performance improvement. In this paper we present a novel approach for finding an optimal horizontally partitioned schema that manifests a minimal total execution cost of a given database workload. Our approach is based on a formal model that enables abstraction of the predicates in the workload queries, and are subsequently used to define all relational fragments. This approach has predictive features acquired by simulation of horizontal partitioning, without loading any data into the partitions, but instead, altering the statistics in the database catalogs. We define an optimization problem and employ a genetic algorithm (GA) to find an approximately optimal horizontally partitioned schema. The solutions to the optimization problem are evaluated using PostgreSQL's query optimizer. The initial experimental evaluation of our approach confirms its efficiency and correctness, and the numbers imply that the approach is effective in reducing the workload execution cost.


Learning to Optimize Variational Quantum Circuits to Solve Combinatorial Problems

arXiv.org Machine Learning

Quantum computing is a computational paradigm with the potential to outperform classical methods for a variety of problems. Proposed recently, the Quantum Approximate Optimization Algorithm (QAOA) is considered as one of the leading candidates for demonstrating quantum advantage in the near term. QAOA is a variational hybrid quantum-classical algorithm for approximately solving combinatorial optimization problems. The quality of the solution obtained by QAOA for a given problem instance depends on the performance of the classical optimizer used to optimize the variational parameters. In this paper, we formulate the problem of finding optimal QAOA parameters as a learning task in which the knowledge gained from solving training instances can be leveraged to find high-quality solutions for unseen test instances. To this end, we develop two machine-learning-based approaches. Our first approach adopts a reinforcement learning (RL) framework to learn a policy network to optimize QAOA circuits. Our second approach adopts a kernel density estimation (KDE) technique to learn a generative model of optimal QAOA parameters. In both approaches, the training procedure is performed on small-sized problem instances that can be simulated on a classical computer; yet the learned RL policy and the generative model can be used to efficiently solve larger problems. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our proposed RL- and KDE-based approaches reduce the optimality gap by factors up to 30.15 when compared with other commonly used off-the-shelf optimizers.