Optimization
Fast quantum state reconstruction via accelerated non-convex programming
Kim, Junhyung Lyle, Kollias, George, Kalev, Amir, Wei, Ken X., Kyrillidis, Anastasios
We propose a new quantum state reconstruction method that combines ideas from compressed sensing, non-convex optimization, and acceleration methods. The algorithm, called Momentum-Inspired Factored Gradient Descent (\texttt{MiFGD}), extends the applicability of quantum tomography for larger systems. Despite being a non-convex method, \texttt{MiFGD} converges \emph{provably} to the true density matrix at a linear rate, in the absence of experimental and statistical noise, and under common assumptions. With this manuscript, we present the method, prove its convergence property and provide Frobenius norm bound guarantees with respect to the true density matrix. From a practical point of view, we benchmark the algorithm performance with respect to other existing methods, in both synthetic and real experiments performed on an IBM's quantum processing unit. We find that the proposed algorithm performs orders of magnitude faster than state of the art approaches, with the same or better accuracy. In both synthetic and real experiments, we observed accurate and robust reconstruction, despite experimental and statistical noise in the tomographic data. Finally, we provide a ready-to-use code for state tomography of multi-qubit systems.
Domain-specific Genetic Algorithm for Multi-tenant DNNAccelerator Scheduling
Kao, Sheng-Chun, Krishna, Tushar
As Deep Learning continues to drive a variety of applications in datacenters and HPC, there is a growing trend towards building large accelerators with several sub-accelerator cores/chiplets. This work looks at the problem of supporting multi-tenancy on such accelerators. In particular, we focus on the problem of mapping layers from several DNNs simultaneously on an accelerator. Given the extremely large search space, we formulate the search as an optimization problem and develop a specialized genetic algorithm called G# withcustom operators to enable structured sample-efficient exploration. We quantitatively compare G# with several common heuristics, state-of-the-art optimization methods, and reinforcement learning methods across different accelerator set-tings (large/small accelerators) and different sub-accelerator configurations (homogeneous/heterogeneous), and observeG# can consistently find better solutions. Further, to enable real-time scheduling, we also demonstrate a method to generalize the learnt schedules and transfer them to the next batch of jobs, reducing schedule compute time to near zero.
Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety
Houben, Sebastian, Abrecht, Stephanie, Akila, Maram, Bär, Andreas, Brockherde, Felix, Feifel, Patrick, Fingscheidt, Tim, Gannamaneni, Sujan Sai, Ghobadi, Seyed Eghbal, Hammam, Ahmed, Haselhoff, Anselm, Hauser, Felix, Heinzemann, Christian, Hoffmann, Marco, Kapoor, Nikhil, Kappel, Falk, Klingner, Marvin, Kronenberger, Jan, Küppers, Fabian, Löhdefink, Jonas, Mlynarski, Michael, Mock, Michael, Mualla, Firas, Pavlitskaya, Svetlana, Poretschkin, Maximilian, Pohl, Alexander, Ravi-Kumar, Varun, Rosenzweig, Julia, Rottmann, Matthias, Rüping, Stefan, Sämann, Timo, Schneider, Jan David, Schulz, Elena, Schwalbe, Gesina, Sicking, Joachim, Srivastava, Toshika, Varghese, Serin, Weber, Michael, Wirkert, Sebastian, Wirtz, Tim, Woehrle, Matthias
The use of deep neural networks (DNNs) in safety-critical applications like mobile health and autonomous driving is challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability to problems with malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from safety concerns. In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged. This work provides a structured and broad overview of them. We first identify categories of insufficiencies to then describe research activities aiming at their detection, quantification, or mitigation. Our paper addresses both machine learning experts and safety engineers: The former ones might profit from the broad range of machine learning topics covered and discussions on limitations of recent methods. The latter ones might gain insights into the specifics of modern ML methods. We moreover hope that our contribution fuels discussions on desiderata for ML systems and strategies on how to propel existing approaches accordingly.
Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses
Pu, Wenqiang, Ibrahim, Shahana, Fu, Xiao, Hong, Mingyi
This work considers low-rank canonical polyadic decomposition (CPD) under a class of non-Euclidean loss functions that frequently arise in statistical machine learning and signal processing. These loss functions are often used for certain types of tensor data, e.g., count and binary tensors, where the least squares loss is considered unnatural.Compared to the least squares loss, the non-Euclidean losses are generally more challenging to handle. Non-Euclidean CPD has attracted considerable interests and a number of prior works exist. However, pressing computational and theoretical challenges, such as scalability and convergence issues, still remain. This work offers a unified stochastic algorithmic framework for large-scale CPD decomposition under a variety of non-Euclidean loss functions. Our key contribution lies in a tensor fiber sampling strategy-based flexible stochastic mirror descent framework. Leveraging the sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm ensures global convergence to a stationary point under reasonable conditions. Numerical results show that our framework attains promising non-Euclidean CPD performance. The proposed framework also exhibits substantial computational savings compared to state-of-the-art methods.
Uncertainty Principles in Risk-Aware Statistical Estimation
Koumpis, Nikolas P., Kalogerias, Dionysios S.
We present a new uncertainty principle for risk-aware statistical estimation, effectively quantifying the inherent trade-off between mean squared error ($\mse$) and risk, the latter measured by the associated average predictive squared error variance ($\sev$), for every admissible estimator of choice. Our uncertainty principle has a familiar form and resembles fundamental and classical results arising in several other areas, such as the Heisenberg principle in statistical and quantum mechanics, and the Gabor limit (time-scale trade-offs) in harmonic analysis. In particular, we prove that, provided a joint generative model of states and observables, the product between $\mse$ and $\sev$ is bounded from below by a computable model-dependent constant, which is explicitly related to the Pareto frontier of a recently studied $\sev$-constrained minimum $\mse$ (MMSE) estimation problem. Further, we show that the aforementioned constant is inherently connected to an intuitive new and rigorously topologically grounded statistical measure of distribution skewness in multiple dimensions, consistent with Pearson's moment coefficient of skewness for variables on the line. Our results are also illustrated via numerical simulations.
Emergent Unfairness in Algorithmic Fairness-Accuracy Trade-Off Research
Cooper, A. Feder, Abrams, Ellen
Across machine learning (ML) sub-disciplines, researchers make explicit mathematical assumptions in order to facilitate proof-writing. We note that, specifically in the area of fairness-accuracy trade-off optimization scholarship, similar attention is not paid to the normative assumptions that ground this approach. Such assumptions presume that 1) accuracy and fairness are in inherent opposition to one another, 2) strict notions of mathematical equality can adequately model fairness, 3) it is possible to measure the accuracy and fairness of decisions independent from historical context, and 4) collecting more data on marginalized individuals is a reasonable solution to mitigate the effects of the trade-off. We argue that such assumptions, which are often left implicit and unexamined, lead to inconsistent conclusions: While the intended goal of this work may be to improve the fairness of machine learning models, these unexamined, implicit assumptions can in fact result in emergent unfairness. We conclude by suggesting a concrete path forward toward a potential resolution.
Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization
Yu, Yaodong, Lin, Tianyi, Mazumdar, Eric, Jordan, Michael I.
With machine learning systems increasingly being deployed in real-world settings, there is an urgent need for machine learning approaches that can adapt to--or are robust to--changes in the environment. Despite this, the dominant paradigm for supervised learning [Hastie et al., 2009] remains that of empirical risk minimization (ERM) [Vapnik, 2013], wherein a model is trained by minimizing a loss over a fixed set of training data. A key assumption underlying this approach is that the training data is from the same distribution as the test data--i.e., the distribution of the data does not change between training time and deployment. Such assumptions are well known to rarely hold in practice. Indeed, the distribution may change due to sample selection bias, nonstationarity in the environment [Quionero-Candela et al., 2009], or even adversarial perturbations [Szegedy et al., 2013, Madry et al., 2018], leaving machine learning models trained through ERM particularly susceptible to adversarial attacks [Szegedy et al., 2013, Carlini and Wagner, 2017] or to degraded performance from distribution shifts. Distributionally robust supervised learning (DRSL) seeks to address this issue by explicitly optimizing for solutions that are are robust to adversarial distribution shifts.
Dynamic Cat Swarm Optimization Algorithm for Backboard Wiring Problem
Ahmed, Aram, Rashid, Tarik A., Saeed, Soran
This paper presents a powerful swarm intelligence meta-heuristic optimization algorithm called Dynamic Cat Swarm Optimization. The formulation is through modifying the existing Cat Swarm Optimization. The original Cat Swarm Optimization suffers from the shortcoming of "premature convergence", which is the possibility of entrapment in local optima which usually happens due to the off-balance between exploration and exploitation phases. Therefore, the proposed algorithm suggests a new method to provide a proper balance between these phases by modifying the selection scheme and the seeking mode of the algorithm. To evaluate the performance of the proposed algorithm, 23 classical test functions, 10 modern test functions (CEC 2019) and a real world scenario are used. In addition, the Dimension-wise diversity metric is used to measure the percentage of the exploration and exploitation phases. The optimization results show the effectiveness of the proposed algorithm, which ranks first compared to several well-known algorithms available in the literature. Furthermore, statistical methods and graphs are also used to further confirm the outperformance of the algorithm. Finally, the conclusion as well as future directions to further improve the algorithm are discussed.
Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence
In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We propose a Riemannian Gauss-Newton (RGN) method with fast implementations for low Tucker rank tensor estimation. Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first quadratic convergence guarantee of RGN for low-rank tensor estimation under some mild conditions. A deterministic estimation error lower bound, which matches the upper bound, is provided that demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide the simulation results to corroborate our theoretical findings.
One-parameter family of acquisition functions for efficient global optimization
In diverse fields of science and engineering, one frequently faces the need to know the optimum of a black-box function that is expensive to evaluate. In materials science, in order to determine an optimal composition of alloys one has to repeat manual experiments that cost time and money. In machine learning model building, one has to tune a number of hyperparameters of a model but testing the performance of a model on big data via cross validation takes hours or even days. Thus, a framework is needed that provides a systematic means to minimize the number of queries needed to reach the optimal solution. Bayesian optimization (BO) [1-3] is a powerful methodology to seek an optimum of a black-box function without knowledge of its analytical properties, such as its gradient.