Optimization
Cautious Reinforcement Learning via Distributional Risk in the Dual Domain
Zhang, Junyu, Bedi, Amrit Singh, Wang, Mengdi, Koppel, Alec
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning. The caution measures the distributional risk of a policy, which is a function of the policy's long-term state occupancy distribution. To solve this problem in an online model-free manner, we propose a stochastic variant of primal-dual method that uses Kullback-Lieber (KL) divergence as its proximal term. We establish that the number of iterations/samples required to attain approximately optimal solutions of this scheme matches tight dependencies on the cardinality of the state and action spaces, but differs in its dependence on the infinity norm of the gradient of the risk measure. Experiments demonstrate the merits of this approach for improving the reliability of reward accumulation without additional computational burdens.
Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Calandriello, Daniele, Carratino, Luigi, Lazaric, Alessandro, Valko, Michal, Rosasco, Lorenzo
Gaussian processes (GP) are one of the most successful frameworks to model uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive since algorithms such as GP-BUCB require a time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is obtained with a new guarantee for the tracking of the posterior variances that allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant per-step amortized cost. These findings are then confirmed in several experiments, where BBKB is much faster than state-of-the-art methods.
State-only Imitation with Transition Dynamics Mismatch
Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior, rather than dealing with the hardships of designing a correct reward function. With the environment modeled as a Markov Decision Process (MDP), most of the existing IL algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitator policy is to be learned. This is uncharacteristic of many real-life scenarios where discrepancies between the expert and the imitator MDPs are common, especially in the transition dynamics function. Furthermore, obtaining expert actions may be costly or infeasible, making the recent trend towards state-only IL (where expert demonstrations constitute only states or observations) ever so promising. Building on recent adversarial imitation approaches that are motivated by the idea of divergence minimization, we present a new state-only IL algorithm in this paper. It divides the overall optimization objective into two subproblems by introducing an indirection step and solves the subproblems iteratively. We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs, while the baseline IL methods suffer from performance degradation. To analyze this, we construct several interesting MDPs by modifying the configuration parameters for the MuJoCo locomotion tasks from OpenAI Gym 1 .
Optimality and Stability in Non-Convex-Non-Concave Min-Max Optimization
Zhang, Guojun, Poupart, Pascal, Yu, Yaoliang
The existence of a saddle point in convex-concave games follows from the celebrated minimax theorem (von Neumann, 1928; Sion et al., 1958) and numerical algorithms for finding it has a long history in optimization (Dem'yanov and Malozemov, 1974; Nemirovsky and Yudin, 1983; Zhang et al., 2019; Lin et al., 2020). Recent success in generative adversarial networks (GANs) (Goodfellow et al., 2014; Heusel et al., 2017) and reinforcement learning (Sutton et al., 1998; Du et al., 2017; Dai et al., 2018) has imposed new challenges for non-convex-non-concave (NCNC) settings. An important question is: what is a local optimal point in min-max optimization? Daskalakis and Panageas (2018) used a local version of saddle points to define local optimality and studied the local convergence behavior of gradient descent (GD) (Arrow et al., 1958) and optimistic gradient descent (OGD) (Popov, 1980; Daskalakis et al., 2018). Following this work, Jin et al. (2019) proposed a new definition of local optimality called local min-max points, and studied how GD converges to such points. One of the difficulties in the NCNC settings is the absence of strong duality (e.g. Boyd and Vandenberghe (2004)), which implies an intrinsic order for min-max optimization, known as Stackelberg games (von Stackelberg, 1934) where a leader takes action first and a follower acts upon the leader's strategy. The global solution to Stackelberg games is well-defined, which we call a global min-max point (a.k.a.
Efficient algorithms for multivariate shape-constrained convex regression problems
Lin, Meixia, Sun, Defeng, Toh, Kim-Chuan
Shape-constrained convex regression problem deals with fitting a convex function to the observed data, where additional constraints are imposed, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a comprehensive mechanism for computing the least squares estimator of a multivariate shape-constrained convex regression function in $\mathbb{R}^d$. We prove that the least squares estimator is computable via solving a constrained convex quadratic programming (QP) problem with $(n+1)d$ variables and at least $n(n-1)$ linear inequality constraints, where $n$ is the number of data points. For solving the generally very large-scale convex QP, we design two efficient algorithms, one is the symmetric Gauss-Seidel based alternating direction method of multipliers ({\tt sGS-ADMM}), and the other is the proximal augmented Lagrangian method ({\tt pALM}) with the subproblems solved by the semismooth Newton method ({\tt SSN}). Comprehensive numerical experiments, including those in the pricing of basket options and estimation of production functions in economics, demonstrate that both of our proposed algorithms outperform the state-of-the-art algorithm. The {\tt pALM} is more efficient than the {\tt sGS-ADMM} but the latter has the advantage of being simpler to implement.
Convergence to Second-Order Stationarity for Non-negative Matrix Factorization: Provably and Concurrently
Panageas, Ioannis, Skoulakis, Stratis, Varvitsiotis, Antonios, Wang, Xiao
Non-negative matrix factorization (NMF) is a fundamental non-convex optimization problem with numerous applications in Machine Learning (music analysis, document clustering, speech-source separation etc). Despite having received extensive study, it is poorly understood whether or not there exist natural algorithms that can provably converge to a local minimum. Part of the reason is because the objective is heavily symmetric and its gradient is not Lipschitz. In this paper we define a multiplicative weight update type dynamics (modification of the seminal Lee-Seung algorithm) that runs concurrently and provably avoids saddle points (first order stationary points that are not second order). Our techniques combine tools from dynamical systems such as stability and exploit the geometry of the NMF objective by reducing the standard NMF formulation over the non-negative orthant to a new formulation over (a scaled) simplex. An important advantage of our method is the use of concurrent updates, which permits implementations in parallel computing environments.
Complete Dictionary Learning via $\ell_p$-norm Maximization
Shen, Yifei, Xue, Ye, Zhang, Jun, Letaief, Khaled B., Lau, Vincent
Dictionary learning is a classic representation learning method that has been widely applied in signal processing and data analytics. In this paper, we investigate a family of $\ell_p$-norm ($p>2,p \in \mathbb{N}$) maximization approaches for the complete dictionary learning problem from theoretical and algorithmic aspects. Specifically, we prove that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present. Based on the generalized power method (GPM), an efficient algorithm is then developed for the $\ell_p$-based formulations. We further show the efficacy of the developed algorithm: for the population GPM algorithm over the sphere constraint, it first quickly enters the neighborhood of a global maximizer, and then converges linearly in this region. Extensive experiments will demonstrate that the $\ell_p$-based approaches enjoy a higher computational efficiency and better robustness than conventional approaches and $p=3$ performs the best.
Sharp Asymptotics and Optimal Performance for Inference in Binary Models
Taheri, Hossein, Pedarsani, Ramtin, Thrampoulidis, Christos
We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit), by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.
CAAI -- A Cognitive Architecture to Introduce Artificial Intelligence in Cyber-Physical Production Systems
Fischbach, Andreas, Strohschein, Jan, Bunte, Andreas, Stork, Jörg, Faeskorn-Woyke, Heide, Moriz, Natalia, Bartz-Beielstein, Thomas
This paper introduces CAAI, a novel cognitive architecture for artificial intelligence in cyber-physical production systems. The goal of the architecture is to reduce the implementation effort for the usage of artificial intelligence algorithms. The core of the CAAI is a cognitive module that processes declarative goals of the user, selects suitable models and algorithms, and creates a configuration for the execution of a processing pipeline on a big data platform. Constant observation and evaluation against performance criteria assess the performance of pipelines for many and varying use cases. Based on these evaluations, the pipelines are automatically adapted if necessary. The modular design with well-defined interfaces enables the reusability and extensibility of pipeline components. A big data platform implements this modular design supported by technologies such as Docker, Kubernetes, and Kafka for virtualization and orchestration of the individual components and their communication. The implementation of the architecture is evaluated using a real-world use case.
Efficient Rollout Strategies for Bayesian Optimization
Lee, Eric Hans, Eriksson, David, Cheng, Bolong, McCourt, Michael, Bindel, David
Bayesian optimization (BO) is a class of sample-efficient global optimization methods, where a probabilistic model conditioned on previous observations is used to determine future evaluations via the optimization of an acquisition function. Most acquisition functions are myopic, meaning that they only consider the impact of the next function evaluation. Non-myopic acquisition functions consider the impact of the next $h$ function evaluations and are typically computed through rollout, in which $h$ steps of BO are simulated. These rollout acquisition functions are defined as $h$-dimensional integrals, and are expensive to compute and optimize. We show that a combination of quasi-Monte Carlo, common random numbers, and control variates significantly reduce the computational burden of rollout. We then formulate a policy-search based approach that removes the need to optimize the rollout acquisition function. Finally, we discuss the qualitative behavior of rollout policies in the setting of multi-modal objectives and model error.