Optimization
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs
A., Prashanth L., Ghavamzadeh, Mohammad
In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms that operate on three timescales - a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale. In the discounted setting, we point out the difficulty in estimating the gradient of the variance of the return and incorporate simultaneous perturbation approaches to alleviate this. The average setting, on the other hand, allows for an actor update using compatible features to estimate the gradient of the variance. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
Statistical Limits of Convex Relaxations
Wang, Zhaoran, Gu, Quanquan, Liu, Han
Many high dimensional sparse learning problems are formulated as nonconvex optimization. A popular approach to solve these nonconvex optimization problems is through convex relaxations such as linear and semidefinite programming. In this paper, we study the statistical limits of convex relaxations. Particularly, we consider two problems: Mean estimation for sparse principal submatrix and edge probability estimation for stochastic block model. We exploit the sum-of-squares relaxation hierarchy to sharply characterize the limits of a broad class of convex relaxations. Our result shows statistical optimality needs to be compromised for achieving computational tractability using convex relaxations. Compared with existing results on computational lower bounds for statistical problems, which consider general polynomial-time algorithms and rely on computational hardness hypotheses on problems like planted clique detection, our theory focuses on a broad class of convex relaxations and does not rely on unproven hypotheses.
Approximating Sparse PCA from Incomplete Data
Kundu, Abhisek, Drineas, Petros, Magdon-Ismail, Malik
We study how well one can recover sparse principal components of a data matrix using a sketch formed from a few of its elements. We show that for a wide class of optimization problems, if the sketch is close (in the spectral norm) to the original data matrix, then one can recover a near optimal solution to the optimization problem by using the sketch. In particular, we use this approach to obtain sparse principal components and show that for \math{m} data points in \math{n} dimensions, \math{O(\epsilon^{-2}\tilde k\max\{m,n\})} elements gives an \math{\epsilon}-additive approximation to the sparse PCA problem (\math{\tilde k} is the stable rank of the data matrix). We demonstrate our algorithms extensively on image, text, biological and financial data. The results show that not only are we able to recover the sparse PCAs from the incomplete data, but by using our sparse sketch, the running time drops by a factor of five or more.
Multi-dimensional sparse structured signal approximation using split Bregman iterations
Isaac, Yoann, Barthรฉlemy, Quentin, Atif, Jamal, Gouy-Pailler, Cรฉdric, Sebag, Michรจle
Sparse dictionary-based representations, where each signal involves few atoms, have been thoroughly investigated for their good properties, as they enable robust transmission (compressed sensing [1]) or image in-painting [2]. The dictionary is either given, based on the domain knowledge, or learned from the signals [3]. The so-called sparse approximation algorithm aims at finding a sparse approximate representation of the considered signals using this dictionary, by minimizing a weighted sum of the approximation loss and the representation sparsity (see [4] for a survey). When available, prior knowledge about the application domain can also be used to guide the search toward "plausible" decompositions. This paper focuses on sparse approximation enforcing a structured decomposition property, defined as follows. Let the signals be structured (e.g.
A Neurodynamical System for finding a Minimal VC Dimension Classifier
Jayadeva, null, Soman, Sumit, Bhaya, Amit
The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors than the number used by SVMs. In this paper, we describe a neural network based on a linear dynamical system, that converges to the MCM solution. The proposed MCM dynamical system is conducive to an analogue circuit implementation on a chip or simulation using Ordinary Differential Equation (ODE) solvers. Numerical experiments on benchmark datasets from the UCI repository show that the proposed approach is scalable and accurate, as we obtain improved accuracies and fewer number of support vectors (upto 74.3% reduction) with the MCM dynamical system.Keywords.
On Machine Learning towards Predictive Sales Pipeline Analytics
Yan, Junchi (East China Normal Univesity) | Zhang, Chao (Shanghai Jiao Tong University) | Zha, Hongyuan (IBM Research - China) | Gong, Min (East China Normal University) | Sun, Changhua (IBM Research - China) | Huang, Jin (IBM Research - China) | Chu, Stephen (IBM Research - China) | Yang, Xiaokang (IBM Research - China)
Sales pipeline win-propensity prediction is fundamental to effective sales management. In contrast to using subjective human rating, we propose a modern machine learning paradigm to estimate the win-propensity of sales leads over time. A profile-specific two-dimensional Hawkes processes model is developed to capture the influence from seller's activities on their leads to the win outcome, coupled with lead's personalized profiles. It is motivated by two observations: i) sellers tend to frequently focus their selling activities and efforts on a few leads during a relatively short time. This is evidenced and reflected by their concentrated interactions with the pipeline, including login, browsing and updating the sales leads which are logged by the system; ii) the pending opportunity is prone to reach its win outcome shortly after such temporally concentrated interactions. Our model is deployed and in continual use to a large, global, B2B multinational technology enterprize (Fortune 500) with a case study. Due to the generality and flexibility of the model, it also enjoys the potential applicability to other real-world problems.
Transaction Costs-Aware Portfolio Optimization via Fast Lowner-John Ellipsoid Approximation
Shen, Weiwei (GE Global Research Center) | Wang, Jun (Alibaba Group)
However, implementing such a strategy requires combining the VFI framework with policy parameterization, rebalancing continually as assets prices fluctuate, the proposed ADP method enjoys complementary advantages and therefore will lead to high or even infinite transaction of low approximation errors from VFI and high computational costs. Since then researchers have tried to address this issue efficiency from policy parameterization. Briefly, by solving Merton's portfolio problem in the presence the components from VFI pave the way for effectively parameterizing of transaction costs. Thereinto, the proportional transaction a complex policy in a high-dimensional space; costs model, as a suitable model for brokerage commissions the components from policy parameterization provide a and bid-ask spread costs, typifies the common situation pathway to efficiently evaluating the strategy and bypassing for normal investors (Brandt 2010; Cvitanic 2001; the issue of error amplification.
Approximate MaxEnt Inverse Optimal Control and Its Application for Mental Simulation of Human Interactions
Huang, De-An (Carnegie Mellon University) | Farahmand, Amir-massoud (Carnegie Mellon University) | Kitani, Kris M. (Carnegie Mellon University) | Bagnell, James Andrew (Carnegie Mellon University)
Maximum entropy inverse optimal control (MaxEnt IOC) is an effective means of discovering the underlying cost function of demonstrated human activity and can be used to predict human behavior over low-dimensional state spaces (i.e., forecasting of 2D trajectories). To enable inference in very large state spaces, we introduce an approximate MaxEnt IOC procedure to address the fundamental computational bottleneck stemming from calculating the partition function via dynamic programming. Approximate MaxEnt IOC is based on two components: approximate dynamic programming and Monte Carlo sampling. We analyze this approximation approach and provide a finite-sample error upper bound on its excess loss. We validate the proposed method in the context of analyzing dual-agent interactions from video, where we use approximate MaxEnt IOC to simulate mental images of a single agents body pose sequence (a high-dimensional image space). We experiment with sequences image data taken from RGB and RGBD data and show that it is possible to learn cost functions that lead to accurate predictions in high-dimensional problems that were previously intractable.
Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation
Pirotta, Matteo (Politecnico di Milano) | Parisi, Simone (Politecnico di Milano) | Restelli, Marcello (Politecnico di Milano)
This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs).We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones.Differently from previous policy-gradient multi-objective algorithms, where n optimization routines are used to have n solutions, our approach performs a single gradient-ascent run that at each step generates an improved continuous approximation of the Pareto frontier.The idea is to exploit a gradient-based approach to optimize the parameters of a function that defines a manifold in the policy parameter space so that the corresponding image in the objective space gets as close as possible to the Pareto frontier.Besides deriving how to compute and estimate such gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers.Finally, the properties of the proposed approach are empirically evaluated on two interesting MOMDPs.
Emerging Architectures for Global System Science
Milano, Michela (Universita') | Hentenryck, Pascal Van (di Bologna)
Our society is organized around a number of (interdependent) global systems. Logistic and supply chains, health services, energy networks, financial markets, computer networks, and cities are just a few examples of such global, complex systems. These global systems are socio-technical and involve interactions between complex infrastructures, man-made processes, natural phenomena, multiple stakeholders, and human behavior. For the first time in the history of manking, we have access to data sets of unprecedented scale and accuracy about these infrastructures, processes, natural phenomena, and human behaviors. In addition, progress in high-performancing computing, data mining, machine learning, and decision support opens the possibility of looking at these problems more holistically, capturing many of these aspects simultaneously. This paper addresses emergent architectures enabling controlling, predicting and reaoning on these systems.