Optimization
ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems
Balaji, Bharathan, Bell-Masterson, Jordan, Bilgin, Enes, Damianou, Andreas, Garcia, Pablo Moreno, Jain, Arpit, Luo, Runfei, Maggiar, Alvaro, Narayanaswamy, Balakrishnan, Ye, Chun
Reinforcement Learning (RL) has achieved state-of-the-art results in domains such as robotics and games. We build on this previous work by applying RL algorithms to a selection of canonical online stochastic optimization problems with a range of practical applications: Bin Packing, Newsvendor, and Vehicle Routing. While there is a nascent literature that applies RL to these problems, there are no commonly accepted benchmarks which can be used to compare proposed approaches rigorously in terms of performance, scale, or generalizability. This paper aims to fill that gap. For each problem we apply both standard approaches as well as newer RL algorithms and analyze results. In each case, the performance of the trained RL policy is competitive with or superior to the corresponding baselines, while not requiring much in the way of domain knowledge. This highlights the potential of RL in real-world dynamic resource allocation problems.
Meta-Learning Millions of Hyper-parameters using the Implicit Function Theorem
Last night on the train I read this nice paper by David Duvenaud and colleagues. So I thought it's time for a David Duvenaud birthday special (don't get too excited David, I won't make it an annual tradition...) I recently covered iMAML: the meta-learning algorithm that makes use of implicit gradients to sidestep backpropagating through the inner loop optimization in meta-learning/hyperparameter tuning. The method presented in (Lorraine et al, 2019) uses the same high-level idea, but introduces a different - on the surface less fiddly - approximation to the crucial inverse Hessian. I won't spend a lot of time introducing the whole meta-learning setup from scratch, you can use the previous post as a starting point. Many - though not all - meta-learning or hyperparameter optimization problems can be stated as nested optimization problems.
Regularized and Smooth Double Core Tensor Factorization for Heterogeneous Data
Tarzanagh, Davoud Ataee, Michailidis, George
We introduce a general tensor model suitable for data analytic tasks for heterogeneous data sets, wherein there are joint low-rank structures within groups of observations, but also discriminative structures across different groups. To capture such complex structures, a double core tensor (DCOT) factorization model is introduced together with a family of smoothing loss functions. By leveraging the proposed smoothing function, the model accurately estimates the model factors, even in the presence of missing entries. A linearized ADMM method is employed to solve regularized versions of DCOT factorizations, that avoid large tensor operations and large memory storage requirements. Further, we establish theoretically its global convergence, together with consistency of the estimates of the model parameters. The effectiveness of the DCOT model is illustrated on several real-world examples including image completion, recommender systems, subspace clustering and detecting modules in heterogeneous Omics multi-modal data, since it provides more insightful decompositions than conventional tensor methods.
Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance
Jing, Mingxuan, Ma, Xiaojian, Huang, Wenbing, Sun, Fuchun, Yang, Chao, Fang, Bin, Liu, Huaping
In this paper, we study Reinforcement Learning from Demonstrations (RLfD) that improves the exploration efficiency of Reinforcement Learning (RL) by providing expert demonstrations. Most of existing RLfD methods require demonstrations to be perfect and sufficient, which yet is unrealistic to meet in practice. To work on imperfect demonstrations, we first define an imperfect expert setting for RLfD in a formal way, and then point out that previous methods suffer from two issues in terms of optimality and convergence, respectively. Upon the theoretical findings we have derived, we tackle these two issues by regarding the expert guidance as a soft constraint on regulating the policy exploration of the agent, which eventually leads to a constrained optimization problem. We further demonstrate that such problem is able to be addressed efficiently by performing a local linear search on its dual form. Considerable empirical evaluations on a comprehensive collection of benchmarks indicate our method attains consistent improvement over other RLfD counterparts.
On an Optimal Solution to the Film Scheduling and Showtime Staggering Problem
Kohli, Ikjyot Singh, Inglis, Katherine Goff
In an era of data driven digital transformation, a customer driven business strategy is essential for success. In the motion picture industry, movie exhibitors must compete to win share of consumers entertainment time (and wallet) against digital entertainment alternatives offered by mammoth sized, digital focused, competitors like Netflix, Amazon and Disney [1]. Customer loyalty, point-of-sale and digital payment p latforms produce rich insights that can leveraged to inform business operations and automate the decision-making pr ocess, effectively enabling movie exhibitors to compete using analytics and artificial intelligence. This study presen ts a new, customer driven, quantitative approach to movie scheduling that can be utilized by movie exhibitors to increase attendance and market share. The role of the exhibitor is to show films that are produced by movie st udios (see [2] for more details on the roles of the stakeholders in the movie industry). Exhibitors do not have d ecision making authority over the movies that are produced by the studios.
Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution
Gadetsky, Artyom, Struminsky, Kirill, Robinson, Christopher, Quadrianto, Novi, Vetrov, Dmitry
Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. In particular, the control variates allow us to optimize black-box functions over permutations using stochastic gradient descent. To illustrate the approach, we consider a variety of causal structure learning tasks for continuous and discrete data. We show that our method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.
Smart Predict-and-Optimize for Hard Combinatorial Optimization Problems
Mandi, Jaynta, Demiroviฤ, Emir, Stuckey, Peter. J, Guns, Tias
Combinatorial optimization assumes that all parameters of the optimization problem, e.g. the weights in the objective function, are fixed. Often, these weights are mere estimates and increasingly machine learning techniques are used to for their estimation. Recently, Smart Predict and Optimize (SPO) has been proposed for problems with a linear objective function over the predictions, more specifically linear programming problems. It takes the regret of the predictions on the linear problem into account, by repeatedly solving it during learning. We investigate the use of SPO to solve more realistic discrete optimization problems. The main challenge is the repeated solving of the optimization problem. To this end, we investigate ways to relax the problem as well as warm-starting the learning and the solving. Our results show that even for discrete problems it often suffices to train by solving the relaxation in the SPO loss. Furthermore, this approach outperforms the state-of-the-art approach of Wilder, Dilkina, and Tambe. We experiment with weighted knapsack problems as well as complex scheduling problems, and show for the first time that a predict-and-optimize approach can successfully be used on large-scale combinatorial optimization problems. Introduction Combinatorial optimization aims to optimize an objective function over a set of feasible solutions defined on a discrete space. Numerous real-life decision-making problems can be formulated as combinatorial optimization problems (Korte et al. 2012; Trevisan 2011).
Responsible Scoring Mechanisms Through Function Sampling
Asudeh, Abolfazl, Jagadish, H. V.
Human decision-makers often receive assistance from data-driven algorithmic systems that provide a score for evaluating objects, including individuals. The scores are generated by a function (mechanism) that takes a set of features as input and generates a score.The scoring functions are either machine-learned or human-designed and can be used for different decision purposes such as ranking or classification. Given the potential impact of these scoring mechanisms on individuals' lives and on society, it is important to make sure these scores are computed responsibly. Hence we need tools for responsible scoring mechanism design. In this paper, focusing on linear scoring functions, we highlight the importance of unbiased function sampling and perturbation in the function space for devising such tools. We provide unbiased samplers for the entire function space, as well as a $\theta$-vicinity around a given function. We then illustrate the value of these samplers for designing effective algorithms in three diverse problem scenarios in the context of ranking. Finally, as a fundamental method for designing responsible scoring mechanisms, we propose a novel approach for approximating the construction of the arrangement of hyperplanes. Despite the exponential complexity of an arrangement in the number of dimensions, using function sampling, our algorithm is linear in the number of samples and hyperplanes, and independent of the number of dimensions.
Large-scale Multi-view Subspace Clustering in Linear Time
Kang, Zhao, Zhou, Wangtao, Zhao, Zhitong, Shao, Junming, Han, Meng, Xu, Zenglin
A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, typically have a quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to single-view scenario.
Facility Location Problem with Capacity Constraints: Algorithmic and Mechanism Design Perspectives
Aziz, Haris, Chan, Hau, Lee, Barton E., Li, Bo, Walsh, Toby
We consider the facility location problem in the one-dimensional setting where each facility can serve a limited number of agents from the algorithmic and mechanism design perspectives. From the algorithmic perspective, we prove t hat the corresponding optimization problem, where the goal is t o locate facilities to minimize either the total cost to all ag ents or the maximum cost of any agent is NPhard. However, we show that the problem is fixed-parameter tractable, and the optimal solution can be computed in polynomial time whenever the number of facilities is bounded, or when all facilit ies have identical capacities. We then consider the problem fro m a mechanism design perspective where the agents are strategic and need not reveal their true locations. We show that sev - eral natural mechanisms studied in the uncapacitated setti ng either lose strategyproofness or a bound on the solution qua l-ity for the total or maximum cost objective. We then propose new mechanisms that are strategyproof and achieve approximation guarantees that almost match the lower bounds.