Optimization
Crowd-aware itinerary recommendation: a game-theoretic approach to optimize social welfare
Liu, Junhua, Guo, Chu, Wood, Kristin L., Lim, Kwan Hui
The demand for Itinerary Planning grows rapidly in recent years as the economy and standard of living are improving globally. Nonetheless, itinerary recommendation remains a complex and difficult task, especially for one that is queuing time- and crowd-aware. This difficulty is due to the large amount of parameters involved, i.e., attraction popularity, queuing time, walking time, operating hours, etc. Many recent or existing works adopt a data-driven approach and propose solutions with single-person perspectives, but do not address real-world problems as a result of natural crowd behavior, such as the Selfish Routing problem, which describes the consequence of ineffective network and sub-optimal social outcome by leaving agents to decide freely. In this work, we propose the Strategic and Crowd-Aware Itinerary Recommendation (SCAIR) algorithm which takes a game-theoretic approach to address the Selfish Routing problem and optimize social welfare in real-world situations. To address the NP-hardness of the social welfare optimization problem, we further propose a Markov Decision Process (MDP) approach which enables our simulations to be carried out in poly-time. We then use real-world data to evaluate the proposed algorithm, with benchmarks of two intuitive strategies commonly adopted in real life, and a recent algorithm published in the literature. Our simulation results highlight the existence of the Selfish Routing problem and show that SCAIR outperforms the benchmarks in handling this issue with real-world data.
Accelerating Column Generation via Flexible Dual Optimal Inequalities with Application to Entity Resolution
Lokhande, Vishnu Suresh, Wang, Shaofei, Singh, Maneesh, Yarkony, Julian
In this paper, we introduce a new optimization approach to Entity Resolution. Traditional approaches tackle entity resolution with hierarchical clustering, which does not benefit from a formal optimization formulation. In contrast, we model entity resolution as correlation-clustering, which we treat as a weighted set-packing problem and write as an integer linear program (ILP). In this case sources in the input data correspond to elements and entities in output data correspond to sets/clusters. We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation. The set of potential sets/clusters can not be explicitly enumerated, thus motivating optimization via column generation. In addition to the novel formulation, we also introduce new dual optimal inequalities (DOI), that we call flexible dual optimal inequalities, which tightly lower-bound dual variables during optimization and accelerate column generation. We apply our formulation to entity resolution (also called de-duplication of records), and achieve state-of-the-art accuracy on two popular benchmark datasets.
Deep Declarative Networks: A New Hope
Gould, Stephen, Hartley, Richard, Campbell, Dylan
We introduce a new class of end-to-end learnable models wherein data processing nodes (or network layers) are defined in terms of desired behavior rather than an explicit forward function. Specifically, the forward function is implicitly defined as the solution to a mathematical optimization problem. Consistent with nomenclature in the programming languages community, we name our models deep declarative networks. Importantly, we show that the class of deep declarative networks subsumes current deep learning models. Moreover, invoking the implicit function theorem, we show how gradients can be back-propagated through declaratively defined data processing nodes thereby enabling end-to-end learning. We show how these declarative processing nodes can be implemented in the popular PyTorch deep learning software library allowing declarative and imperative nodes to co-exist within the same network. We provide numerous insights and illustrative examples of declarative nodes and demonstrate their application for image and point cloud classification tasks.
Understanding Deep Learning through Energy Landscapes
A convex optimization problem is a problem where all of the constraints are convex functions, and the objective is a convex function if minimizing, or a concave function if maximizing. A non-convex function "curves up and down" -- it is neither convex nor concave. However note that this function is convex from -pi to 0, and concave from 0 to pi. If the bounds on the variables restrict the domain of the objective and constraints to a region where the functions are convex, then the overall problem is convex. Linear functions are convex, so linear programming problems are convex problems.
Efficient nonmyopic Bayesian optimization and quadrature
Jiang, Shali, Chai, Henry, Gonzalez, Javier, Garnett, Roman
Finite-horizon sequential decision problems arise naturally in many machine learning contexts; examples include Bayesian optimization and Bayesian quadrature. Computing the optimal policy for such problems requires solving Bellman equations, which are generally intractable. Most existing work resorts to myopic approximations by limiting the horizon to only a single time-step, which can perform poorly in balancing exploration and exploitation. We propose a general framework for efficient, nonmyopic approximation of the optimal policy by drawing a connection between the optimal adaptive policy and its non-adaptive counterpart. Our proposal is to compute an optimal batch of points, then select a single point from within this batch to evaluate. We realize this idea for both Bayesian optimization and Bayesian quadrature and demonstrate that our proposed method significantly outperforms common myopic alternatives on a variety of tasks.
Boosting Classifiers with Noisy Inference
Kim, Yongjune, Cassuto, Yuval, Varshney, Lav R.
We present a principled framework to address resource allocation for realizing boosting algorithms on substrates with communication or computation noise. Boosting classifiers (e.g., AdaBoost) make a final decision via a weighted vote from the outputs of many base classifiers (weak classifiers). Suppose that the base classifiers' outputs are noisy or communicated over noisy channels; these noisy outputs will degrade the final classification accuracy. We show that this degradation can be effectively reduced by allocating more system resources for more important base classifiers. We formulate resource optimization problems in terms of importance metrics for boosting. Moreover, we show that the optimized noisy boosting classifiers can be more robust than bagging for the noise during inference (test stage). We provide numerical evidence to demonstrate the benefits of our approach.
Meta-Learning with Implicit Gradients
Rajeswaran, Aravind, Finn, Chelsea, Kakade, Sham, Levine, Sergey
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
Algorithms for Optimal Diverse Matching
Ahmadi, Saba, Ahmed, Faez, Dickerson, John P., Fuge, Mark, Khuller, Samir
Bipartite b -matching, where agents on one side of a market are matched to one or more agents or items on the other, is a classical model that is used in myriad application areas such as healthcare, advertising, education, and genera l resource allocation. Traditionally, the primary goal of su ch models is to maximize a linear function of the constituent matches (e.g., linear social welfare maximization) subjec t to some constraints. Recent work has studied a new goal of balancing whole-match diversity and economic efficiency, where the objective is instead a monotone submodular function ove r the matching. These more general models are largely NPhard. In this work, we develop a combinatorial algorithm tha t constructs provably-optimal diverse b -matchings in pseudo-polynomial time. Then, we show how to extend our algorithm to solve new variations of the diverse b -matching problem. We then compare directly, on real-world datasets, against the state-of-the-art, quadratic-programming-based appr oach to solving diverse b -matching problems and show that our method outperforms it in both speed and (anytime) solution quality.
Cascade Size Distributions and Why They Matter
Burkholz, Rebekka, Quackenbush, John
How likely is it that a few initial node activations are amplified to produce large response cascades that span a considerable part of an entire network? Our answer to this question relies on the Independent Cascade Model for weighted directed networks. In using this model, most of our insights have been derived from the study of average effects. Here, we shift the focus on the full probability distribution of the final cascade size. This shift allows us to explore both typical cascade outcomes and improbable but relevant extreme events. We present an efficient message passing algorithm to compute the final cascade size distribution and activation probabilities of nodes conditional on the final cascade size. Our approach is exact on trees but can be applied to any network topology. It approximates locally treelike networks well and can lead to surprisingly good performance on more dense networks, as we show using real world data, including a miRNAmiRNA probabilistic interaction network for gastrointestinal cancer. We demonstrate the utility of our algorithms for clustering of nodes according to their functionality and influence maximization. Introduction The Independent Cascade Model (ICM) is a cornerstone in the study of spreading processes on networks. Many related optimization algorithms require sampling from the model.
Automatic Differentiation for Complex Valued SVD
Wan, Zhou-Quan, Zhang, Shi-Xin
Automatic differentiation(AD) evaluates derivatives or gradients of any functions specified by computer programs[ 1 ]. It is implemented by propagating derivatives of primitive operations v ia chain rules. Such approach is different from classical symbolic or numerical differentiations. Symbolic differentiat ion faces the challenge of converting a complicated computer program into expressions, while numerical differentiation faces the difficulty of numerical errors in the discretization. Besides, both symbolic and numerical methods have problems in calculating higher order derivatives and are also slow at computing gradients with respect to lots of input s variables, e.g. in the case for gradient-based optimization algorithms.