Optimization
Exploring High-Dimensional Structure via Axis-Aligned Decomposition of Linear Projections
Thiagarajan, Jayaraman J., Liu, Shusen, Ramamurthy, Karthikeyan Natesan, Bremer, Peer-Timo
Two-dimensional embeddings remain the dominant approach to visualize high-dimensional data. The choice of embeddings ranges from highly non-linear ones, which can capture complex relationships but are difficult to interpret quantitatively, to axis-aligned projections, which are easy to interpret but are limited to bivariate relationships. Linear projections can be considered a compromise between complexity and interpretability, as they allow explicit axis labels, yet provide significantly more degrees of freedom compared to axis-aligned projections. Nevertheless, interpreting the axis directions, which are linear combinations often with many non-trivial components, remains difficult. To address this problem we introduce a structure-aware decomposition of (multiple) linear projections into sparse sets of axis-aligned projections, which jointly capture all information of the original linear ones. In particular, we use tools from Dempster-Shafer theory to formally define how relevant a given axis-aligned projection is to explain the neighborhood relations displayed in some linear projection. Furthermore, we introduce a new approach to discover a diverse set of high-quality linear projections and show that in practice the information of $k$ linear projections is often jointly encoded in $\sim k$ axis-aligned plots. We have integrated these ideas into an interactive visualization system that allows users to jointly browse both linear projections and their axis-aligned representatives. Using a number of case studies we show how the resulting plots lead to more intuitive visualizations and new insight.
Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs
Salim, Adil, Bianchi, Pascal, Hachem, Walid
A regularized optimization problem over a large unstructured graph is studied, where the regularization term is tied to the graph geometry. Typical regularization examples include the total variation and the Laplacian regularizations over the graph. When applying the proximal gradient algorithm to solve this problem, there exist quite affordable methods to implement the proximity operator (backward step) in the special case where the graph is a simple path without loops. In this paper, an algorithm, referred to as "Snake", is proposed to solve such regularized problems over general graphs, by taking advantage of these fast methods. The algorithm consists of properly selecting random simple paths in the graph and performing the proximal gradient algorithm over these simple paths. This algorithm is an instance of a new general stochastic proximal gradient algorithm, whose convergence is proven. Applications to trend filtering and graph inpainting are provided among others. Numerical experiments are conducted over large graphs.
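To illustrate why the backward step is cheap on a simple path, here is a minimal sketch for the Laplacian regularizer only. It is not the Snake algorithm itself. It assumes the regularizer $(\lambda/2)\sum_i (x_i - x_{i+1})^2$ on a path with unit edge weights, so the proximity operator with step $\gamma$ reduces to the tridiagonal system $(I + \gamma\lambda L)x = v$, solved here in linear time by the Thomas algorithm. The function name `prox_laplacian_path` and the combined parameter `step` (standing for $\gamma\lambda$) are hypothetical.

```python
def prox_laplacian_path(v, step):
    """Solve (I + step * L) x = v, where L is the Laplacian of a path graph.

    Assumption: unit edge weights; `step` stands for gamma * lambda.
    Runs in O(n) time using the Thomas algorithm for tridiagonal systems.
    """
    n = len(v)
    if n == 1:
        return [float(v[0])]
    off = -float(step)  # every off-diagonal entry of I + step * L
    # End nodes have degree 1, interior nodes have degree 2.
    diag = [1.0 + step * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]

    # Forward sweep: eliminate the sub-diagonal.
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = v[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (v[i] - off * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Two quick sanity checks follow from the structure of $L$: a constant signal is left unchanged, and the sum of the entries is preserved, because the rows and columns of a graph Laplacian sum to zero.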
Misspecified Nonconvex Statistical Optimization for Phase Retrieval
Yang, Zhuoran, Yang, Lin F., Fang, Ethan X., Zhao, Tuo, Wang, Zhaoran, Neykov, Matey
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models. To address this issue, we take a first step towards taming model misspecification by studying the high-dimensional sparse phase retrieval problem with misspecified link functions. In particular, we propose a simple variant of the thresholded Wirtinger flow algorithm that, given a proper initialization, linearly converges to an estimator with optimal statistical accuracy for a broad family of unknown link functions. We further provide extensive numerical experiments to support our theoretical findings.
Structured Optimal Transport
Alvarez-Melis, David, Jaakkola, Tommi S., Jegelka, Stefanie
Optimal Transport has recently gained interest in machine learning for applications ranging from domain adaptation and sentence similarity to deep learning. Yet, its ability to capture frequently occurring structure beyond the "ground metric" is limited. In this work, we develop a nonlinear generalization of (discrete) optimal transport that is able to reflect much additional structure. We demonstrate how to leverage the geometry of this new model for fast algorithms, and explore connections and properties. Illustrative experiments highlight the benefit of the induced structured couplings for tasks in domain adaptation and natural language processing.
A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms
Kailkhura, Bhavya, Thiagarajan, Jayaraman J., Rastogi, Charvi, Varshney, Pramod K., Bremer, Peer-Timo
This paper proposes a new approach to construct high-quality space-filling sample designs. First, we propose a novel technique to quantify the space-filling property and optimally trade off uniformity and randomness in sample designs in arbitrary dimensions. Second, we connect the proposed metric (defined in the spatial domain) to the objective measure of the design performance (defined in the spectral domain). This connection serves as an analytic framework for evaluating the qualitative properties of space-filling designs in general. Using the theoretical insights provided by this spatial-spectral analysis, we derive the notion of optimal space-filling designs, which we refer to as space-filling spectral designs. Third, we propose an efficient estimator to evaluate the space-filling properties of sample designs in arbitrary dimensions and use it to develop an optimization framework to generate high-quality space-filling designs. Finally, we carry out a detailed performance comparison on two different applications in 2 to 6 dimensions: a) image reconstruction and b) surrogate modeling on several benchmark optimization functions and an inertial confinement fusion (ICF) simulation code. We demonstrate that the proposed spectral designs significantly outperform existing approaches, especially in high dimensions.
Continuous DR-submodular Maximization: Structure and Algorithms
Bian, An, Levy, Kfir Y., Krause, Andreas, Buhmann, Joachim M.
DR-submodular continuous functions are important objectives with wide real-world applications spanning MAP inference in determinantal point processes (DPPs) and mean-field inference for probabilistic submodular models, amongst others. DR-submodularity captures a subclass of non-convex functions that enables both exact minimization and approximate maximization in polynomial time. In this work we study the problem of maximizing non-monotone DR-submodular continuous functions under general down-closed convex constraints. We start by investigating geometric properties that underlie such objectives, e.g., we prove a strong relation between (approximately) stationary points and the global optimum. These properties are then used to devise two optimization algorithms with provable guarantees. Concretely, we first devise a "two-phase" algorithm with a $1/4$ approximation guarantee. This algorithm allows the use of existing methods for finding (approximately) stationary points as a subroutine, thus harnessing recent progress in non-convex optimization. Then we present a non-monotone Frank-Wolfe variant with a $1/e$ approximation guarantee and a sublinear convergence rate. Finally, we extend our approach to a broader class of generalized DR-submodular continuous functions, which captures a wider spectrum of applications. Our theoretical findings are validated on synthetic and real-world problem instances.
Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice
Lin, Hongzhou, Mairal, Julien, Harchaoui, Zaid
We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. In this paper, we give practical guidelines to use Catalyst and present a comprehensive theoretical analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
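The outer/inner structure described above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the authors' implementation: the objective is a strongly convex diagonal quadratic, the inner solver is plain gradient descent run for a fixed number of steps (a stand-in for the stopping criterion the abstract refers to), the warm start is the previous iterate, and the extrapolation weight is held constant at $(1-\sqrt{q})/(1+\sqrt{q})$ with $q = \mu/(\mu+\kappa)$. The function name `catalyst_quadratic` and all parameter names are hypothetical.

```python
import math


def catalyst_quadratic(a, b, kappa, outer_iters=150, inner_steps=20):
    """Minimize f(x) = sum_i (0.5 * a[i] * x[i]**2 - b[i] * x[i]), all a[i] > 0.

    Outer loop: accelerated proximal point with constant extrapolation.
    Inner loop: gradient descent on h(x) = f(x) + (kappa / 2) * ||x - y||^2.
    """
    mu, lip = min(a), max(a)          # strong convexity and smoothness of f
    q = mu / (mu + kappa)
    beta = (1.0 - math.sqrt(q)) / (1.0 + math.sqrt(q))
    n = len(a)
    x_prev = [0.0] * n
    y = [0.0] * n
    step = 1.0 / (lip + kappa)        # h is (lip + kappa)-smooth

    for _ in range(outer_iters):
        x = list(x_prev)              # warm start at the previous iterate
        for _ in range(inner_steps):  # approximately solve the auxiliary problem
            grad = [a[i] * x[i] - b[i] + kappa * (x[i] - y[i]) for i in range(n)]
            x = [x[i] - step * grad[i] for i in range(n)]
        # Extrapolation step in the sense of Nesterov.
        y = [x[i] + beta * (x[i] - x_prev[i]) for i in range(n)]
        x_prev = x
    return x_prev
```

For example, `catalyst_quadratic([1.0, 100.0], [1.0, 100.0], kappa=10.0)` targets an ill-conditioned problem (condition number 100) whose exact minimizer is `[1.0, 1.0]`. The choices of `kappa` and `inner_steps` here are arbitrary and would need tuning for a real problem.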
AI: Helping Simplify Optimal Decisions - DZone AI
In the business world, there are many factors to consider when making the optimal decision. There are so many data points to consider that it becomes a combinatorial problem. For example, consider when and how to raise room rates across a hotel chain based on locations and current events or how best to optimize airline ticket prices given fluctuating fuel costs, factoring in seasonal conditions and local and/or global events. This flows over into our social and personal lives, as we rightly expect to find the nearest coffee shops located to the nearest public libraries or where to buy the cheapest gas closest to the supermarket that stocks the groceries we need. Decision optimization (DO) is the prescriptive element of the data science lifecycle and is key to delivering artificial intelligence, as machine learning (ML) and DO have somewhat of a symbiotic relationship.
Improved Linear Embeddings via Lagrange Duality
Sheth, Kshiteej, Garg, Dinesh, Dasgupta, Anirban
Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a nonconvex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm based on the theory of convex optimization to solve the dual relaxation provably. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion.
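To make the objective concrete, the following sketch evaluates the maximum distortion of a given linear map over a point set. It only measures a candidate embedding; it does not construct one, and it is not the algorithm from the paper. The abstract does not spell out the distortion measure, so this assumes the common choice $\left| \|P(x_i - x_j)\|^2 / \|x_i - x_j\|^2 - 1 \right|$ maximized over all pairs. The function name `max_distortion` is hypothetical.

```python
from itertools import combinations


def max_distortion(points, proj):
    """Worst-case relative change in squared pairwise distance under `proj`.

    points: list of d-dimensional points (lists of floats).
    proj:   list of k rows, each of length d (the k x d projection matrix).
    Assumption: distortion is | ||P u||^2 / ||u||^2 - 1 | for u = x_i - x_j.
    """
    worst = 0.0
    for x, y in combinations(points, 2):
        diff = [xi - yi for xi, yi in zip(x, y)]
        orig = sum(t * t for t in diff)
        if orig == 0.0:
            continue  # identical points carry no distance information
        emb = sum(sum(r * t for r, t in zip(row, diff)) ** 2 for row in proj)
        worst = max(worst, abs(emb / orig - 1.0))
    return worst
```

An identity map yields distortion 0, while a projection that discards the only coordinate in which two points differ yields distortion 1, the worst case for a map with orthonormal rows.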
Online Master of Science in Business Analytics - Business Analytics @ Tepper
The Tepper School of Business developed the curriculum for the online Master of Science in Business Analytics (MSBA) program from the ground up with this question in mind. In consultation with global business leaders, they determined that the greatest need is for professionals who not only have advanced analytical skills, such as machine learning and optimization, but also the appropriate business knowledge and communication skills to solve complex problems and bring value to industry. Our students develop proficiency in the full range of state-of-the-art business analytics techniques; they also learn how to tell stories through and extract insights from data. Given the Tepper School's view of a curriculum as an organic entity, our faculty continually work in concert to ensure that courses harmonize, even as they are individually updated and modified to ensure learning outcomes for students are always in step with an ever-evolving industry. The flexible online format enables students to continue working while earning their degree and apply what they learn in the classroom to their work environment.