What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization
Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar
We study the fundamental question of how informative a dataset is for solving a given decision-making task. In our setting, the dataset provides partial information about unknown parameters that influence task outcomes. Focusing on linear programs, we characterize when a dataset is sufficient to recover an optimal decision, given an uncertainty set on the cost vector. Our main contribution is a sharp geometric characterization that identifies the directions of the cost vector that matter for optimality, relative to the task constraints and uncertainty set. We further develop a practical algorithm that, for a given task, constructs a minimal or least-costly sufficient dataset. Our results reveal that small, well-chosen datasets can often fully determine optimal decisions -- offering a principled foundation for task-aware data selection.
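The sufficiency question is easy to probe numerically for small instances. Below is a minimal sketch (an illustration of the setting only, not the authors' geometric characterization or dataset-construction algorithm; the helper name `same_optimum_for_all` and the toy uncertainty set are invented for this example): given a finite set of cost vectors still consistent with the data, check whether they all select the same optimal vertex of the LP.

```python
# Illustrative sufficiency check, assuming a finite uncertainty set of
# candidate cost vectors for the LP  min c^T x  s.t.  A_ub x <= b_ub, x >= 0.
import numpy as np
from scipy.optimize import linprog

def same_optimum_for_all(costs, A_ub, b_ub, tol=1e-8):
    """Return (True, x*) if every candidate cost vector yields the same
    optimal vertex; otherwise (False, None). Caveat: LPs with ties can
    false-negative, since solvers may return different optimal vertices."""
    x_star = None
    for c in costs:
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        if not res.success:
            return False, None
        if x_star is None:
            x_star = res.x
        elif np.linalg.norm(res.x - x_star) > tol:
            return False, None
    return True, x_star

# Toy example: both remaining candidate costs favor the same vertex, so the
# residual uncertainty is irrelevant for the decision.
A_ub = np.array([[1.0, 1.0]])   # x1 + x2 <= 1
b_ub = np.array([1.0])
costs = [np.array([-1.0, -0.2]), np.array([-1.0, -0.4])]
print(same_optimum_for_all(costs, A_ub, b_ub))   # (True, [1., 0.])
```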
Reviews: Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
The authors propose a linear-oracle-based method to tackle such problems, in the spirit of the Frank-Wolfe algorithm and Matching Pursuit techniques. The main algorithm is Non-Negative Matching Pursuit, and the authors propose several active-set variants. The paper contains convergence analyses for all the algorithms under different scenarios. In a nutshell, the convergence rate is sublinear for general objectives and linear for strongly convex objectives. The linear rates involve a new geometric quantity, the cone width. Finally, the authors illustrate the relevance of their algorithms on several machine learning tasks and different datasets.
Optimization-based Causal Estimation from Heterogeneous Environments
Mingzhang Yin, Yixin Wang, David M. Blei
This paper presents a new optimization approach to causal estimation. Given data that contains covariates and an outcome, which covariates are causes of the outcome, and what is the strength of the causality? In classical machine learning (ML), the goal of optimization is to maximize predictive accuracy. However, some covariates might exhibit a non-causal association to the outcome. Such spurious associations provide predictive power for classical ML, but they prevent us from causally interpreting the result. This paper proposes CoCo, an optimization algorithm that bridges the gap between pure prediction and causal inference. CoCo leverages the recently-proposed idea of environments, datasets of covariates/response where the causal relationships remain invariant but where the distribution of the covariates changes from environment to environment. Given datasets from multiple environments -- and ones that exhibit sufficient heterogeneity -- CoCo maximizes an objective for which the only solution is the causal solution. We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets. Compared to classical ML and existing methods, CoCo provides more accurate estimates of the causal model.
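To make the invariance idea concrete, here is a deliberately crude sketch (a per-environment OLS stability screen, not the CoCo objective; `invariant_screen` and its tolerance are hypothetical choices): a causal coefficient should agree across environments, while a spurious one drifts with the covariate distribution.

```python
import numpy as np

def invariant_screen(envs, rel_tol=0.25):
    """envs: list of (X, y) pairs, one per environment. Returns a boolean
    mask of covariates whose OLS coefficients are stable across environments."""
    betas = np.stack([np.linalg.lstsq(X, y, rcond=None)[0] for X, y in envs])
    spread = betas.max(axis=0) - betas.min(axis=0)
    scale = np.abs(betas).mean(axis=0) + 1e-12
    return spread / scale < rel_tol

# Two environments: x0 causes y with coefficient 1.0 everywhere; x1 is an
# effect of y whose strength flips sign between environments (spurious).
rng = np.random.default_rng(0)
envs = []
for strength in (0.9, -0.9):
    x0 = rng.normal(size=2000)
    y = x0 + 0.1 * rng.normal(size=2000)
    x1 = strength * y + 0.3 * rng.normal(size=2000)
    envs.append((np.column_stack([x0, x1]), y))

print(invariant_screen(envs))   # expected: [ True False ]
```

In this toy model the coefficient on x0 is the same in both environments (about 0.92, slightly shrunk because x1 partially explains the noise in y), while the coefficient on x1 flips sign, so only x0 survives the screen. This is the heterogeneity requirement in miniature: with a single environment the two covariates would be indistinguishable.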
Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general smooth and convex objectives, and linear convergence ($\mathcal{O}(e^{-t})$) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.
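The conic-hull setting can be demonstrated with a bare-bones non-negative matching-pursuit-style loop (a simplified sketch of the setting, not the paper's NNMP variants or their active-set refinements; `nnmp` is a name invented here): atoms are only ever added with non-negative step sizes, so iterates stay in the cone.

```python
import numpy as np

def nnmp(atoms, b, iters=50):
    """Minimize f(x) = 0.5 * ||x - b||^2 over cone(atoms) by greedily adding
    the atom best aligned with the negative gradient. atoms: (n, d) array."""
    x = np.zeros_like(b)
    weights = np.zeros(len(atoms))
    for _ in range(iters):
        grad = x - b                    # gradient of 0.5 * ||x - b||^2
        scores = atoms @ (-grad)        # alignment with the descent direction
        i = int(np.argmax(scores))
        if scores[i] <= 1e-12:          # no atom is a descent direction: stop
            break
        a = atoms[i]
        step = scores[i] / (a @ a)      # exact line search; positive here
        x = x + step * a
        weights[i] += step              # conic weights only ever increase
    return x, weights

# Toy demo: b lies inside the cone of the two atoms, so the loop recovers it.
atoms = np.array([[1.0, 0.0], [0.0, 1.0]])
x, w = nnmp(atoms, np.array([2.0, 3.0]))
print(x, w)   # x -> [2. 3.], w -> [2. 3.]
```

Because plain additions can never decrease a weight, a loop like this cannot remove mass placed on a bad atom early on; that is precisely the failure mode the paper's active-set (away-step-like) variants address.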
On the Global Linear Convergence of Frank-Wolfe Optimization Variants
Simon Lacoste-Julien, Martin Jaggi
The Frank-Wolfe (FW) optimization algorithm has lately regained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies on the boundary. A simple less-known fix is to add the possibility to take 'away steps' during optimization, an operation that importantly does not require a feasibility oracle. In this paper, we highlight and clarify several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW, and Wolfe's minimum norm point algorithm, and prove for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective. The constant in the convergence rate has an elegant interpretation as the product of the (classical) condition number of the function with a novel geometric quantity that plays the role of a 'condition number' of the constraint set. We provide pointers to where these algorithms have made a difference in practice, in particular with the flow polytope, the marginal polytope and the base polytope for submodular optimization. The Frank-Wolfe algorithm [9] (also known as conditional gradient) is one of the earliest existing methods for constrained convex optimization, and has seen an impressive revival recently due to its nice properties compared to projected or proximal gradient methods, in particular for sparse optimization and machine learning applications. On the other hand, the classical projected gradient and proximal methods have been known to exhibit a very nice adaptive acceleration property, namely that the convergence rate becomes linear for strongly convex objectives, i.e. that the optimization error of the same algorithm after $t$ iterations will decrease geometrically as $O((1-\rho)^t)$.
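For concreteness, here is a compact away-steps FW sketch over the convex hull of a finite atom set (a minimal illustration of the variant, not the paper's implementation; the closed-form line search assumes the quadratic objective used in the demo):

```python
import numpy as np

def away_steps_fw(atoms, grad_f, iters=200):
    """Minimize a smooth convex f over conv(atoms); f enters via grad_f.
    The iterate is tracked as convex weights w over the atoms."""
    n, d = atoms.shape
    w = np.zeros(n)
    w[0] = 1.0                                   # start at a vertex
    for _ in range(iters):
        x = w @ atoms
        g = grad_f(x)
        s = int(np.argmin(atoms @ g))            # FW atom: best descent vertex
        active = np.flatnonzero(w > 1e-12)
        v = active[int(np.argmax(atoms[active] @ g))]  # away atom: worst active
        d_fw, d_away = atoms[s] - x, x - atoms[v]
        if g @ d_fw <= g @ d_away:               # FW direction more promising
            direction, gamma_max, away = d_fw, 1.0, False
        else:                                    # away step: shrink w[v]
            direction, gamma_max, away = d_away, w[v] / (1.0 - w[v] + 1e-16), True
        denom = direction @ direction
        # Exact line search for f(x) = 0.5 * ||x - b||^2 (identity Hessian);
        # for a general f, replace this with backtracking.
        gamma = 0.0 if denom == 0.0 else np.clip(-(g @ direction) / denom, 0.0, gamma_max)
        if away:
            w *= (1.0 + gamma)
            w[v] -= gamma                        # may drop v from the active set
        else:
            w *= (1.0 - gamma)
            w[s] += gamma
    return w @ atoms, w

# Demo: Euclidean projection of b onto the probability simplex conv{e1,e2,e3}.
atoms = np.eye(3)
b = np.array([0.6, 0.7, -0.5])
x, w = away_steps_fw(atoms, lambda x: x - b)
print(np.round(x, 3))   # [0.45 0.55 0.  ]
```

The away step rescales the active weights upward while moving mass off the worst active atom; when `gamma` hits `gamma_max` that atom is dropped entirely, which is what lets the iterate escape a bad face of the polytope, the mechanism behind the linear rates discussed above.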