Optimization
Optimization for machine learning and monster trucks
Optimization for machine learning is essential to ensure that data mining models can learn from training data in order to generalize to future test data. Data mining models can have millions of parameters that depend on the training data and, in general, have no analytic definition. In such cases, effective models with good generalization capabilities can only be found by using optimization strategies. Optimization algorithms come in all shapes and sizes, just like anything in life. Attempting to create a single optimization algorithm for all problems would be as foolhardy as seeking to create a single motor vehicle for all drivers: there is a reason we have semi-trucks, automobiles, motorcycles, etc.
Probabilistic Forecasting and Simulation of Electricity Markets via Online Dictionary Learning
Deng, Weisi, Ji, Yuting, Tong, Lang
The problem of probabilistic forecasting and online simulation of a real-time electricity market with stochastic generation and demand is considered. By exploiting the parametric structure of the direct current optimal power flow, a new technique based on online dictionary learning (ODL) is proposed. The ODL approach incorporates real-time measurements and historical traces to produce forecasts of joint and marginal probability distributions of future locational marginal prices, power flows, and dispatch levels, conditional on the system state at the time of forecasting. Compared with standard Monte Carlo simulation techniques, the ODL approach offers several orders of magnitude improvement in computation time, making it feasible for online forecasting of market operations. Numerical simulations on large and moderate size power systems illustrate its performance and complexity features and its potential as a tool for system operators.
Non-convex regularization in remote sensing
Tuia, Devis, Flamary, Remi, Barlaud, Michel
In this paper, we study the effect of different regularizers and their implications in high dimensional image classification and sparse linear unmixing. Although kernelization or sparse methods are globally accepted solutions for processing data in high dimensions, we present here a study on the impact of the form of regularization used and its parametrization. We consider regularization via traditional squared (ℓ2) and sparsity-promoting (ℓ1) norms, as well as more unconventional nonconvex regularizers (ℓp and Log Sum Penalty). We compare their properties and advantages on several classification and linear unmixing tasks and provide advice on the choice of the best regularizer for the problem at hand. Finally, we also provide a fully functional toolbox for the community.
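As a rough illustration of the regularizers named in this abstract, the sketch below evaluates each penalty on two weight vectors with the same ℓ1 norm, one dense and one sparse. It is not taken from the paper's toolbox: the function names, the choice p = 0.5, the smoothing constant eps, and the particular form log(1 + |w|/eps) used for the Log Sum Penalty are all assumptions made for this example.

```python
import math

def l2_squared(w):
    """Squared l2 penalty: sum of squared entries."""
    return sum(x * x for x in w)

def l1(w):
    """l1 penalty: sum of absolute values (convex, sparsity-promoting)."""
    return sum(abs(x) for x in w)

def lp(w, p=0.5):
    """lp pseudo-norm penalty sum(|x|^p); nonconvex when 0 < p < 1 (p is an assumed value)."""
    return sum(abs(x) ** p for x in w)

def log_sum_penalty(w, eps=0.1):
    """One common form of the Log Sum Penalty (assumed here): sum(log(1 + |x| / eps))."""
    return sum(math.log(1.0 + abs(x) / eps) for x in w)

# Two hypothetical weight vectors with identical l1 norm.
dense = [0.5, 0.5, 0.5, 0.5]
sparse = [2.0, 0.0, 0.0, 0.0]

for name, penalty in [("l2^2", l2_squared), ("l1", l1),
                      ("lp", lp), ("LSP", log_sum_penalty)]:
    print(f"{name:5s} dense={penalty(dense):.3f} sparse={penalty(sparse):.3f}")
```

With these inputs the ℓ1 penalty cannot tell the two vectors apart, the squared ℓ2 penalty favors the dense one, and the two nonconvex penalties favor the sparse one, which is the qualitative behavior that motivates them.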
Towards stationary time-vertex signal processing
Perraudin, Nathanael, Loukas, Andreas, Grassi, Francesco, Vandergheynst, Pierre
Graph-based methods for signal processing have shown promise for the analysis of data exhibiting irregular structure, such as those found in social, transportation, and sensor networks. Yet, though these systems are often dynamic, state-of-the-art methods for signal processing on graphs ignore the dimension of time, treating successive graph signals independently or taking a global average. To address this shortcoming, this paper considers the statistical analysis of time-varying graph signals. We introduce a novel definition of joint (time-vertex) stationarity, which generalizes the classical definition of time stationarity and the more recent definition appropriate for graphs. Joint stationarity gives rise to a scalable Wiener optimization framework for joint denoising, semi-supervised learning, or more generally inverting a linear operator, that is provably optimal. Experimental results on real weather data demonstrate that taking into account graph and time dimensions jointly can yield significant accuracy improvements in the reconstruction effort.
Machine Learning As Prescriptive Analytics (IT Best Kept Secret Is Optimization)
I said, and I wrote, that machine learning and predictive analytics were almost the same. Of course, I also put optimization as the queen of all analytics technologies as it yields the best business value. What else would you expect from someone who spent nearly 3 decades working in optimization? No wonder this view became popular in the optimization community... First, let me reassure readers about my mental health: I still think that optimization is best for computing optimal decisions. I started thinking there was an issue when I met customers willing to use machine learning to solve all the business problems they have.
Approachability in unknown games: Online learning meets multi-objective optimization
Mannor, Shie, Perchet, Vianney, Stoltz, Gilles
In the standard setting of approachability there are two players and a target set. The players repeatedly play a known vector-valued game where the first player wants the average vector-valued payoff to converge to the target set, while the other player tries to keep it away from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the game structure: she receives an arbitrary vector-valued reward vector at every round. She wishes to approach the smallest ("best") possible set given the observed average payoffs in hindsight. This extension of the standard setting has implications even when the original target set is not approachable and when it is not obvious which expansion of it should be approached instead. We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals. We further propose a concrete strategy to approach these goals. Our method does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes. Applications to global cost minimization and to approachability under sample path constraints are considered.
Pruning Random Forests for Prediction on a Budget
Nan, Feng, Wang, Joseph, Saligrama, Venkatesh
We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost & accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets. In contrast to our bottom-up approach, which benefits from good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.
Collaborative Multi-sensor Classification via Sparsity-based Representation
Dao, Minh, Nguyen, Nam H., Nasrabadi, Nasser M., Tran, Trac D.
In this paper, we propose a general collaborative sparse representation framework for multi-sensor classification, which takes into account the correlations as well as complementary information between heterogeneous sensors simultaneously while considering joint sparsity within each sensor's observations. We also robustify our models to deal with the presence of sparse noise and low-rank interference signals. Specifically, we demonstrate that incorporating the noise or interference signal as a low-rank component in our models is essential in a multi-sensor classification problem when multiple co-located sources/sensors simultaneously record the same physical event. We further extend our frameworks to kernelized models which rely on sparsely representing a test sample in terms of all the training samples in a feature space induced by a kernel function. A fast and efficient algorithm based on the alternating direction method is proposed, whose convergence to an optimal solution is guaranteed. Extensive experiments are conducted on several real multi-sensor data sets and results are compared with the conventional classifiers to verify the effectiveness of the proposed methods.
Global Continuous Optimization with Error Bound and Fast Convergence
Kawaguchi, Kenji, Maruyama, Yu, Zheng, Xiaoyu
This paper considers global optimization with a black-box unknown objective function that can be non-convex and non-differentiable. Such a difficult optimization problem arises in many real-world applications, such as parameter tuning in machine learning, engineering design problems, and planning with a complex physics simulator. This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), to aim for both fast convergence in practice and a finite-time error bound in theory. The advantage and usage of the new algorithm are illustrated via theoretical analysis and an experiment conducted with 11 benchmark test functions. Further, we modify the LOGO algorithm to specifically solve a planning problem via policy search with continuous state/action space and long time horizon while maintaining its finite-time error bound. We apply the proposed planning method to accident management of a nuclear power plant. The result of the application study demonstrates the practical utility of our method.