Goto

Collaborating Authors

 Optimization


Searching for Gas Turbine Maintenance Schedules

AI Magazine

Preventive maintenance schedules occurring in industry are often suboptimal with regard to maintenance coal-location, loss-of-production costs and availability. We describe the implementation and deployment of a software decision support tool for the maintenance planning of gas turbines, with the goal of reducing the direct maintenance costs and the often costly production losses during maintenance downtime. The optimization problem is formally defined, and we argue that the feasibility version is NP-complete. We outline a heuristic algorithm that can quickly solve the problem for practical purposes and validate the approach on a real-world scenario based on an oil production facility. We also compare the performance of our algorithm with results from using integer programming, and discuss the deployment of the application. The experimental results indicate that downtime reductions up to 65% can be achieved, compared to traditional preventive maintenance. In addition, the use of our tool is expected to improve availability with up to 1% and reduce the number of planned maintenance days by 12%. Compared to a integer programming approach, our algorithm is not optimal, but is much faster and produces results which are useful in practice. Our test results and SIT AB’s estimates based< on operational use both indicate that significant savings can be achieved by using our software tool, compared to maintenance plans with fixed intervals.


A Unifying View of Multiple Kernel Learning

arXiv.org Machine Learning

Recent research on multiple kernel learning has lead to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically using a Rademacher complexity bound on the generalization error and empirically in a set of experiments.


Joint Structured Models for Extraction from Overlapping Sources

arXiv.org Artificial Intelligence

We consider the problem of jointly training structured models for extraction from sources whose instances enjoy partial overlap. This has important applications like user-driven ad-hoc information extraction on the web. Such applications present new challenges in terms of the number of sources and their arbitrary pattern of overlap not seen by earlier collective training schemes applied on two sources. We present an agreement-based learning framework and alternatives within it to trade-off tractability, robustness to noise, and extent of agreement. We provide a principled scheme to discover low-noise agreement sets in unlabeled data across the sources. Through extensive experiments over 58 real datasets, we establish that our method of additively rewarding agreement over maximal segments of text provides the best trade-offs, and also scores over alternatives such as collective inference, staged training, and multi-view learning.


Prototype Optimization for Temporarily and Spatially Distorted Time Series

AAAI Conferences

An important issue in time series classification problems is to find representative prototypes. Especially for roughly segmented time series with spatial distortions, such as human gestures, it is complicated to find templates, which optimally represent signal classes. In this paper we present an approach to find optimal time series prototypes in subseries of class templates. Our optimization approach is based on separability measures for prototype candidates and utilizes (but is not limited to) DTW in order to tackle the problem of spatial and temporal distortions. The search for prototypes in the target space is performed by means of a brute force search as well as an evolution strategy. In our experiments with an artificial dataset we show that brute force search optimization is able to improve the time series classification result and that the application of an evolution strategy yields comparable target function scores while reducing computing time.


Interactive Cost Configuration Over Decision Diagrams

Journal of Artificial Intelligence Research

In many AI domains such as product configuration, a user should interactively specify a solution that must satisfy a set of constraints. In such scenarios, offline compilation of feasible solutions into a tractable representation is an important approach to delivering efficient backtrack-free user interaction online. In particular,binary decision diagrams (BDDs) have been successfully used as a compilation target for product and service configuration. In this paper we discuss how to extend BDD-based configuration to scenarios involving cost functions which express user preferences. We first show that an efficient, robust and easy to implement extension is possible if the cost function is additive, and feasible solutions are represented using multi-valued decision diagrams (MDDs). We also discuss the effect on MDD size if the cost function is non-additive or if it is encoded explicitly into MDD. We then discuss interactive configuration in the presence of multiple cost functions. We prove that even in its simplest form, multiple-cost configuration is NP-hard in the input MDD. However, for solving two-cost configuration we develop a pseudo-polynomial scheme and a fully polynomial approximation scheme. The applicability of our approach is demonstrated through experiments over real-world configuration models and product-catalogue datasets. Response times are generally within a fraction of a second even for very large instances.


A New Understanding of Prediction Markets Via No-Regret Learning

arXiv.org Artificial Intelligence

We explore the striking mathematical connections that exist between market scoring rules, cost function based prediction markets, and no-regret learning. We show that any cost function based prediction market can be interpreted as an algorithm for the commonly studied problem of learning from expert advice by equating trades made in the market with losses observed by the learning algorithm. If the loss of the market organizer is bounded, this bound can be used to derive an O(sqrt(T)) regret bound for the corresponding learning algorithm. We then show that the class of markets with convex cost functions exactly corresponds to the class of Follow the Regularized Leader learning algorithms, with the choice of a cost function in the market corresponding to the choice of a regularizer in the learning problem. Finally, we show an equivalence between market scoring rules and prediction markets with convex cost functions. This implies that market scoring rules can also be interpreted naturally as Follow the Regularized Leader algorithms, and may be of independent interest. These connections provide new insight into how it is that commonly studied markets, such as the Logarithmic Market Scoring Rule, can aggregate opinions into accurate estimates of the likelihood of future events.


A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

arXiv.org Machine Learning

We extend the well-known BFGS quasi-Newton method and its memory-limited variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that under some technical conditions, the resulting subBFGS algorithm is globally convergent in objective function value. We apply its memory-limited variant (subLBFGS) to L_2-regularized risk minimization with the binary hinge loss. To extend our algorithm to the multiclass and multilabel settings, we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that our line search can also be used to extend a recently developed bundle method to the multiclass and multilabel settings. We also apply the direction-finding component of our algorithm to L_1-regularized risk minimization with logistic loss. In all these contexts our methods perform comparable to or better than specialized state-of-the-art solvers on a number of publicly available datasets. An open source implementation of our algorithms is freely available.


Mechanisms for Multi-Unit Auctions

Journal of Artificial Intelligence Research

We present an incentive-compatible polynomial-time approximation scheme for multi-unit auctions with general k-minded player valuations. The mechanism fully optimizes over an appropriately chosen sub-range of possible allocations and then uses VCG payments over this sub-range. We show that obtaining a fully polynomial-time incentive-compatible approximation scheme, at least using VCG payments, is NP-hard. For the case of valuations given by black boxes, we give a polynomial-time incentive-compatible 2-approximation mechanism and show that no better is possible, at least using VCG payments.


Conditional Neural Fields

Neural Information Processing Systems

Conditional random fields (CRF) are quite successful on sequence labeling tasks such as natural language processing and biological sequence analysis. CRF models use linear potential functions to represent the relationship between input features and outputs. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and outputs is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input features and outputs we propose Conditional Neural Fields (CNF), a new conditional probabilistic graphical model for sequence labeling. Our CNF model extends CRF by adding one (or possibly several) middle layer between input features and outputs. The middle layer consists of a number of hidden parameterized gates, each acting as a local neural network node or feature extractor to capture the nonlinear relationship between input features and outputs. Therefore, conceptually this CNF model is much more expressive than the linear CRF model. To better control the complexity of the CNF model, we also present a hyperparameter optimization procedure within the evidence framework. Experiments on two widely-used benchmarks indicate that this CNF model performs significantly better than a number of popular methods. In particular, our CNF model is the best among about ten machine learning methods for protein secondary tructure prediction and also among a few of the best methods for handwriting recognition.


An LP View of the M-best MAP problem

Neural Information Processing Systems

We consider the problem of finding the M assignments with maximum probability in a probabilistic graphical model. We show how this problem can be formulated as a linear program (LP) on a particular polytope. We prove that, for tree graphs (and junction trees in general), this polytope has a particularly simple form and differs from the marginal polytope in a single inequality constraint. We use this characterization to provide an approximation scheme for non-tree graphs, by using the set of spanning trees over such graphs. The method we present puts the M-best inference problem in the context of LP relaxations, which have recently received considerable attention and have proven useful in solving difficult inference problems.