Optimization
Compressive-Sensing Data Reconstruction for Structural Health Monitoring: A Machine-Learning Approach
Bao, Yuequan, Tang, Zhiyi, Li, Hui
Compressive sensing (CS) has been studied and applied in structural health monitoring for wireless data acquisition and transmission, structural modal identification, and spare damage identification. The key issue in CS is finding the optimal solution for sparse optimization. In the past years, many algorithms have been proposed in the field of applied mathematics. In this paper, we propose a machine-learning-based approach to solve the CS data-reconstruction problem. By treating a computation process as a data flow, the process of CS-based data reconstruction is formalized into a standard supervised-learning task. The prior knowledge, i.e., the basis matrix and the CS-sampled signals, are used as the input and the target of the network; the basis coefficient matrix is embedded as the parameters of a certain layer; the objective function of conventional compressive sensing is set as the loss function of the network. Regularized by l1-norm, these basis coefficients are optimized to reduce the error between the original CS-sampled signals and the masked reconstructed signals with a common optimization algorithm. Also, the proposed network can handle complex bases, such as a Fourier basis. Benefiting from the nature of a multi-neuron layer, multiple signal channels can be reconstructed simultaneously. Meanwhile, the disassembled use of a large-scale basis makes the method memory-efficient. A numerical example of multiple sinusoidal waves and an example of field-test wireless data from a suspension bridge are carried out to illustrate the data-reconstruction ability of the proposed approach. The results show that high reconstruction accuracy can be obtained by the machine learning-based approach. Also, the parameters of the network have clear meanings; the inference of the mapping between input and output is fully transparent, making the CS data reconstruction neural network interpretable.
Solving large-scale L1-regularized SVMs and cousins: the surprising effectiveness of column and constraint generation
Dedieu, Antoine, Mazumder, Rahul
The linear Support Vector Machine (SVM) is one of the most popular binary classification techniques in machine learning. Motivated by applications in modern high dimensional statistics, we consider penalized SVM problems involving the minimization of a hinge-loss function with a convex sparsity-inducing regularizer such as: the L1-norm on the coefficients, its grouped generalization and the sorted L1-penalty (aka Slope). Each problem can be expressed as a Linear Program (LP) and is computationally challenging when the number of features and/or samples is large -- the current state of algorithms for these problems is rather nascent when compared to the usual L2-regularized linear SVM. To this end, we propose new computational algorithms for these LPs by bringing together techniques from (a) classical column (and constraint) generation methods and (b) first order methods for non-smooth convex optimization - techniques that are rarely used together for solving large scale LPs. These components have their respective strengths; and while they are found to be useful as separate entities, they have not been used together in the context of solving large scale LPs such as the ones studied herein. Our approach complements the strengths of (a) and (b) --- leading to a scheme that seems to outperform commercial solvers as well as specialized implementations for these problems by orders of magnitude. We present numerical results on a series of real and synthetic datasets demonstrating the surprising effectiveness of classic column/constraint generation methods in the context of challenging LP-based machine learning tasks.
Using Well-Understood Single-Objective Functions in Multiobjective Black-Box Optimization Test Suites
Brockhoff, Dimo, Tusar, Tea, Auger, Anne, Hansen, Nikolaus
Several test function suites are being used for numerical benchmarking of multiobjective optimization algorithms. While they have some desirable properties, like well-understood Pareto sets and Pareto fronts of various shapes, most of the currently used functions possess characteristics that are arguably under-represented in real-world problems. They mainly stem from the easier construction of such functions and result in improbable properties such as separability, optima located exactly at the boundary constraints, and the existence of variables that solely control the distance between a solution and the Pareto front. Here, we propose an alternative way to constructing multiobjective problems-by combining existing single-objective problems from the literature. We describe in particular the bbob-biobj test suite with 55 bi-objective functions in continuous domain, and its extended version with 92 bi-objective functions (bbob-biobj-ext). Both test suites have been implemented in the COCO platform for black-box optimization benchmarking. Finally, we recommend a general procedure for creating test suites for an arbitrary number of objectives. Besides providing the formal function definitions and presenting their (known) properties, this paper also aims at giving the rationale behind our approach in terms of groups of functions with similar properties, objective space normalization, and problem instances. The latter allows us to easily compare the performance of deterministic and stochastic solvers, which is an often overlooked issue in benchmarking.
Algorithms by Jeff Erickson
Extended Dance Remix: These are notes on more advanced material directly related to the textbook. The notes are ordered roughly to match the textbook chapters. Fast Fourier Transforms (17 pages) Fast Exponential Algorithms (14 pages) Dynamic Programming for Formal Languages and Automata (7 pages, unfinished) Advanced Dynamic Programming (18 pages) Balances and Pseudoflows (13 pages) Minimum-Cost Flows (16 pages) Linear Programming (21 pages) Linear Programming Algorithms (18 pages) Approximation Algorithms (25 pages) Director's Cut: These are notes on topics not covered in the textbook. The numbering is completely independent to the textbook; I just started over at 1. We regularly cover some of the randomized algorithms material in CS 473, but I haven't used the amortized analysis or lower bounds notes in many years.
Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm
In this research work, we perform text line segmentation directly in compressed representation of an unconstrained handwritten document image. In this relation, we make use of text line terminal points which is the current state-of-the-art. The terminal points spotted along both margins (left and right) of a document image for every text line are considered as source and target respectively. The tunneling algorithm uses a single agent (or robot) to identify the coordinate positions in the compressed representation to perform text-line segmentation of the document. The agent starts at a source point and progressively tunnels a path routing in between two adjacent text lines and reaches the probable target. The agent's navigation path from source to the target bypassing obstacles, if any, results in segregating the two adjacent text lines. However, the target point would be known only when the agent reaches the destination; this is applicable for all source points and henceforth we could analyze the correspondence between source and target nodes. Artificial Intelligence in Expert systems, dynamic programming and greedy strategies are employed for every search space while tunneling. An exhaustive experimentation is carried out on various benchmark datasets including ICDAR13 and the performances are reported.
Cost-sensitive Selection of Variables by Ensemble of Model Sequences
Yan, Donghui, Qin, Zhiwei, Gu, Songxiang, Xu, Haiping, Shao, Ming
Many applications require the collection of data on different variables or measurements overa number of system performance metrics. For example, some cyber systems rely on scanning various system metrics to detect or to predict potential cyber intrusions or threats. In the maintenance of airplanes or major factorymachinery, measurements of different system components and their usage statistics are collected to determine when a maintenance is required. In medical diagnosis, a patient may be asked to take various medical tests, such 1 as on blood pressure, cholesterol level, heart rates and so on, so that the doctor coulddetermine if the patient has a certain disease. In the development of an e-commerce product that predicts the click or purchase of a product at an e-commerce website, many data related to a user's shopping behavior will be collected, and often extra data relevant to the product or the user's shopping behavior are purchased from a third-party vendor etc. The data collected on various measures need to be combined, and if cost is a concern, a subset of measures need to be selected to satisfy the budget constraint. The problem of combining measures for a target application can be formulated as follows.
Adaptive Quantile Low-Rank Matrix Factorization
Xu, Shuang, Zhang, Chun-Xia, Zhang, Jiangshe
Low-rank matrix factorization (LRMF) has received much popularity owing to its successful applications in both computer vision and data mining. By assuming the noise term to come from a Gaussian, Laplace or a mixture of Gaussian distributions, significant efforts have been made on optimizing the (weighted) $L_1$ or $L_2$-norm loss between an observed matrix and its bilinear factorization. However, the type of noise distribution is generally unknown in real applications and inappropriate assumptions will inevitably deteriorate the behavior of LRMF. On the other hand, real data are often corrupted by skew rather than symmetric noise. To tackle this problem, this paper presents a novel LRMF model called AQ-LRMF by modeling noise with a mixture of asymmetric Laplace distributions. An efficient algorithm based on the expectation-maximization (EM) algorithm is also offered to estimate the parameters involved in AQ-LRMF. The AQ-LRMF model possesses the advantage that it can approximate noise well no matter whether the real noise is symmetric or skew. The core idea of AQ-LRMF lies in solving a weighted $L_1$ problem with weights being learned from data. The experiments conducted with synthetic and real datasets show that AQ-LRMF outperforms several state-of-the-art techniques. Furthermore, AQ-LRMF also has the superiority over the other algorithms that it can capture local structural information contained in real images.
Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere
Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from convolutional measurements $y_i=x_i \circledast f$ ($i=1,2,\dots,N$). We consider the case where the $x_i$'s are sparse, and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse output $y_i\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods.
Differentiable MPC for End-to-end Planning and Control
Amos, Brandon, Jimenez, Ivan, Sacks, Jacob, Boots, Byron, Kolter, J. Zico
We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere
Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from convolutional measurements $y_i=x_i \circledast f$ ($i=1,2,\dots,N$). We consider the case where the $x_i$'s are sparse, and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse output $y_i\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods.