Slawski, Martin, Hein, Matthias

Non-negative data are commonly encountered in numerous fields, making non-negative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting - unlike first intuition may suggest - we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming L1-regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice.

Random non-linear Fourier features have recently shown remarkable performance in a wide-range of regression and classification applications. Motivated by this success, this article focuses on a sparse non-linear Fourier feature (NFF) model. We provide a characterization of the sufficient number of data points that guarantee perfect recovery of the unknown parameters with high-probability. In particular, we show how the sufficient number of data points depends on the kernel matrix associated with the probability distribution function of the input data. We compare our results with the recoverability bounds for the bounded orthonormal systems and provide examples that illustrate sparse recovery under the NFF model.

We propose a new algorithm for hyperparameter selection in machine learning algorithms. The algorithm is a novel modification of Harmonica, a spectral hyperparameter selection approach using sparse recovery methods. In particular, we show that a special encoding of hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy) leads to improvement over existing hyperparameter optimization methods such as Successive Halving and Random Search. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach.

Liu, Luoluo, Chin, Sang Peter, Tran, Trac D.

Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs the generalized version of the original Bagging with various bootstrap ratios. The performance limits associated with different choices of bootstrap sampling ratio L/m and number of estimates K is analyzed theoretically. Simulation shows that the proposed method yields state-of-the-art recovery performance, outperforming L1 minimization and Bolasso in the challenging case of low levels of measurements. A lower L/m ratio (60% - 90%) leads to better performance, especially with a small number of measurements. With the reduced sampling rate, SNR improves over the original Bagging by up to 24%. With a properly chosen sampling ratio, a reasonably small number of estimates K = 30 gives satisfying result, even though increasing K is discovered to always improve or at least maintain the performance.

Liu, Luoluo, Chin, Sang Peter, Tran, Trac D.

Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted/missing or they arrive sequentially in streaming fashion. In this paper, we propose a global sparse recovery strategy based on subsets of measurements, named JOBS, in which multiple measurements vectors are generated from the original pool of measurements via bootstrapping, and then a joint-sparse constraint is enforced to ensure support consistency among multiple predictors. The final estimate is obtained by averaging over the $K$ predictors. The performance limits associated with different choices of number of bootstrap samples $L$ and number of estimates $K$ is analyzed theoretically. Simulation results validate some of the theoretical analysis, and show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and a few other existing bootstrap-based techniques in the challenging case of low levels of measurements and is preferable over other bagging-based methods in the streaming setting since it performs better with small $K$ and $L$ for data-sets with large sizes.

Ranjan, Shashank, Vidyasagar, Mathukumalli

In this paper, we study the problem of recovering a group sparse vector from a small number of linear measurements. In the past the common approach has been to use various "group sparsity-inducing" norms such as the Group LASSO norm for this purpose. By using the theory of convex relaxations, we show that it is also possible to use $\ell_1$-norm minimization for group sparse recovery. We introduce a new concept called group robust null space property (GRNSP), and show that, under suitable conditions, a group version of the restricted isometry property (GRIP) implies the GRNSP, and thus leads to group sparse recovery. When all groups are of equal size, our bounds are less conservative than known bounds. Moreover, our results apply even to situations where where the groups have different sizes. When specialized to conventional sparsity, our bounds reduce to one of the well-known "best possible" conditions for sparse recovery. This relationship between GRNSP and GRIP is new even for conventional sparsity, and substantially streamlines the proofs of some known results. Using this relationship, we derive bounds on the $\ell_p$-norm of the residual error vector for all $p \in [1,2]$, and not just when $p = 2$. When the measurement matrix consists of random samples of a sub-Gaussian random variable, we present bounds on the number of measurements, which are less conservative than currently known bounds.

Javaheri, Amirhossein, Zayyani, Hadi, Figueiredo, Mario A. T., Marvasti, Farrokh

This paper investigates the problem of sparse signal recovery in the presence of additive impulsive noise. The heavytailed impulsive noise is well modelled with stable distributions. Since there is no explicit formulation for the probability density function of $S\alpha S$ distribution, alternative approximations like Generalized Gaussian Distribution (GGD) are used which impose $\ell_p$-norm fidelity on the residual error. In this paper, we exploit a Continuous Mixed Norm (CMN) for robust sparse recovery instead of $\ell_p$-norm. We show that in blind conditions, i.e., in case where the parameters of noise distribution are unknown, incorporating CMN can lead to near optimal recovery. We apply Alternating Direction Method of Multipliers (ADMM) for solving the problem induced by utilizing CMN for robust sparse recovery. In this approach, CMN is replaced with a surrogate function and Majorization-Minimization technique is incorporated to solve the problem. Simulation results confirm the efficiency of the proposed method compared to some recent algorithms in the literature for impulsive noise robust sparse recovery.

Javaheri, Amirhossein, Zayyani, Hadi, Marvasti, Farokh

This paper investigates the problem of recovering missing samples using methods based on sparse representation adapted especially for image signals. Instead of $l_2$-norm or Mean Square Error (MSE), a new perceptual quality measure is used as the similarity criterion between the original and the reconstructed images. The proposed criterion called Convex SIMilarity (CSIM) index is a modified version of the Structural SIMilarity (SSIM) index, which despite its predecessor, is convex and uni-modal. We derive mathematical properties for the proposed index and show how to optimally choose the parameters of the proposed criterion, investigating the Restricted Isometry (RIP) and error-sensitivity properties. We also propose an iterative sparse recovery method based on a constrained $l_1$-norm minimization problem, incorporating CSIM as the fidelity criterion. The resulting convex optimization problem is solved via an algorithm based on Alternating Direction Method of Multipliers (ADMM). Taking advantage of the convexity of the CSIM index, we also prove the convergence of the algorithm to the globally optimal solution of the proposed optimization problem, starting from any arbitrary point. Simulation results confirm the performance of the new similarity index as well as the proposed algorithm for missing sample recovery of image patch signals.

Limmer, Steffen, Stanczak, Slawomir

This paper introduces Laplace techniques for designing a neural network, with the goal of estimating simplex-constraint sparse vectors from compressed measurements. To this end, we recast the problem of MMSE estimation (w.r.t. a pre-defined uniform input distribution) as the problem of computing the centroid of some polytope that results from the intersection of the simplex and an affine subspace determined by the measurements. Owing to the specific structure, it is shown that the centroid can be computed analytically by extending a recent result that facilitates the volume computation of polytopes via Laplace transformations. A main insight of this paper is that the desired volume and centroid computations can be performed by a classical deep neural network comprising threshold functions, rectified linear (ReLU) and rectified polynomial (ReP) activation functions. The proposed construction of a deep neural network for sparse recovery is completely analytic so that time-consuming training procedures are not necessary. Furthermore, we show that the number of layers in our construction is equal to the number of measurements which might enable novel low-latency sparse recovery algorithms for a larger class of signals than that assumed in this paper. To assess the applicability of the proposed uniform input distribution, we showcase the recovery performance on samples that are soft-classification vectors generated by two standard datasets. As both volume and centroid computation are known to be computationally hard, the network width grows exponentially in the worst-case. It can be, however, decreased by inducing sparse connectivity in the neural network via a well-suited basis of the affine subspace. Finally, the presented analytical construction may serve as a viable initialization to be further optimized and trained using particular input datasets at hand.

Soltani, Mohammadreza, Hegde, Chinmay

Random sinusoidal features are a popular approach for speeding up kernel-based inference in large datasets. Prior to the inference stage, the approach suggests performing dimensionality reduction by first multiplying each data vector by a random Gaussian matrix, and then computing an element-wise sinusoid. Theoretical analysis shows that collecting a sufficient number of such features can be reliably used for subsequent inference in kernel classification and regression. In this work, we demonstrate that with a mild increase in the dimension of the embedding, it is also possible to reconstruct the data vector from such random sinusoidal features, provided that the underlying data is sparse enough. In particular, we propose a numerically stable algorithm for reconstructing the data vector given the nonlinear features, and analyze its sample complexity. Our algorithm can be extended to other types of structured inverse problems, such as demixing a pair of sparse (but incoherent) vectors. We support the efficacy of our approach via numerical experiments.