 Optimization


Sync-Rank: Robust Ranking, Constrained Ranking and Rank Aggregation via Eigenvector and Semidefinite Programming Synchronization

arXiv.org Machine Learning

We consider the classic problem of establishing a statistical ranking of a set of n items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years. Its least squares solution can be approximated by either a spectral or a semidefinite programming (SDP) relaxation, followed by a rounding procedure. We perform extensive numerical simulations on both synthetic and real-world data sets, showing that our proposed method compares favorably to other algorithms from the recent literature. Existing theoretical guarantees on the group synchronization problem imply lower bounds on the largest amount of noise permissible in the ranking data while still achieving exact recovery. We propose a similar synchronization-based algorithm for the rank-aggregation problem, which integrates in a globally consistent ranking pairwise comparisons given by different rating systems on the same set of items. We also discuss the problem of semi-supervised ranking when there is available information on the ground truth rank of a subset of players, and propose an algorithm based on SDP which recovers the ranks of the remaining players. Finally, synchronization-based ranking, combined with a spectral technique for the densest subgraph problem, allows one to extract locally-consistent partial rankings, in other words, to identify the rank of a small subset of players whose pairwise comparisons are less noisy than the rest of the data, which other methods are not able to identify.
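The spectral relaxation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sync_rank_spectral`, the convention that `C[i, j]` approximates the rank offset r_i - r_j, and the choice to map offsets onto half of the unit circle are assumptions, and the degree normalization that is typically used for incomplete data is omitted for brevity.

```python
import numpy as np

def sync_rank_spectral(C, mask):
    """Rank n items from pairwise rank offsets (hypothetical helper).

    C[i, j] ~ r_i - r_j is skew-symmetric; mask[i, j] is True where a
    comparison was observed (symmetric, False on the diagonal).
    """
    n = C.shape[0]
    # Map rank offsets to angle offsets in [-pi, pi] (half-circle embedding).
    theta = np.pi * C / (n - 1)
    # Hermitian matrix of observed pairwise rotations.
    H = np.where(mask, np.exp(1j * theta), 0.0)
    # Spectral relaxation: take the eigenvector of the largest eigenvalue.
    _, vecs = np.linalg.eigh(H)
    angles = np.mod(np.angle(vecs[:, -1]), 2 * np.pi)
    # Angles are recovered only up to a global rotation, so sorting them
    # yields the ranking up to a cyclic shift.
    cyclic_order = np.argsort(angles)
    best_rank, best_upsets = None, None
    for shift in range(n):
        order = np.roll(cyclic_order, -shift)
        rank = np.empty(n, dtype=int)
        rank[order] = np.arange(n)
        # Count observed comparisons that the candidate ranking contradicts.
        disagree = np.sign(rank[:, None] - rank[None, :]) != np.sign(C)
        upsets = int(np.sum(disagree & mask))
        if best_upsets is None or upsets < best_upsets:
            best_rank, best_upsets = rank, upsets
    return best_rank
```

The final loop is the rounding step: it picks the cyclic shift of the recovered order that contradicts the fewest observed comparisons.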


Regression with Linear Factored Functions

arXiv.org Machine Learning

Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be calculated. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFFs are very compact, with 4-9 factored basis functions on average.


Decentralized learning for wireless communications and networking

arXiv.org Machine Learning

This chapter deals with decentralized learning algorithms for in-network processing of graph-valued data. A generic learning problem is formulated and recast into a separable form, which is iteratively minimized using the alternating-direction method of multipliers (ADMM) so as to gain the desired degree of parallelization. Without exchanging elements of the distributed training sets, and while keeping inter-node communications at affordable levels, the local (per-node) learners reach consensus on the desired quantity inferred globally, that is, the one that would be obtained if the entire training data set were centrally available. The impact of the decentralized learning framework on contemporary wireless communications and networking tasks is illustrated through case studies including target tracking using wireless sensor networks, unveiling Internet traffic anomalies, power system state estimation, and spectrum cartography for wireless cognitive radio networks.
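As a rough illustration of the consensus idea, the sketch below runs a standard decentralized consensus ADMM iteration on a least-squares problem. It is an assumption-laden example rather than the chapter's algorithm: the function name `decentralized_admm_ls`, the least-squares local costs, the penalty parameter `c`, and the neighbor-list representation of the network are all chosen here for illustration. Each node exchanges only its current estimate with its neighbors, never its local data.

```python
import numpy as np

def decentralized_admm_ls(A_list, b_list, neighbors, c=1.0, iters=2000):
    """Consensus ADMM for min_x sum_i 0.5 * ||A_i x - b_i||^2 (hypothetical helper).

    Node i holds (A_i, b_i) privately; neighbors[i] lists the nodes it can
    talk to. The graph is assumed connected and undirected.
    """
    n = len(A_list)
    p = A_list[0].shape[1]
    x = np.zeros((n, p))        # local estimates, one row per node
    alpha = np.zeros((n, p))    # local dual variables
    deg = np.array([len(nb) for nb in neighbors])
    # Local system matrices and right-hand-side data terms never leave the node.
    lhs = [A.T @ A + 2 * c * d * np.eye(p) for A, d in zip(A_list, deg)]
    Atb = np.array([A.T @ b for A, b in zip(A_list, b_list)])
    for _ in range(iters):
        # Primal step: each node combines its data with neighbor estimates.
        nbr_sum = np.array([x[nb].sum(axis=0) for nb in neighbors])
        rhs = Atb - alpha + c * (deg[:, None] * x + nbr_sum)
        x = np.array([np.linalg.solve(lhs[i], rhs[i]) for i in range(n)])
        # Dual step: penalize remaining disagreement with neighbors.
        nbr_sum = np.array([x[nb].sum(axis=0) for nb in neighbors])
        alpha = alpha + c * (deg[:, None] * x - nbr_sum)
    return x
```

On a connected network every row of the returned array should approach the centralized least-squares solution, which is the sense in which the local learners agree on the globally inferred quantity.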


Robust Bayesian compressive sensing with data loss recovery for structural health monitoring signals

arXiv.org Machine Learning

The application of compressive sensing (CS) to structural health monitoring is an emerging research topic. The basic idea in CS is to use a specially-designed wireless sensor to sample signals that are sparse in some basis (e.g. wavelet basis) directly in a compressed form, and then to reconstruct (decompress) these signals accurately using some inversion algorithm after transmission to a central processing unit. However, most signals in structural health monitoring are only approximately sparse, i.e. only a relatively small number of the signal coefficients in some basis are significant, but the other coefficients are usually not exactly zero. In this case, perfect reconstruction from compressed measurements is not expected. A new Bayesian CS algorithm is proposed in which robust treatment of the uncertain parameters is explored, including integration over the prediction-error precision parameter to remove it as a "nuisance" parameter. The performance of the new CS algorithm is investigated using compressed data from accelerometers installed on a space-frame structure and on a cable-stayed bridge. Compared with other state-of-the-art CS methods including our previously-published Bayesian method which uses MAP (maximum a posteriori) estimation of the prediction-error precision parameter, the new algorithm shows superior performance in reconstruction robustness and posterior uncertainty quantification. Furthermore, our method can be utilized for recovery of lost data during wireless transmission, regardless of the level of sparseness in the signal.


On Gridless Sparse Methods for Line Spectral Estimation From Complete and Incomplete Data

arXiv.org Machine Learning

This paper is concerned with sparse, continuous frequency estimation in line spectral estimation, and focuses on developing gridless sparse methods which overcome grid mismatches and correspond to limiting scenarios of existing grid-based approaches. We generalize AST (atomic-norm soft thresholding) to the case of nonconsecutively sampled data (incomplete data), inspired by recent atomic-norm-based techniques. We present a gridless version of SPICE (gridless SPICE, or GLS), which is applicable to both complete and incomplete data without knowledge of the noise level. We further prove the equivalence between GLS and atomic-norm-based techniques under different assumptions of noise. Moreover, we extend GLS to a systematic framework consisting of model order selection and robust frequency estimation, and present feasible algorithms for AST and GLS. Numerical simulations are provided to validate our theoretical analysis and demonstrate the performance of our methods compared to existing ones.
Spectral analysis of signals [1] is a major problem in statistical signal processing. In this paper we are concerned with the line spectral estimation problem, which has wide applications in communications, radar, sonar, seismology, astronomy and so on. In the signal model (1), the observed samples are a sum of $K$ sinusoids plus complex-valued measurement noise. The number of sinusoids $K$, usually referred to as the model order, is typically unknown in practice. Following [2], the case when the signal is observed on $[M]$ is referred to as the complete data case, while the case when only samples on $\Omega \subset [M]$ are available is called the incomplete data case (or missing data case), in which the samples on the complementary set of $\Omega$, $\overline{\Omega} \triangleq [M] \setminus \Omega$, are called missing data.
Frequency estimation and model order selection are two important topics in line spectral estimation. Once the frequencies are known, the amplitudes can be obtained by a simple least-squares method according to (1). This paper is mainly focused on frequency estimation, but we also incorporate existing model order selection tools in our methods. Many methods have been proposed for frequency estimation. Common classical methods include the periodogram (or beamforming), nonlinear least squares (NLS) and MUSIC, but these often have limitations (see the review in [1]). For example, the periodogram suffers from leakage problems and has difficulty resolving closely separated frequencies [1]. It is worth noting that the recent iterative adaptive approach (IAA) [4], [5] reduces the leakage of the periodogram.
Manuscript November 2013; accepted by IEEE Transactions on Signal Processing March 2015. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore (email: {yangzai, elhxie}@ntu.edu.sg).
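To make the grid issue concrete, here is a small sketch of the classical periodogram evaluated on a fixed frequency grid. It is only an illustration of the grid-based baseline mentioned above, not of the gridless methods; the function name `periodogram_peak`, the zero-padded FFT grid of size `nfft`, and the use of normalized frequencies in [0, 1) are assumptions made here.

```python
import numpy as np

def periodogram_peak(y, nfft=4096):
    """Return the grid frequency with the largest periodogram value (hypothetical helper).

    y is a vector of uniformly sampled complex data; frequencies are
    normalized to [0, 1) and restricted to the grid k / nfft.
    """
    # Zero-padded FFT evaluates the periodogram on nfft equispaced frequencies.
    spectrum = np.abs(np.fft.fft(y, n=nfft)) ** 2 / len(y)
    return np.argmax(spectrum) / nfft, spectrum
```

Because the estimate must lie on the grid, a true frequency that falls between grid points is recovered only to within the grid spacing, which is the grid mismatch that gridless methods are designed to avoid.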


Stable Feature Selection from Brain sMRI

arXiv.org Machine Learning

Neuroimage analysis usually involves learning thousands or even millions of variables using only a limited number of samples. In this regard, sparse models, e.g. the lasso, are applied to select the optimal features and achieve high diagnosis accuracy. The lasso, however, usually results in independent unstable features. Stability, a manifestation of the reproducibility of statistical results subject to reasonable perturbations to data and the model (Yu 2013), is an important focus in statistics, especially in the analysis of high dimensional data. In this paper, we explore a nonnegative generalized fused lasso model for stable feature selection in the diagnosis of Alzheimer's disease. In addition to sparsity, our model incorporates two important pathological priors: the spatial cohesion of lesion voxels and the positive correlation between the features and the disease labels. To optimize the model, we propose an efficient algorithm by proving a novel link between total variation and fast network flow algorithms via conic duality. Experiments show that the proposed nonnegative model performs much better in exploring the intrinsic structure of data via selecting stable features compared with other state-of-the-art methods.
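For orientation, a nonnegative generalized fused lasso problem is commonly written in the generic form below. The notation is an assumption introduced here and may differ from the paper's: $\mathcal{L}$ is a smooth loss on the training data, $E$ is the set of neighboring voxel pairs, $w_{ij} \ge 0$ are edge weights, and $\lambda_1, \lambda_2 \ge 0$ are regularization parameters.

```latex
\min_{x \in \mathbb{R}^p} \;
  \mathcal{L}(x)
  + \lambda_1 \sum_{i=1}^{p} |x_i|
  + \lambda_2 \sum_{(i,j) \in E} w_{ij} \, |x_i - x_j|
\quad \text{subject to} \quad x \ge 0 .
```

The first penalty encourages sparsity, the second (a graph total variation term) encourages spatially cohesive selections, and the constraint encodes the positive correlation prior. Under the constraint, the first penalty reduces to $\lambda_1 \sum_i x_i$.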


A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

arXiv.org Machine Learning

We propose a randomized nonmonotone block proximal gradient (RNBPG) method for minimizing the sum of a smooth (possibly nonconvex) function and a block-separable (possibly nonconvex nonsmooth) function. At each iteration, this method randomly picks a block according to any prescribed probability distribution and solves typically several associated proximal subproblems that usually have a closed-form solution, until a certain amount of progress in the objective value is achieved. In contrast to the usual randomized block coordinate descent method [23,20], our method has a nonmonotone flavor and uses variable stepsizes that can partially utilize the local curvature information of the smooth component of the objective function. We show that any accumulation point of the solution sequence of the method is a stationary point of the problem almost surely and that the method is capable of finding an approximate stationary point with high probability. We also establish a sublinear rate of convergence for the method in terms of the minimal expected squared norm of certain proximal gradients over the iterations. When the problem under consideration is convex, we show that the expected objective values generated by RNBPG converge to the optimal value of the problem. Under some assumptions, we further establish a sublinear and linear rate of convergence on the expected objective values generated by a monotone version of RNBPG. Finally, we conduct some preliminary experiments to test the performance of RNBPG on the $\ell_1$-regularized least-squares problem and a dual SVM problem in machine learning. The computational results demonstrate that our method substantially outperforms the randomized block coordinate descent method with fixed or variable stepsizes.
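The block proximal step at the core of such methods can be sketched on the $\ell_1$-regularized least-squares problem. Note that the sketch below is the plain fixed-stepsize randomized block proximal gradient baseline, not RNBPG itself: it has no nonmonotone acceptance test and no variable stepsizes. The function names, the uniform block sampling, and the stepsize 1/L_i (with L_i the squared spectral norm of the block's columns) are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def random_block_prox_grad(A, b, lam, blocks, iters=5000, seed=0):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 block by block (hypothetical helper).

    blocks is a list of index arrays partitioning the coordinates of x.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    residual = A @ x - b
    # Block Lipschitz constants of the smooth part's partial gradients.
    lipschitz = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    for _ in range(iters):
        i = rng.integers(len(blocks))          # sample a block uniformly
        blk = blocks[i]
        grad = A[:, blk].T @ residual          # partial gradient for the block
        new = soft_threshold(x[blk] - grad / lipschitz[i], lam / lipschitz[i])
        residual += A[:, blk] @ (new - x[blk])  # keep the residual in sync
        x[blk] = new
    return x
```

Each iteration touches only the columns of the sampled block, which is what makes block methods attractive when the full gradient is expensive.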


Differentiating the multipoint Expected Improvement for optimal batch design

arXiv.org Machine Learning

This work deals with parallel optimization of expensive objective functions which are modeled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of $q > 0$ arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear as the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis' formula was recently established. Since the computational burden of this selection rule remains an issue in applications, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in application. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batch-sequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears as a sound option for batch design in distributed Gaussian Process optimization.
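To fix ideas about what the multipoint criterion measures, the sketch below estimates it by Monte Carlo and compares it with the well-known closed form for a single point. This is a swapped-in illustration, not the Tallis-based expansion or the gradient derived in the paper. The function names, the minimization convention (improvement means falling below the current best value `best`), and the sample size are assumptions.

```python
import math
import numpy as np

def ei_closed_form(mean, std, best):
    """Single-point Expected Improvement for minimization (q = 1)."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def multipoint_ei_mc(mean, cov, best, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[max(0, best - min_i Y_i)], Y ~ N(mean, cov).

    mean has length q and cov is q x q: the Gaussian process posterior
    at the q batch points (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    y = rng.multivariate_normal(mean, cov, size=n_samples)
    return float(np.maximum(best - y.min(axis=1), 0.0).mean())
```

For q = 1 the two functions should agree up to sampling error. For larger batches the Monte Carlo estimate becomes noisy and is awkward to differentiate, which is part of the motivation for closed-form expressions and gradients.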


The Knowledge Gradient Policy Using A Sparse Additive Belief Model

arXiv.org Machine Learning

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with group Lasso penalty. This policy is a unique and novel hybrid of Bayesian R&S with frequentist learning. In particular, our method naturally combines B-spline basis expansion and generalizes to the nonparametric additive model (KGSpAM) and functional ANOVA model. Theoretically, we provide the estimation error bounds of the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is embedded with hundreds of dummy parameters. It also outperforms the knowledge gradient for a linear model.


A warped kernel improving robustness in Bayesian optimization via random embeddings

arXiv.org Machine Learning

This work extends the Random Embedding Bayesian Optimization approach by integrating a warping of the high dimensional subspace within the covariance kernel. The proposed warping, which relies on elementary geometric considerations, mitigates the drawbacks of the high extrinsic dimensionality while preventing the algorithm from evaluating points that give redundant information. It also alleviates constraints on bound selection for the embedded domain, thus improving the robustness, as illustrated with a test case with 25 variables and intrinsic dimension 6.
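The basic random embedding that this line of work starts from can be sketched as below. This shows only the unwarped mapping, not the proposed kernel warping. The function name `make_embedding`, the Gaussian random matrix, and the search box [-1, 1]^D are assumptions made for illustration.

```python
import numpy as np

def make_embedding(D, d, seed=0):
    """Build a random linear map from R^d into the box [-1, 1]^D (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((D, d))

    def to_high_dim(y):
        # Embed the low-dimensional point, then project onto the box;
        # Euclidean projection onto a box is coordinate-wise clipping.
        return np.clip(A @ np.asarray(y, dtype=float), -1.0, 1.0)

    return A, to_high_dim
```

The optimizer then searches over the low-dimensional variable `y` and evaluates the expensive function at `to_high_dim(y)`. Because of the clipping, distinct values of `y` can map to the same high-dimensional point, which is one source of the redundant evaluations mentioned above.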