Goto

Collaborating Authors

 Statistical Learning


A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

arXiv.org Machine Learning

We propose a randomized nonmonotone block proximal gradient (RNBPG) method for minimizing the sum of a smooth (possibly nonconvex) function and a block-separable (possibly nonconvex nonsmooth) function. At each iteration, this method randomly picks a block according to any prescribed probability distribution and solves typically several associated proximal subproblems that usually have a closed-form solution, until a certain progress on objective value is achieved. In contrast to the usual randomized block coordinate descent method [23,20], our method has a nonmonotone flavor and uses variable stepsizes that can partially utilize the local curvature information of the smooth component of objective function. We show that any accumulation point of the solution sequence of the method is a stationary point of the problem {\it almost surely} and the method is capable of finding an approximate stationary point with high probability. We also establish a sublinear rate of convergence for the method in terms of the minimal expected squared norm of certain proximal gradients over the iterations. When the problem under consideration is convex, we show that the expected objective values generated by RNBPG converge to the optimal value of the problem. Under some assumptions, we further establish a sublinear and linear rate of convergence on the expected objective values generated by a monotone version of RNBPG. Finally, we conduct some preliminary experiments to test the performance of RNBPG on the $\ell_1$-regularized least-squares problem and a dual SVM problem in machine learning. The computational results demonstrate that our method substantially outperforms the randomized block coordinate {\it descent} method with fixed or variable stepsizes.


Non-parametric Bayesian Models of Response Function in Dynamic Image Sequences

arXiv.org Machine Learning

Estimation of response functions is an important task in dynamic medical imaging. This task arises for example in dynamic renal scintigraphy, where impulse response or retention functions are estimated, or in functional magnetic resonance imaging where hemodynamic response functions are required. These functions can not be observed directly and their estimation is complicated because the recorded images are subject to superposition of underlying signals. Therefore, the response functions are estimated via blind source separation and deconvolution. Performance of this algorithm heavily depends on the used models of the response functions. Response functions in real image sequences are rather complicated and finding a suitable parametric form is problematic. In this paper, we study estimation of the response functions using non-parametric Bayesian priors. These priors were designed to favor desirable properties of the functions, such as sparsity or smoothness. These assumptions are used within hierarchical priors of the blind source separation and deconvolution algorithm. Comparison of the resulting algorithms with these priors is performed on synthetic dataset as well as on real datasets from dynamic renal scintigraphy. It is shown that flexible non-parametric priors improve estimation of response functions in both cases. MATLAB implementation of the resulting algorithms is freely available for download.


MIST: L0 Sparse Linear Regression with Momentum

arXiv.org Machine Learning

In the current age of big data acquisition there has been an ever growing interest in sparse representations, which consists of representing, say, a noisy signal as a linear combination of very few components. This implies that the entire information in the signal can be approximately captured by a small number of components, which has huge benefits in analysis, processing and storage of high dimensional signals. As a result, sparse linear regression has been widely studied with many applications in signal and image processing, statistical inference and machine learning. Specific applications include compressed sensing, denoising, inpainting, deblurring, source separation, sparse image reconstruction, and signal classification, etc.


The Knowledge Gradient Policy Using A Sparse Additive Belief Model

arXiv.org Machine Learning

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R\&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with group Lasso penalty. This policy is a unique and novel hybrid of Bayesian R\&S with frequentist learning. Particularly, our method naturally combines B-spline basis expansion and generalizes to the nonparametric additive model (KGSpAM) and functional ANOVA model. Theoretically, we provide the estimation error bounds of the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is imbedded with hundreds of dummy parameters. Also it outperforms the knowledge gradient for a linear model.


Shared latent subspace modelling within Gaussian-Binary Restricted Boltzmann Machines for NIST i-Vector Challenge 2014

arXiv.org Machine Learning

This paper presents a novel approach to speaker subspace modelling based on Gaussian-Binary Restricted Boltzmann Machines (GRBM). The proposed model is based on the idea of shared factors as in the Probabilistic Linear Discriminant Analysis (PLDA). GRBM hidden layer is divided into speaker and channel factors, herein the speaker factor is shared over all vectors of the speaker. Then Maximum Likelihood Parameter Estimation (MLE) for proposed model is introduced. Various new scoring techniques for speaker verification using GRBM are proposed. The results for NIST i-vector Challenge 2014 dataset are presented.


Nonparametric Nearest Neighbor Descent Clustering based on Delaunay Triangulation

arXiv.org Machine Learning

Abstract: In our physically inspired in-tree (IT) based clustering algorithm and the series after it, there is only one free parameter involved in computing the potential value of each point. In this work, based on the Delaunay Triangulation or its dual Voronoi tessellation, we propose a nonparametric process to compute potential values by the local information. This computation, though nonparametric, is relatively very rough, and consequently, many local extreme points will be generated. However, unlike those gradient-based methods, our ITbased methods are generally insensitive to those local extremes. This positively demonstrates the superiority of these parametric (previous) and nonparametric (in this work) ITbased methods. 1 Introduction In (1), we proposed a physically inspired clustering algorithm, in which an in-tree (IT) structure was first constructed. This IT structure organizes the data points into the clusters with several undesired connections (edges) between them requiring to be removed.


IT-map: an Effective Nonlinear Dimensionality Reduction Method for Interactive Clustering

arXiv.org Machine Learning

In our previous works (1, 2), we have shown its potential in cluster analysis. Combinations of the IT structure with the Semi-Supervised learning concept (3), Rodriguez and Laio's "Decision Graph" (4), and Frey and Dueck's "Affinity Propagation" (AP) (5), have resulted in effective cluster analysis methods. For example, based on the IT structure, the application scope of AP was extended from spherical to nonspherical cluster detection (2). In this paper, we will show another potential of the IT structure: nonlinear dimensionality reduction, for which an effective combination is made with the "isometric mapping" (Isomap) proposed by Tenenbaum et al (6). Isomap is a simple and effective dimensionality reduction method which extends the application scope of multidimensional scaling (MDS) from linear to nonlinear structure. It contains three steps: first construct the K-nearest-neighborhood (KNN) graph, then compute the graph distances (the shortest path distances in the neighborhood graph) and lastly compute the low-dimensional embedding by classical MDS. In effect, the constructed KNN graph for data points is unfolded in the low-dimensional Euclidean space, which is effective especially for preserving in the embedding the topology relationship of data points on manifolds. The crux of the success for Isomap is that it takes as the input for classical MDS the graph distances, instead of the straight-line Euclidian ones, for all pairs of data points.


Improved LASSO

arXiv.org Machine Learning

We propose an improved LASSO estimation technique based on Stein-rule. We shrink classical LASSO estimator using preliminary test, shrinkage, and positive-rule shrinkage principle. Simulation results have been carried out for various configurations of correlation coefficients ($r$), size of the parameter vector ($\beta$), error variance ($\sigma^2$) and number of non-zero coefficients ($k$) in the model parameter vector. Several real data examples have been used to demonstrate the practical usefulness of the proposed estimators. Our study shows that the risk ordering given by LSE $>$ LASSO $>$ Stein-type LASSO $>$ Stein-type positive rule LASSO, remains the same uniformly in the divergence parameter $\Delta^2$ as in the traditional case.


A General Framework for Robust Testing and Confidence Regions in High-Dimensional Quantile Regression

arXiv.org Machine Learning

We propose a robust inferential procedure for assessing uncertainties of parameter estimation in high-dimensional linear models, where the dimension $p$ can grow exponentially fast with the sample size $n$. Our method combines the de-biasing technique with the composite quantile function to construct an estimator that is asymptotically normal. Hence it can be used to construct valid confidence intervals and conduct hypothesis tests. Our estimator is robust and does not require the existence of first or second moment of the noise distribution. It also preserves efficiency in the sense that the worst case efficiency loss is less than 30\% compared to the square-loss-based de-biased Lasso estimator. In many cases our estimator is close to or better than the latter, especially when the noise is heavy-tailed. Our de-biasing procedure does not require solving the $L_1$-penalized composite quantile regression. Instead, it allows for any first-stage estimator with desired convergence rate and empirical sparsity. The paper also provides new proof techniques for developing theoretical guarantees of inferential procedures with non-smooth loss functions. To establish the main results, we exploit the local curvature of the conditional expectation of composite quantile loss and apply empirical process theories to control the difference between empirical quantities and their conditional expectations. Our results are established under weaker assumptions compared to existing work on inference for high-dimensional quantile regression. Furthermore, we consider a high-dimensional simultaneous test for the regression parameters by applying the Gaussian approximation and multiplier bootstrap theories. We also study distributed learning and exploit the divide-and-conquer estimator to reduce computation complexity when the sample size is massive. Finally, we provide empirical results to verify the theory.


A Unifying Framework in Vector-valued Reproducing Kernel Hilbert Spaces for Manifold Regularization and Co-Regularized Multi-view Learning

arXiv.org Machine Learning

This paper presents a general vector-valued reproducing kernel Hilbert spaces (RKHS) framework for the problem of learning an unknown functional dependency between a structured input space and a structured output space. Our formulation encompasses both Vector-valued Manifold Regularization and Co-regularized Multi-view Learning, providing in particular a unifying framework linking these two important learning approaches. In the case of the least square loss function, we provide a closed form solution, which is obtained by solving a system of linear equations. In the case of Support Vector Machine (SVM) classification, our formulation generalizes in particular both the binary Laplacian SVM to the multi-class, multi-view settings and the multi-class Simplex Cone SVM to the semi-supervised, multi-view settings. The solution is obtained by solving a single quadratic optimization problem, as in standard SVM, via the Sequential Minimal Optimization (SMO) approach. Empirical results obtained on the task of object recognition, using several challenging datasets, demonstrate the competitiveness of our algorithms compared with other state-of-the-art methods.