Collaborating Authors

Heng Huang


Regularized Modal Regression with Applications in Cognitive Impairment Prediction

Neural Information Processing Systems

Linear regression models have been successfully used for function estimation and model selection in high-dimensional data analysis. However, most existing methods are built on least squares with the mean squared error (MSE) criterion, which is sensitive to outliers and whose performance may degrade under heavy-tailed noise. In this paper, we go beyond this criterion by investigating regularized modal regression from a statistical learning viewpoint. A new regularized modal regression model is proposed for estimation and variable selection that is robust to outliers, heavy-tailed noise, and skewed noise. On the theoretical side, we establish the approximation estimate for learning the conditional mode function, the sparsity analysis for variable selection, and the robustness characterization. On the application side, we apply our model to improve cognitive impairment prediction using the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort data.
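For concreteness, modal regression replaces the average of squared residuals with a kernel-smoothed measure of how strongly residuals concentrate near zero, so the fit tracks the conditional mode rather than the conditional mean. The following minimal numpy sketch rests on assumed details (a linear model, a Gaussian kernel of bandwidth sigma, an L1 penalty, plain subgradient ascent); it illustrates the idea, not the paper's exact algorithm.

```python
import numpy as np

def regularized_modal_regression(X, y, sigma=1.0, lam=0.1, lr=0.01, n_iter=2000):
    """Sketch under assumed formulation: maximize
       (1/n) * sum_i exp(-(y_i - x_i @ w)**2 / (2*sigma**2)) - lam * ||w||_1
    by (sub)gradient ascent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ w                             # residuals
        k = np.exp(-r**2 / (2 * sigma**2))        # Gaussian kernel weights
        grad = (X.T @ (k * r)) / (n * sigma**2)   # ascent direction of modal term
        grad -= lam * np.sign(w)                  # subgradient of the L1 penalty
        w += lr * grad
    return w
```

Because the kernel weights k decay rapidly for large residuals, grossly corrupted samples contribute almost nothing to the update, which is the mechanism behind robustness to outliers and to heavy-tailed or skewed noise.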


Group Sparse Additive Machine

Neural Information Processing Systems

A family of learning algorithms generated from additive models has attracted much attention recently for its flexibility and interpretability in high-dimensional data analysis. Among them, learning models with grouped variables have shown competitive performance for prediction and variable selection. However, previous works mainly focus on the least squares regression problem rather than the classification task. Thus, it is desirable to design a new additive classification model with variable selection capability for the many real-world applications that center on high-dimensional data classification. To address this challenging problem, in this paper we investigate classification with group sparse additive models in reproducing kernel Hilbert spaces. A novel classification method, called the group sparse additive machine (GroupSAM), is proposed to explore and utilize the structure information among the input variables. A generalization error bound is derived by integrating the sample error analysis with empirical covering numbers and the hypothesis error estimate with the stepping-stone technique. Our new bound shows that GroupSAM can achieve a satisfactory learning rate with polynomial decay. Experimental results on synthetic data and seven benchmark datasets consistently show the effectiveness of our new approach.
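For concreteness, one plausible way to write such an estimator (the notation and the hinge-loss choice are assumptions, not quotations from the paper) combines an additive model over variable groups with a group-wise sum of RKHS norms acting as a group-lasso penalty:

```latex
\hat{f} \;=\; \arg\min_{\{f_g \in \mathcal{H}_{K_g}\}}
  \frac{1}{n} \sum_{i=1}^{n}
    \ell\Big( y_i,\; \sum_{g=1}^{G} f_g\big(x_i^{(g)}\big) \Big)
  \;+\; \lambda \sum_{g=1}^{G} \tau_g \,\| f_g \|_{K_g},
\qquad
\ell(y, t) = \max\{0,\; 1 - y\,t\},
```

where x^{(g)} denotes the g-th group of input variables, \mathcal{H}_{K_g} is the RKHS induced by that group's kernel, and \tau_g are group weights. Because the penalty is a sum of norms rather than squared norms, it drives entire components f_g to zero, which is what yields group-level variable selection.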


Training Neural Networks Using Features Replay

Neural Information Processing Systems

Training a neural network with the backpropagation algorithm requires passing error gradients sequentially through the network. This backward locking prevents us from updating network layers in parallel and fully leveraging the available computing resources. Recently, several works have tried to decouple and parallelize the backpropagation algorithm. However, all of them suffer from severe accuracy loss or memory explosion when the neural network is deep. To address these challenging issues, we propose a novel parallel-objective formulation for the objective function of the neural network. Building on it, we introduce the features replay algorithm and prove that it is guaranteed to converge to critical points of the non-convex problem under certain conditions. Finally, we apply our method to training deep convolutional neural networks, and the experimental results show that the proposed method achieves faster convergence, lower memory consumption, and better generalization error than the compared methods.
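The following PyTorch fragment is a minimal sketch of the replay idea for a network split into two modules; the two-module split, the MSE loss, and the synthetic data are illustrative assumptions, not the authors' implementation. At step t, the downstream module trains on features recorded at step t-1, while the upstream module replays its stored input to recompute the forward pass and absorb the delayed gradient arriving from downstream. Since neither update needs a fresh signal from the other module within the same step, the backward locking disappears and the two updates could run in parallel.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m1 = nn.Sequential(nn.Linear(10, 32), nn.ReLU())   # upstream module
m2 = nn.Linear(32, 1)                              # downstream module
opt1 = torch.optim.SGD(m1.parameters(), lr=0.05)
opt2 = torch.optim.SGD(m2.parameters(), lr=0.05)

prev = None  # (input, features, target) stored at the previous step
for step in range(100):
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    with torch.no_grad():
        feats = m1(x)                # forward only: no backward locking here
    if prev is not None:
        old_x, old_feats, old_y = prev
        # downstream: train on the stale features recorded at step t-1
        f = old_feats.detach().requires_grad_(True)
        loss = nn.functional.mse_loss(m2(f), old_y)
        opt2.zero_grad()
        loss.backward()              # also fills f.grad for the upstream module
        opt2.step()
        # upstream: replay the stored input through the current weights,
        # then backpropagate the delayed gradient f.grad through m1 alone
        out = m1(old_x)
        opt1.zero_grad()
        out.backward(f.grad)
        opt1.step()
    prev = (x, feats, y)
```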


Bilevel Distance Metric Learning for Robust Image Recognition

Neural Information Processing Systems

Metric learning, which aims to learn a discriminative Mahalanobis distance matrix M that effectively reflects the similarity between data samples, has been widely studied in various image recognition problems. Most existing metric learning methods take as input features extracted directly from the original data in a preprocessing phase. Worse, these features usually take no account of the local geometric structure of the data or the noise it contains, so they may not be optimal for the subsequent metric learning task. In this paper, we integrate feature extraction and metric learning into one joint optimization framework and propose a new bilevel distance metric learning model. Specifically, the lower level characterizes the intrinsic data structure using graph-regularized sparse coefficients, while the upper level forces data samples from the same class to be close to each other and pushes those from different classes far apart. In addition, leveraging the KKT conditions and the alternating direction method (ADM), we derive an efficient algorithm to solve the proposed model. Extensive experiments on various occluded datasets demonstrate the effectiveness and robustness of our method.
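As a hedged illustration of the structure (the pair sets, trade-off weights, and exact regularizers below are assumed notation, not taken from the paper), the bilevel program can be pictured as metric learning on top of graph-regularized sparse codes:

```latex
\min_{M \succeq 0}\;
  \sum_{(i,j)\in\mathcal{S}} d_M\big(s_i^{*}, s_j^{*}\big)
  \;-\; \gamma \sum_{(i,k)\in\mathcal{N}} d_M\big(s_i^{*}, s_k^{*}\big)
\quad \text{s.t.} \quad
  S^{*} = \arg\min_{S}\;
    \|X - D S\|_F^2
    + \alpha\,\mathrm{tr}\big(S L S^{\top}\big)
    + \beta \,\|S\|_1,
```

where d_M(u, v) = (u - v)^T M (u - v), \mathcal{S} and \mathcal{N} collect same-class and different-class pairs, D is a dictionary, L is a graph Laplacian capturing local geometry, and s_i^* is the i-th column of the lower-level solution S^*. Replacing the lower-level argmin by its KKT conditions is what couples the two levels into a single system that ADM can then solve.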


Curvilinear Distance Metric Learning

Neural Information Processing Systems

Distance metric learning aims to learn an appropriate metric that faithfully measures the distance between two data points. Traditional metric learning methods usually compute the pairwise distance with fixed distance functions (e.g., Euclidean distance) in the projected feature space. However, they fail to learn the underlying geometry of the sample space, and thus cannot accurately predict the intrinsic distances between data points. To address this issue, we first reveal that the traditional linear distance metric is equivalent to the cumulative arc length between the data pair's nearest points on the learned straight measurer lines. Then, by extending such straight lines to general curved forms, we propose a Curvilinear Distance Metric Learning (CDML) method, which adaptively learns the nonlinear geometry of the training data. By virtue of the Weierstrass approximation theorem, the proposed CDML is equivalently parameterized with a third-order tensor, and an optimization algorithm is designed to learn the tensor parameter. Theoretical analysis is provided to guarantee the effectiveness and soundness of CDML.
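The following numpy toy (all details assumed: polynomial measurer curves, a hypothetical linear projection rule, brute-force numerical integration) illustrates what "distance as cumulative arc length along learned curves, parameterized by a third-order tensor" can look like; it sketches the geometry only, not the paper's construction or its optimization.

```python
import numpy as np

def curve_point(W_j, t):
    """Point on measurer curve j: c_j(t) = sum_k W_j[:, k] * t**k."""
    return W_j @ (t ** np.arange(W_j.shape[1]))

def arc_length(W_j, t0, t1, n=200):
    """Numerically integrate the curve's speed between parameters t0 and t1."""
    ts = np.linspace(min(t0, t1), max(t0, t1), n)
    pts = np.stack([curve_point(W_j, t) for t in ts])
    return np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

def curvilinear_distance(W, U, x, y):
    """Sum arc lengths over all curves; U holds hypothetical projection
    directions that place a sample on curve j via t = U[j] @ x."""
    return sum(arc_length(W[j], U[j] @ x, U[j] @ y) for j in range(W.shape[0]))

rng = np.random.default_rng(0)
m, d, q = 4, 5, 3                  # number of curves, ambient dim, poly terms
W = rng.normal(size=(m, d, q))     # third-order tensor of curve coefficients
U = rng.normal(size=(m, d))        # assumed projection directions
x, y = rng.normal(size=d), rng.normal(size=d)
print(curvilinear_distance(W, U, x, y))
```

Loosely, when each curve degenerates to a straight line through the origin, the accumulated arc length becomes linear in the projections, recovering the straight-measurer-line case the abstract describes; the curved parameterization therefore strictly generalizes it.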


Error Analysis of Generalized Nyström Kernel Regression

Neural Information Processing Systems

The Nyström method has been successfully used to improve the computational efficiency of kernel ridge regression (KRR). Recently, theoretical analyses of Nyström KRR, including generalization bounds and convergence rates, have been established based on the reproducing kernel Hilbert space (RKHS) associated with a symmetric positive semi-definite kernel. However, in real-world applications, an RKHS is not always optimal, and the kernel function need not be symmetric or positive semi-definite.
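For reference, the symmetric PSD baseline that the generalized analysis departs from can be sketched in a few lines; the uniform landmark sampling and the Gaussian kernel below are illustrative choices, not the paper's setting.

```python
import numpy as np

def nystrom_krr(X, y, m=100, lam=1e-3, gamma=1.0, seed=0):
    """Standard Nystrom KRR sketch: solve the reduced m x m system
    (K_nm^T K_nm + n*lam*K_mm) a = K_nm^T y instead of the full n x n one."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Z = X[rng.choice(n, size=min(m, n), replace=False)]   # landmark points

    def k(A, B):  # Gaussian kernel matrix between row sets A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K_nm, K_mm = k(X, Z), k(Z, Z)
    a = np.linalg.solve(K_nm.T @ K_nm + n * lam * K_mm, K_nm.T @ y)
    return lambda Xt: k(Xt, Z) @ a                        # predictor

# usage: predict = nystrom_krr(X_train, y_train); y_hat = predict(X_test)
```

Dropping the symmetry and PSD requirements means this RKHS machinery no longer applies directly, which is the gap the paper's error analysis addresses.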

