Discriminant Analysis
Sketched Gaussian Model Linear Discriminant Analysis via the Randomized Kaczmarz Method
Chi, Jocelyn T., Needell, Deanna
We harness a least squares formulation of linear discriminant analysis (LDA) and mobilize the stochastic gradient descent framework. As a result, we obtain a randomized classifier whose performance is comparable to that of full data LDA while requiring access to only one row of the training data at a time. We present convergence guarantees for the sketched predictions on new data within a fixed number of iterations. These guarantees account for both the Gaussian modeling assumptions on the data and the algorithmic randomness of the sketching procedure. Finally, we demonstrate performance with varying step sizes and numbers of iterations. Our numerical experiments demonstrate that sketched LDA can offer a viable alternative to full data LDA when the data are too large for full data analysis.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.42)
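The row-at-a-time iteration behind this kind of sketched solver can be illustrated with a minimal randomized Kaczmarz sketch for a consistent least-squares system (plain NumPy; a generic illustration, not the authors' code — the LDA-specific least-squares formulation and step-size schedule are omitted):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz for Ax = b: at each step, project the current
    iterate onto the hyperplane defined by one randomly chosen row,
    sampled with probability proportional to its squared norm.
    Each update touches only a single row of A."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.sum(A ** 2, axis=1)
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += ((b[i] - A[i] @ x) / row_norms[i]) * A[i]
    return x

# On a consistent system, the iterates converge linearly in expectation.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
x_hat = randomized_kaczmarz(A, b)
```

Each iteration costs O(n) and reads one row, which is what makes the approach attractive when the full data matrix cannot be held in memory.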
Spectrally-Corrected and Regularized Linear Discriminant Analysis for Spiked Covariance Model
Li, Hua, Luo, Wenya, Bai, Zhidong, Zhou, Huanchao, Pu, Zhangni
In this paper, we propose an improved linear discriminant analysis, called spectrally-corrected and regularized linear discriminant analysis (SCRLDA). This method integrates the design ideas of the spectrally-corrected sample covariance matrix and regularized discriminant analysis. The SCRLDA method is specially designed for classification problems under the assumption that the covariance matrix follows a spiked model. Through analysis of real and simulated data, we show that the proposed classifier outperforms classical R-LDA and is competitive with the KNN and SVM classifiers while requiring lower computational complexity.
Varying Coefficient Linear Discriminant Analysis for Dynamic Data
Linear discriminant analysis (LDA) is an important classification tool in statistics and machine learning. This paper investigates the varying coefficient LDA model for dynamic data, with Bayes' discriminant direction being a function of some exposure variable to address the heterogeneity. We propose a new least-square estimation method based on the B-spline approximation. The data-driven discriminant procedure is more computationally efficient than the dynamic linear programming rule \citep{jiang2020dynamic}. We also establish the convergence rates for the corresponding estimation error bound and the excess misclassification risk. The estimation error in $L_2$ distance is optimal for the low-dimensional regime and is near optimal for the high-dimensional regime. Numerical experiments on synthetic data and real data both corroborate the superiority of our proposed classification method.
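The least-squares idea behind such varying-coefficient estimation can be sketched as follows, with a polynomial basis standing in for the paper's B-splines (a hedged illustration, not the authors' estimator):

```python
import numpy as np

def fit_varying_coefficient(X, u, y, degree=3):
    """Least-squares fit of y = x^T beta(u) + noise, expanding each
    coefficient function beta_j(u) in a polynomial basis of u.
    The fit reduces to ordinary least squares on tensor features."""
    B = np.vander(u, degree + 1)                 # (n, degree+1) basis in u
    Z = np.einsum("ij,ik->ijk", X, B).reshape(len(y), -1)
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return theta.reshape(X.shape[1], degree + 1)  # one row per coefficient

def eval_beta(theta, u0, degree=3):
    """Evaluate the fitted coefficient functions at exposure value u0."""
    return (theta @ np.vander(np.atleast_1d(u0), degree + 1).T).ravel()

# Recover beta(u) = (1 + u, 2 - u) from noiseless data.
rng = np.random.default_rng(0)
n = 200
u = rng.uniform(0, 1, n)
X = rng.standard_normal((n, 2))
y = X[:, 0] * (1 + u) + X[:, 1] * (2 - u)
theta = fit_varying_coefficient(X, u, y)
```

With a B-spline basis in place of `np.vander`, the same least-squares structure applies, which is why the data-driven procedure is cheap relative to dynamic linear programming rules.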
Linear Discriminant Analysis with High-dimensional Mixed Variables
Jiang, Binyan, Leng, Chenlei, Wang, Cheng, Yang, Zhongqing
Datasets containing both categorical and continuous variables are frequently encountered in many areas, and with the rapid development of modern measurement technologies, the dimensions of these variables can be very high. Despite the recent progress made in modelling high-dimensional data for continuous variables, there is a scarcity of methods that can deal with a mixed set of variables. To fill this gap, this paper develops a novel approach for classifying high-dimensional observations with mixed variables. Our framework builds on a location model, in which the distributions of the continuous variables conditional on categorical ones are assumed Gaussian. We overcome the challenge of having to split data into exponentially many cells, or combinations of the categorical variables, by kernel smoothing, and provide new perspectives for its bandwidth choice to ensure an analogue of Bochner's Lemma, which is different to the usual bias-variance tradeoff. We show that the two sets of parameters in our model can be separately estimated and provide penalized likelihood for their estimation. Results on the estimation accuracy and the misclassification rates are established, and the competitive performance of the proposed classifier is illustrated by extensive simulation and real data studies.
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.41)
Linear Discriminant Analysis
Linear Discriminant Analysis is one of the commonly used supervised techniques for dimensionality reduction. It is also used in classification problems and for data visualization. Dimensionality reduction is the transformation or projection of data from a higher-dimensional space to a lower-dimensional space. How is LDA different from PCA? The major distinction is that LDA finds the axes that maximize the separation between multiple classes, whereas PCA finds the directions of maximum variance without using class labels.
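As a concrete illustration of the LDA-vs-PCA distinction, Fisher's two-class discriminant direction can be computed in a few lines of NumPy (a textbook sketch, not tied to any particular library implementation):

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher's two-class discriminant direction w ∝ Sw^{-1} (mu1 - mu0):
    it maximizes between-class separation relative to within-class scatter,
    unlike PCA, which ignores the class labels entirely."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Two Gaussian blobs; projecting onto w separates them well.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((200, 2))
X1 = rng.standard_normal((200, 2)) + 3.0
w = fisher_direction(X0, X1)
proj0, proj1 = X0 @ w, X1 @ w
```

On the same data, the leading PCA direction would simply track the axis of largest total spread, which need not separate the two classes at all.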
A Doubly Regularized Linear Discriminant Analysis Classifier with Automatic Parameter Selection
Zaib, Alam, Ballal, Tarig, Khattak, Shahid, Al-Naffouri, Tareq Y.
Linear discriminant analysis (LDA) based classifiers tend to falter in many practical settings where the training data size is smaller than, or comparable to, the number of features. As a remedy, different regularized LDA (RLDA) methods have been proposed. These methods may still perform poorly depending on the size and quality of the available training data. In particular, deviation of the test data from the training data model, for example due to noise contamination, can cause severe performance degradation. Moreover, these methods commit further to the Gaussian assumption (upon which LDA is established) to tune their regularization parameters, which may compromise accuracy when dealing with real data. To address these issues, we propose a doubly regularized LDA classifier that we denote as R2LDA. In the proposed R2LDA approach, the RLDA score function is converted into an inner product of two vectors. By substituting the expressions of the regularized estimators of these vectors, we obtain the R2LDA score function, which involves two regularization parameters. To set the values of these parameters, we adopt three existing regularization techniques: the constrained perturbation regularization approach (COPRA), the bounded perturbation regularization (BPR) algorithm, and the generalized cross-validation (GCV) method. These methods tune the regularization parameters based on linear estimation models, with the square root of the sample covariance matrix as the linear operator. Results obtained from both synthetic and real data demonstrate the consistency and effectiveness of the proposed R2LDA approach, especially in scenarios involving test data contaminated with noise that is not observed during the training phase.
- Health & Medicine (0.68)
- Government (0.46)
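For reference, the standard RLDA score that R2LDA builds on can be sketched as below (plain NumPy; the two-parameter R2LDA decomposition and the COPRA/BPR/GCV tuning rules are not reproduced here):

```python
import numpy as np

def rlda_score(x, X0, X1, gamma):
    """Standard two-class RLDA score: positive values favor class 1.
    gamma is a ridge parameter added to the pooled covariance. R2LDA
    instead writes this score as an inner product of two separately
    regularized vectors, each with its own parameter; that split is
    beyond this sketch."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sp = 0.5 * (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False))
    H = np.linalg.inv(Sp + gamma * np.eye(len(mu0)))
    return (x - 0.5 * (mu0 + mu1)) @ H @ (mu1 - mu0)

rng = np.random.default_rng(0)
X0 = rng.standard_normal((100, 3))
X1 = rng.standard_normal((100, 3)) + np.array([3.0, 0.0, 0.0])
```

The ridge term keeps the pooled covariance invertible when the sample size is comparable to, or smaller than, the feature dimension — exactly the regime the abstract describes.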
Two-dimensional Bhattacharyya bound linear discriminant analysis with its applications
Guo, Yan-Ru, Bai, Yan-Qin, Li, Chun-Na, Bai, Lan, Shao, Yuan-Hai
The recently proposed L2-norm linear discriminant analysis criterion based on Bhattacharyya error bound estimation (L2BLDA) is an effective improvement of linear discriminant analysis (LDA) for feature extraction. However, L2BLDA only copes with vector input samples. When faced with two-dimensional (2D) inputs, such as images, it loses useful information because it ignores the intrinsic structure of images. In this paper, we extend L2BLDA to a two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA). 2DBLDA maximizes the matrix-based between-class distance, measured by the weighted pairwise distances of class means, while minimizing the matrix-based within-class distance. The weighting constant between the between-class and within-class terms is determined by the data involved, which makes 2DBLDA adaptive. In addition, the criterion of 2DBLDA is equivalent to optimizing an upper bound of the Bhattacharyya error. By construction, 2DBLDA avoids the small sample size problem, is robust, and can be solved through a simple standard eigenvalue decomposition. Experimental results on image recognition and face image reconstruction demonstrate the effectiveness of the proposed method.
Capped norm linear discriminant analysis and its applications
Liu, Jiakou, Xiong, Xiong, Ren, Pei-Wei, Zhao, Da, Li, Chun-Na, Shao, Yuan-Hai
Classical linear discriminant analysis (LDA) is based on the squared Frobenius norm and hence is sensitive to outliers and noise. To improve the robustness of LDA, in this paper we introduce the capped l_{2,1}-norm of a matrix, which employs the non-squared l_2-norm and a "capped" operation, and further propose a novel capped l_{2,1}-norm linear discriminant analysis, called CLDA. Due to the use of the capped l_{2,1}-norm, CLDA can effectively remove extreme outliers and suppress the effect of noise data. In fact, CLDA can also be viewed as a weighted LDA. CLDA is solved through a series of generalized eigenvalue problems with a theoretical convergence guarantee. Experimental results on an artificial data set, several UCI data sets, and two image data sets demonstrate the effectiveness of CLDA.
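Under the common definition of the capped l_{2,1}-norm — a sum over rows of min(||m_i||_2, ε) — the outlier-suppression effect is easy to see (an illustrative sketch; the paper's exact formulation may differ in details):

```python
import numpy as np

def capped_l21_norm(M, eps):
    """Capped l_{2,1}-norm: sum over rows of min(||m_i||_2, eps).
    A row whose l_2-norm exceeds eps contributes only eps, so a single
    extreme outlier row cannot dominate the objective."""
    row_norms = np.linalg.norm(M, axis=1)
    return float(np.minimum(row_norms, eps).sum())

M = np.array([[3.0, 4.0],     # row norm 5
              [0.0, 1.0],     # row norm 1
              [100.0, 0.0]])  # outlier row, capped at eps
```

Here `capped_l21_norm(M, eps=5.0)` evaluates to 11.0 (5 + 1 + 5): the outlier row contributes no more than any other row at the cap, whereas under the squared Frobenius norm it would contribute 10,000.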
Linear discriminant initialization for feed-forward neural networks
Informed by the basic geometry underlying feed-forward neural networks, we initialize the weights of the first layer of a neural network using the linear discriminants that best distinguish individual classes. Networks initialized in this way take fewer training steps to reach the same level of training performance and asymptotically achieve higher accuracy on the training data.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
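A minimal version of this initialization idea — one whitened class-mean direction per first-layer unit — might look like the following (an assumed, simplified reading of the scheme, with ad-hoc shrinkage added for invertibility):

```python
import numpy as np

def lda_init_first_layer(X, y, shrinkage=1e-3):
    """Initialize first-layer weights with one LDA-style direction per
    class: the (shrunk) inverse covariance applied to the difference
    between each class mean and the overall mean. The shrinkage term
    is an ad-hoc addition to keep the covariance invertible."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False) + shrinkage * np.eye(X.shape[1])
    S_inv = np.linalg.inv(S)
    W = np.stack([S_inv @ (X[y == c].mean(axis=0) - mu) for c in classes])
    return W  # shape: (n_classes, n_features)

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 4)),
               rng.standard_normal((100, 4)) + 2.0])
y = np.array([0] * 100 + [1] * 100)
W = lda_init_first_layer(X, y)
```

Each row of `W` already points toward its class before any gradient step, which is the intuition behind the reported head start in training.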
Variance Linear Discriminant Analysis for IRIS Biometrics
Cha, Sung-Hyuk (Pace University) | Cha, Teryn (Essex County College)
Dichotomy transformation in the biometric authentication problem creates a two-class ("within" or "between") classification problem in a multivariate distance space. Linear discriminant analysis, a linear classifier, performs well on the IRIS biometric authentication problem. However, it assumes that the distributions of the two classes are normal, whereas they are closer to log-normal distributions. Here, a modified variance linear discriminant analysis algorithm is proposed, and its superior experimental results on the IRIS biometric database are reported.