A Compressive Classification Framework for High-Dimensional Data

May 9, 2020 – arXiv.org Machine Learning

We propose a compressive classification framework for settings where the data dimensionality is significantly higher than the sample size. The proposed method, referred to as compressive regularized discriminant analysis (CRDA), is based on linear discriminant analysis and selects significant features by applying joint-sparsity promoting hard thresholding in the discriminant rule. Since the number of features is larger than the sample size, the method also uses state-of-the-art regularized sample covariance matrix estimators. Several analysis examples on real data sets, including image, speech signal and gene expression data, illustrate the promising improvements offered by the proposed CRDA classifier in practice. Overall, the proposed method gives fewer misclassification errors than its competitors, while at the same time achieving accurate feature selection results. The open-source R package and MATLAB toolbox of the proposed method (named compressiveRDA) are freely available.

High-dimensional (HD) classification is at the core of numerous contemporary statistical studies. An increasingly common occurrence is the collection of large amounts of information on each individual sample point, even though the number of sample points themselves may remain relatively small. Typical examples are gene expression and protein mass spectrometry data, and other areas of computational biology. Regularization and shrinkage are commonly used tools in applications such as regression and classification to overcome the significant statistical challenges posed by high-dimension, low-sample-size (HDLSS) data settings, in which the number of features, p, is often several orders of magnitude larger than the sample size, n (i.e., p ≫ n).
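To make the idea in the abstract concrete, the following is a minimal Python sketch of a linear discriminant rule that combines a regularized covariance estimate with joint-sparsity hard thresholding. It is not the compressiveRDA package and does not reproduce its API. The function names (`fit_sparse_rda`, `predict`), the fixed shrinkage weight `beta`, the shrinkage target (a scaled identity), the use of the Euclidean row norm, and the choice of `K` are all assumptions made for illustration; the paper's actual estimators and tuning may differ.

```python
import numpy as np

def fit_sparse_rda(X, y, K, beta=0.5):
    """Illustrative sketch (hypothetical helper, not the authors' code).

    X: (n, p) data matrix, y: (n,) class labels, K: number of features kept.
    beta: assumed fixed shrinkage weight in (0, 1].
    """
    classes = np.unique(y)
    n, p = X.shape
    # Class means as columns of a (p, G) matrix, and empirical class priors.
    means = np.stack([X[y == c].mean(axis=0) for c in classes], axis=1)
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled sample covariance from data centered by each sample's class mean.
    centered = X - means[:, np.searchsorted(classes, y)].T
    S = centered.T @ centered / n
    # Assumed regularization: shrink S toward a scaled identity so that the
    # estimate is invertible even when p > n.
    eta = np.trace(S) / p
    Sigma = beta * S + (1.0 - beta) * eta * np.eye(p)
    # Coefficient matrix of the linear discriminant rule, shape (p, G).
    B = np.linalg.solve(Sigma, means)
    # Joint-sparsity hard thresholding: keep the K rows (features) with the
    # largest Euclidean norm across classes and zero out all other rows.
    keep = np.argsort(np.linalg.norm(B, axis=1))[-K:]
    B_sparse = np.zeros_like(B)
    B_sparse[keep] = B[keep]
    offsets = -0.5 * np.sum(means * B_sparse, axis=0) + np.log(priors)
    return classes, B_sparse, offsets, np.sort(keep)

def predict(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, B, offsets, _ = model
    return classes[np.argmax(X @ B + offsets, axis=1)]

# Synthetic p >> n example: only the first 5 of 200 features carry signal.
rng = np.random.default_rng(0)
n, p, K = 40, 200, 5
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[y == 1, :5] += 3.0

model = fit_sparse_rda(X, y, K=K)
print("selected features:", model[3])
print("training accuracy:", np.mean(predict(model, X) == y))
```

Because a whole row of the coefficient matrix is either kept or zeroed, a feature is dropped for all classes at once, which is what makes the thresholding a feature selection step rather than a per-class one.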

artificial intelligence, estimator, machine learning

Country:
- North America > United States
  - Pennsylvania (0.04)
  - New Jersey > Mercer County
    - Princeton (0.04)
- Europe
  - Greece (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Finland
    - Northern Ostrobothnia > Oulu (0.04)
    - Central Finland > Jyväskylä (0.04)
- Asia
  - Pakistan > Punjab
    - Lahore Division > Lahore (0.04)
  - Middle East > Saudi Arabia
    - Riyadh Province > Riyadh (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine
  - Therapeutic Area > Oncology (1.00)
  - Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Performance Analysis > Accuracy (1.00)
