hdlss
Multi-view mid fusion: a universal approach for learning in an HDLSS setting
The high-dimensional low-sample-size (HDLSS) setting presents significant challenges in various applications where the feature dimension far exceeds the number of available samples. This paper introduces a universal approach for learning in HDLSS settings using multi-view mid fusion techniques. It shows how existing mid fusion multi-view methods perform well in an HDLSS setting even if no inherent views are provided. Three view construction methods are proposed that split the high-dimensional feature vectors into smaller subsets, each representing a different view. Extensive experimental validation across model-types and learning tasks confirm the effectiveness and generalization of the approach. We believe the work in this paper lays the foundation for further research into the universal benefits of multi-view mid fusion learning.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Belgium (0.04)
- Asia > Singapore (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance
Choudhury, Jyotishka Ray, Saha, Aytijhya, Roy, Sarbojit, Dutta, Subhajit
Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions of the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
- (2 more...)
The classification for High-dimension low-sample size data
Huge amount of applications in various fields, such as gene expression analysis or computer vision, undergo data sets with high-dimensional low-sample-size (HDLSS), which has putted forward great challenges for standard statistical and modern machine learning methods. In this paper, we propose a novel classification criterion on HDLSS, tolerance similarity, which emphasizes the maximization of within-class variance on the premise of class separability. According to this criterion, a novel linear binary classifier is designed, denoted by No-separated Data Maximum Dispersion classifier (NPDMD). The objective of NPDMD is to find a projecting direction w in which all of training samples scatter in as large an interval as possible. NPDMD has several characteristics compared to the state-of-the-art classification methods. First, it works well on HDLSS. Second, it combines the sample statistical information and local structural information (supporting vectors) into the objective function to find the solution of projecting direction in the whole feature spaces. Third, it solves the inverse of high dimensional matrix in low dimensional space. Fourth, it is relatively simple to be implemented based on Quadratic Programming. Fifth, it is robust to the model specification for various real applications. The theoretical properties of NPDMD are deduced. We conduct a series of evaluations on one simulated and six real-world benchmark data sets, including face classification and mRNA classification. NPDMD outperforms those widely used approaches in most cases, or at least obtains comparable results.
- Asia > China > Liaoning Province > Dalian (0.04)
- North America > United States > North Carolina (0.04)
- North America > United States > New York > Broome County > Binghamton (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.89)
Population-Guided Large Margin Classifier for High-Dimension Low -Sample-Size Problems
Yin, Qingbo, Adeli, Ehsan, Shen, Liran, Shen, Dinggang
Various applications in different fields, such as gene expression analysis or computer vision, suffer from data sets with high-dimensional low-sample-size (HDLSS), which has posed significant challenges for standard statistical and modern machine learning methods. In this paper, we propose a novel linear binary classifier, denoted by population-guided large margin classifier (PGLMC), which is applicable to any sorts of data, including HDLSS. PGLMC is conceived with a projecting direction w given by the comprehensive consideration of local structural information of the hyperplane and the statistics of the training samples. Our proposed model has several advantages compared to those widely used approaches. First, it is not sensitive to the intercept term b. Second, it operates well with imbalanced data. Third, it is relatively simple to be implemented based on Quadratic Programming. Fourth, it is robust to the model specification for various real applications. The theoretical properties of PGLMC are proven. We conduct a series of evaluations on two simulated and six real-world benchmark data sets, including DNA classification, digit recognition, medical image analysis, and face recognition. PGLMC outperforms the state-of-the-art classification methods in most cases, or at least obtains comparable results.
- North America > United States > North Carolina > Orange County > Chapel Hill (0.14)
- Asia > China > Liaoning Province > Dalian (0.04)
- Oceania > New Zealand (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.88)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Support vector machine and its bias correction in high-dimension, low-sample-size settings
Nakayama, Yugo, Yata, Kazuyoshi, Aoshima, Makoto
In this paper, we consider asymptotic properties of the support vector machine (SVM) in high-dimension, low-sample-size (HDLSS) settings. We show that the hard-margin linear SVM holds a consistency property in which misclassification rates tend to zero as the dimension goes to infinity under certain severe conditions. We show that the SVM is very biased in HDLSS settings and its performance is affected by the bias directly. In order to overcome such difficulties, we propose a bias-corrected SVM (BC-SVM). We show that the BC-SVM gives preferable performances in HDLSS settings. We also discuss the SVMs in multiclass HDLSS settings. Finally, we check the performance of the classifiers in actual data analyses.
- Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
- North America > United States > New York (0.04)