
Multi-view mid fusion: a universal approach for learning in an HDLSS setting

Houthuys, Lynn

arXiv.org Artificial Intelligence

The high-dimensional low-sample-size (HDLSS) setting presents significant challenges in various applications where the feature dimension far exceeds the number of available samples. This paper introduces a universal approach for learning in HDLSS settings using multi-view mid fusion techniques. It shows that existing multi-view mid fusion methods perform well in an HDLSS setting even if no inherent views are provided. Three view construction methods are proposed that split the high-dimensional feature vectors into smaller subsets, each representing a different view. Extensive experimental validation across model types and learning tasks confirms the effectiveness and generalization of the approach. We believe this work lays the foundation for further research into the universal benefits of multi-view mid fusion learning.
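The abstract does not detail the three view construction methods; as a hypothetical illustration of the general idea only, the sketch below partitions a high-dimensional feature matrix column-wise into disjoint random "views" (the function name and the random-partition strategy are assumptions, not the paper's methods):

```python
import numpy as np

def random_view_split(X, n_views, seed=0):
    """Split an (n_samples, n_features) matrix column-wise into
    `n_views` disjoint feature subsets ("views").

    Illustrative only: a random partition of feature indices; the
    paper's actual three construction methods may differ.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    # np.array_split handles dimensions not divisible by n_views
    return [X[:, idx] for idx in np.array_split(perm, n_views)]

# Toy HDLSS matrix: 10 samples, 1000 features, split into 4 views
X = np.random.randn(10, 1000)
views = random_view_split(X, n_views=4)
```

Each resulting view can then be fed to a mid fusion multi-view learner in place of naturally occurring views.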


Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

Choudhury, Jyotishka Ray, Saha, Aytijhya, Roy, Sarbojit, Dutta, Subhajit

arXiv.org Machine Learning

Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions on the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.
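For intuition, the textbook (non-adaptive) energy-distance rule assigns a point to the class with the smaller empirical energy distance. The sketch below implements only this baseline rule; the paper's classifiers are data-adaptive refinements whose details the abstract does not give, and all names here are illustrative assumptions:

```python
import numpy as np

def energy_score(x, class_samples):
    """Empirical energy distance between point x and one class:
    2 * mean ||x - X_i||  -  mean ||X_i - X_j||.
    Plain textbook estimator, not the paper's adaptive version.
    """
    d_to = np.linalg.norm(class_samples - x, axis=1).mean()
    pairwise = np.linalg.norm(
        class_samples[:, None, :] - class_samples[None, :, :], axis=2
    )
    return 2.0 * d_to - pairwise.mean()

def energy_classify(x, classes):
    """Assign x to the class with the smallest energy score."""
    return int(np.argmin([energy_score(x, c) for c in classes]))

# Toy HDLSS example: 2 classes, 10 samples each, 500 features
rng = np.random.default_rng(0)
c0 = rng.normal(0.0, 1.0, size=(10, 500))
c1 = rng.normal(1.0, 1.0, size=(10, 500))
x_new = rng.normal(1.0, 1.0, size=500)   # drawn from class 1
print(energy_classify(x_new, [c0, c1]))  # prints 1
```

Note that the rule has no tuning parameters and uses only pairwise distances, which matches the tuning-free, distribution-robust flavor described in the abstract.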


The classification for High-dimension low-sample size data

Shen, Liran, Yin, Qingbo

arXiv.org Machine Learning

A huge number of applications in various fields, such as gene expression analysis or computer vision, involve data sets that are high-dimensional with low sample size (HDLSS), which has posed great challenges for standard statistical and modern machine learning methods. In this paper, we propose a novel classification criterion for HDLSS, tolerance similarity, which emphasizes the maximization of within-class variance on the premise of class separability. According to this criterion, a novel linear binary classifier is designed, denoted by No-separated Data Maximum Dispersion classifier (NPDMD). The objective of NPDMD is to find a projecting direction w in which all training samples scatter over as large an interval as possible. NPDMD has several characteristics compared to the state-of-the-art classification methods. First, it works well on HDLSS data. Second, it combines sample statistical information and local structural information (supporting vectors) into the objective function to find the projecting direction in the whole feature space. Third, it solves the inverse of a high-dimensional matrix in a low-dimensional space. Fourth, it is relatively simple to implement, being based on Quadratic Programming. Fifth, it is robust to model specification in various real applications. The theoretical properties of NPDMD are deduced. We conduct a series of evaluations on one simulated and six real-world benchmark data sets, including face classification and mRNA classification. NPDMD outperforms widely used approaches in most cases, or at least obtains comparable results.


Population-Guided Large Margin Classifier for High-Dimension Low-Sample-Size Problems

Yin, Qingbo, Adeli, Ehsan, Shen, Liran, Shen, Dinggang

arXiv.org Machine Learning

Various applications in different fields, such as gene expression analysis or computer vision, suffer from data sets with high dimension and low sample size (HDLSS), which has posed significant challenges for standard statistical and modern machine learning methods. In this paper, we propose a novel linear binary classifier, denoted by population-guided large margin classifier (PGLMC), which is applicable to any sort of data, including HDLSS. PGLMC is conceived with a projecting direction w given by the comprehensive consideration of local structural information of the hyperplane and the statistics of the training samples. Our proposed model has several advantages compared to those widely used approaches. First, it is not sensitive to the intercept term b. Second, it operates well with imbalanced data. Third, it is relatively simple to implement, being based on Quadratic Programming. Fourth, it is robust to model specification in various real applications. The theoretical properties of PGLMC are proven. We conduct a series of evaluations on two simulated and six real-world benchmark data sets, including DNA classification, digit recognition, medical image analysis, and face recognition. PGLMC outperforms the state-of-the-art classification methods in most cases, or at least obtains comparable results.


Support vector machine and its bias correction in high-dimension, low-sample-size settings

Nakayama, Yugo, Yata, Kazuyoshi, Aoshima, Makoto

arXiv.org Machine Learning

In this paper, we consider asymptotic properties of the support vector machine (SVM) in high-dimension, low-sample-size (HDLSS) settings. We show that the hard-margin linear SVM enjoys a consistency property, in which misclassification rates tend to zero as the dimension goes to infinity, under certain strict conditions. We also show that the SVM is heavily biased in HDLSS settings and that its performance is directly affected by this bias. To overcome these difficulties, we propose a bias-corrected SVM (BC-SVM) and show that it gives preferable performance in HDLSS settings. We also discuss SVMs in multiclass HDLSS settings. Finally, we check the performance of the classifiers in real data analyses.
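The bias the abstract describes can be observed empirically: when two classes have identical distributions but unequal sample sizes, a hard-margin linear SVM in a very high-dimensional space tends to assign new points to the larger class. The sketch below demonstrates this with scikit-learn's `SVC` as a stand-in solver (the data, dimensions, seed, and use of scikit-learn are illustrative assumptions; the paper's bias-corrected SVM, which adjusts for this effect, is not implemented here):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 2000         # feature dimension (HDLSS: d >> n)
n1, n2 = 5, 45   # strongly imbalanced class sizes
# Both classes come from the SAME distribution, so any systematic
# preference for one class at test time is pure bias.
X = rng.standard_normal((n1 + n2, d))
y = np.array([0] * n1 + [1] * n2)

# A very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e10).fit(X, y)

X_test = rng.standard_normal((200, d))  # same distribution again
frac_to_larger = (clf.predict(X_test) == 1).mean()
print(f"fraction assigned to larger class: {frac_to_larger:.2f}")
```

An unbiased classifier would assign about half of the test points to each class; here the fraction sent to the larger class is substantially above one half, illustrating why a bias correction is needed.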