Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing, especially in modern regimes where the number of classes is rather large. In this paper, we take a step in this direction by providing the first asymptotically precise analysis of linear multiclass classification. Our theoretical analysis allows us to precisely characterize how the test error varies with the training algorithm, the data distribution, the problem dimensions, the number of classes, the inter/intra-class correlations, and the class priors. Specifically, our analysis reveals that the classification accuracy is highly distribution-dependent, with different algorithms achieving optimal performance for different data distributions and/or training/feature sizes. Unlike linear regression/binary classification, the test error in multiclass classification depends on intricate functions of the trained model (e.g., correlations between some of the trained weights) whose asymptotic behavior is difficult to characterize. This challenge is already present in simple classifiers, such as those minimizing a square loss. Our novel theoretical techniques allow us to overcome some of these challenges. The insights gained may pave the way for a precise understanding of other classification algorithms beyond those studied in this paper.
Review for NeurIPS paper: Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
The authors characterize the asymptotic behaviour of four linear classifiers applied to data generated according to two models. The four classifiers differ according to their loss function: least-squares, class averaging, weighted least-squares and cross-entropy. The data are obtained through a Gaussian mixture or a multinomial logit model. The main results are convergences in probability of the parameters (intercepts and "correlation" matrices). The total and class-wise accuracies are also characterized. Experimental results (obtained on artificial data following the aforementioned models) are also provided in Section 5.
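For concreteness, here is a minimal Python sketch of the setting the review describes: data drawn from one of the two models (a Gaussian mixture) and the simplest of the four classifiers (a least-squares fit to one-hot labels), with the total and class-wise accuracies reported. All sizes, scalings, and helper names below are our own illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumed, not from the paper): k classes, d features, n samples.
k, d, n = 4, 200, 1000

# Gaussian-mixture data: random class means mu_c, isotropic Gaussian noise.
mus = rng.normal(size=(k, d)) / np.sqrt(d)
y = rng.integers(k, size=n)                      # uniform class priors
X = mus[y] + rng.normal(size=(n, d)) / np.sqrt(d)

# Least-squares classifier: regress one-hot labels on features (plus intercept),
# then predict via the argmax of the fitted scores.
Y = np.eye(k)[y]                                 # one-hot labels, shape (n, k)
A = np.hstack([X, np.ones((n, 1))])
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

def predict(X_new):
    scores = np.hstack([X_new, np.ones((len(X_new), 1))]) @ W
    return scores.argmax(axis=1)

# Total and class-wise accuracy on a fresh test sample from the same mixture.
y_te = rng.integers(k, size=5000)
X_te = mus[y_te] + rng.normal(size=(5000, d)) / np.sqrt(d)
y_hat = predict(X_te)
print("total accuracy:", (y_hat == y_te).mean())
for c in range(k):
    print(f"class {c} accuracy:", (y_hat[y_te == c] == c).mean())
```

The other three classifiers in the paper differ only in the loss minimized at the `lstsq` step (class averaging, weighted least-squares, cross-entropy); this sketch is not the paper's method, only the common pipeline they share.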
Review for NeurIPS paper: Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
The paper studies the statistical behaviour of certain multiclass classification algorithms in the doubly-asymptotic limit n, d → ∞. The results elucidate certain differences compared to the analysis of binary classifiers, such as dependence on class-correlation matrices. One reviewer raised concerns about the results not providing insight into generalisation performance. The response indicates this is not the case, and this was corroborated by other reviews and my own reading. One critique raised by a couple of reviewers was regarding the specialised nature of the results, which are for linear classifiers and specific data models.
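To see what this doubly-asymptotic regime means empirically, one can grow n and d together with the ratio d/n held fixed and observe the test accuracy concentrating around a deterministic limit. The sketch below, which reuses the least-squares setup above, is only an assumed illustration of that concentration; the 0.5 ratio and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
k, ratio = 3, 0.5                      # d/n held fixed as both n and d grow

def run_trial(n, d):
    # One draw of Gaussian-mixture data, split in half for train/test,
    # fitted with a plain least-squares linear classifier (no intercept here).
    mus = rng.normal(size=(k, d)) / np.sqrt(d)
    y = rng.integers(k, size=2 * n)
    X = mus[y] + rng.normal(size=(2 * n, d)) / np.sqrt(d)
    Y = np.eye(k)[y[:n]]
    W, *_ = np.linalg.lstsq(X[:n], Y, rcond=None)
    y_hat = (X[n:] @ W).argmax(axis=1)
    return (y_hat == y[n:]).mean()

# As n, d grow at fixed d/n, the spread of the accuracy across trials shrinks,
# which is the concentration underlying the asymptotic predictions.
for n in (200, 800, 3200):
    d = int(ratio * n)
    accs = [run_trial(n, d) for _ in range(10)]
    print(f"n={n:5d}, d={d:5d}: acc mean={np.mean(accs):.3f}, std={np.std(accs):.4f}")
```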