A Refined Margin Distribution Analysis for Forest Representation Learning
In this paper, we formulate the forest representation learning approach \textsc{CasDF} as an additive model that boosts the augmented feature instead of the prediction. We substantially improve the upper bound on the generalization gap from $\mathcal{O}(\sqrt{\ln m/m})$ to $\mathcal{O}(\ln m/m)$ when the margin ratio, i.e., the ratio of the margin standard deviation to the margin mean, is sufficiently small. This tighter upper bound inspires us to optimize that ratio: we design a margin distribution reweighting approach for deep forest that achieves a small margin ratio by boosting the augmented feature. Experiments confirm the correlation between the margin distribution and generalization performance. This study offers a novel understanding of \textsc{CasDF} from the perspective of margin theory and further guides layer-by-layer forest representation learning.
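To make the optimized quantity concrete, the sketch below computes the margin mean, standard deviation, and their ratio for a generic multiclass scorer. The margin definition used here (true-class score minus the best rival score) is the standard multiclass margin from margin theory; the function name and interface are our own assumptions, not the paper's code.

```python
import numpy as np

def margin_stats(scores, y):
    """Margin mean, std, and their ratio (std/mean) for multiclass scores.

    scores: (n, k) array of per-class scores; y: (n,) integer true labels.
    The margin of an example is its true-class score minus the highest
    competing class score, as in standard multiclass margin analysis.
    """
    n = scores.shape[0]
    true_score = scores[np.arange(n), y]
    rival = scores.astype(float).copy()
    rival[np.arange(n), y] = -np.inf          # mask out the true class
    margins = true_score - rival.max(axis=1)  # per-example margins
    mean, std = margins.mean(), margins.std()
    return mean, std, std / mean              # ratio the bound depends on
```

A reweighting scheme in this spirit would then upweight examples whose margins fall far below the mean, driving the ratio down layer by layer.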
Supplementary material: Hold me tight!
- A. Theoretical margin distribution of a linear classifier (p. 2)
- B. Examples of frequency "flipped" images (p. 4)
- C. Invariance and elasticity on MNIST data (p. 4)
- D. Connections to catastrophic forgetting (p. 5)
- E. Examples of filtered images (p. 6)
- F. Subspace sampling of the DCT (p. 6)
- G. Training parameters (p. 7)
- H. Cross-dataset performance (p. 8)
- I. Margin distribution for standard networks (p. 9)
- J. Adversarial training parameters (p. 13)
- K. Description of L2-PGD attack on frequency "flipped" data (p. 14)
- L. Spectral decomposition on frequency "flipped" data (p. 15)
- M. Margin distribution for adversarially trained networks (p. 16)
- N. Margin distribution on random subspaces (p. 19)

Excerpts: Figure S4: Filtered image examples. Table S2 shows the performance and training parameters of the different networks used in the paper.
Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition
Yang Xu, Junpeng Li, Changchun Hua, Yana Yang
Abstract--The Large Margin Distribution Machine (LMDM) is a recent advancement in classifier design that optimizes not just the minimum margin (as in SVM) but the entire margin distribution, thereby improving generalization. However, existing LMDM formulations are limited to vectorized inputs and struggle with high-dimensional tensor data due to the need for flattening, which destroys the data's inherent multi-mode structure and increases computational burden. In this paper, we propose a Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition (SPMD-LRT) that operates directly on tensor representations without vectorization. The SPMD-LRT preserves multi-dimensional spatial structure by incorporating first-order and second-order tensor statistics (margin mean and variance) into the objective, and it leverages low-rank tensor decomposition techniques including rank-1 (CP), higher-rank CP, and Tucker decomposition to parameterize the weight tensor. An alternating optimization (double-gradient descent) algorithm is developed to efficiently solve the SPMD-LRT, iteratively updating the factor matrices and core tensor. This approach enables SPMD-LRT to maintain the structural information of high-order data while optimizing the margin distribution for improved classification. Extensive experiments on diverse datasets (including MNIST, image, and fMRI neuroimaging data) demonstrate that SPMD-LRT achieves superior classification accuracy compared to conventional SVM, vector-based LMDM, and prior tensor-based SVM extensions (Support Tensor Machines and Support Tucker Machines). These results confirm the effectiveness and robustness of SPMD-LRT in handling high-dimensional tensor data for classification.

Advances in data acquisition have led to an abundance of high-order tensor data (multi-dimensional arrays) across various domains, such as video sequences, medical imaging, and spatiotemporal sensor readings.
Effectively learning from such tensor-structured data has become a pressing research focus [1], [2]. The multi-dimensional structure of tensors offers rich information.
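To make the abstract's ingredients concrete, here is a minimal rank-1 sketch of margin-distribution learning on second-order tensor (matrix) data. It is an illustrative reading of the SPMD-LRT idea, not the authors' algorithm: the weight tensor is factored as $W = u v^\top$ (rank-1 CP), so each sample's score is $u^\top X_i v + b$ without any flattening, and a hypothetical objective combining margin variance, negative margin mean, and hinge loss is minimized by alternating gradient steps on the two factors. All names, the exact objective form, and hyperparameters are assumptions.

```python
import numpy as np

def spmd_lrt_rank1_sketch(X, y, lam1=1.0, lam2=1.0, C=1.0,
                          lr=0.01, iters=200, seed=0):
    """Illustrative rank-1 margin-distribution learner on (n, p, q) tensors.

    Decision value of sample X_i is u^T X_i v + b, with W = u v^T never
    vectorized. Hypothetical objective: lam1 * var(margins)
    - lam2 * mean(margins) + C * mean(hinge), y in {-1, +1}.
    """
    n, p, q = X.shape
    rng = np.random.default_rng(seed)
    u, v, b = rng.normal(size=p), rng.normal(size=q), 0.0
    for _ in range(iters):
        scores = np.einsum('p,npq,q->n', u, X, v) + b
        m = y * scores                                  # per-sample margins
        # d(objective)/d(score_i): variance, mean, and hinge terms
        g = (lam1 * 2 * (m - m.mean()) / n
             - lam2 / n
             + C * np.where(m < 1, -1.0, 0.0) / n) * y
        Xv = np.einsum('npq,q->np', X, v)               # contract mode 2
        Xu = np.einsum('p,npq->nq', u, X)               # contract mode 1
        u -= lr * (g @ Xv)                              # alternating step on u
        v -= lr * (g @ Xu)                              # alternating step on v
        b -= lr * g.sum()
    return u, v, b
```

For higher-rank CP or Tucker parameterizations, the single outer product above would be replaced by a sum of rank-1 terms or a core tensor contracted with per-mode factor matrices, with the same alternating update pattern over each factor in turn.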