Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
Neural Information Processing Systems
Deep neural networks often underperform on tabular data due to sensitivity to irrelevant features and a spectral bias toward smooth, low-frequency functions, limiting their ability to capture sharp, high-frequency signals in low-label regimes. While self-supervised learning (SSL) holds promise in such settings, it remains challenging in tabular domains due to the limited availability of effective data augmentations. We introduce TANDEM (Tree-And-Neural Dual Encoder Model), a hybrid autoencoder that trains a neural encoder alongside an oblivious soft decision tree (OSDT) encoder, both guided by dedicated stochastic gating networks for sample-specific feature selection. The encoders share a decoder and are coupled via alignment losses, encouraging complementary yet consistent representations. The training-only use of the tree operates as model-based augmentation, nudging representations toward tabular-relevant features while preserving a lean inference path (only the neural encoder is deployed). Spectral analysis highlights distinct yet complementary inductive biases across encoders, and experiments on classification and regression benchmarks in low-label settings show consistent gains over strong deep, tree-based, and SSL baselines.
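The architecture described above can be illustrated with a minimal forward-pass sketch in NumPy. It is not the authors' implementation. The function names, the clipped-Gaussian gate, the single linear layers, and the mean-squared alignment term are all assumptions chosen for brevity. The abstract specifies only that there are two gated encoders, a shared decoder, and alignment losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_gate(x, W, noise_std=0.5, train=True):
    """Sample-specific feature gates in [0, 1] (assumed form: clipped noisy linear score)."""
    mu = x @ W + 0.5
    if train:
        mu = mu + noise_std * rng.standard_normal(mu.shape)
    return np.clip(mu, 0.0, 1.0)

def neural_encode(x, W):
    """Hypothetical one-layer neural encoder."""
    return np.tanh(x @ W)

def osdt_encode(x, W_split, thresholds, leaf_emb):
    """Oblivious soft tree: every node on a level shares the same soft split."""
    p_right = 1.0 / (1.0 + np.exp(-(x @ W_split - thresholds)))  # (n, depth)
    leaf_prob = np.ones((x.shape[0], 1))
    for level in range(p_right.shape[1]):
        p = p_right[:, level:level + 1]
        # Each existing path branches left (1 - p) and right (p).
        leaf_prob = np.concatenate([leaf_prob * (1.0 - p), leaf_prob * p], axis=1)
    # Latent code is the probability-weighted mix of leaf embeddings.
    return leaf_prob @ leaf_emb, leaf_prob

def init_params(n_features, latent_dim, depth, scale=0.1):
    """Random illustrative parameters; shapes are the only meaningful part."""
    return {
        "gate_nn": scale * rng.standard_normal((n_features, n_features)),
        "gate_tree": scale * rng.standard_normal((n_features, n_features)),
        "enc_nn": scale * rng.standard_normal((n_features, latent_dim)),
        "split": scale * rng.standard_normal((n_features, depth)),
        "thr": np.zeros(depth),
        "leaf": scale * rng.standard_normal((2 ** depth, latent_dim)),
        "dec": scale * rng.standard_normal((latent_dim, n_features)),  # shared decoder
    }

def tandem_style_loss(x, params, align_weight=1.0, train=True):
    """Reconstruction loss for both encoders plus an assumed MSE alignment term."""
    g_nn = stochastic_gate(x, params["gate_nn"], train=train)
    g_tree = stochastic_gate(x, params["gate_tree"], train=train)
    z_nn = neural_encode(x * g_nn, params["enc_nn"])
    z_tree, _ = osdt_encode(x * g_tree, params["split"], params["thr"], params["leaf"])
    recon_nn = np.mean((z_nn @ params["dec"] - x) ** 2)
    recon_tree = np.mean((z_tree @ params["dec"] - x) ** 2)
    align = np.mean((z_nn - z_tree) ** 2)
    return recon_nn + recon_tree + align_weight * align
```

In this sketch, inference would call only `stochastic_gate(..., train=False)` with the neural gate, followed by `neural_encode`. This mirrors the abstract's statement that the tree encoder is used during training only.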
Jun-19-2026, 10:56:07 GMT