A Experiment Details

Neural Information Processing Systems

Given the differences between the training procedures of the model presented in Section 6.2 and those of the models in Section 6.3, the details are given separately. All models in Section 6.3 were trained with mini-batch stochastic gradient descent. All models presented in this paper use the same 3-layer MLP to parameterize the encoders and decoders. The latent representation is divided into 18 capsules of 18 dimensions each, and the decoder layers then have output sizes (450, 675, 4096). For all topographic models (TVAE and BubbleVAE) in Section 6.3, the values governing the global topographic organization were chosen to be sufficiently large to achieve a notably lower equivariance error than the VAE baseline, and thus to demonstrate the impact of topographic organization without temporal coherence. The results of all models are shown in Section B below.
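As a rough illustration of the architecture described above, the following NumPy sketch builds a decoder MLP with the quoted layer sizes and an 18 × 18 capsule latent. The activation function, initialization scheme, and batch handling are assumptions not stated in the text.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def init_layer(n_in, n_out, rng):
    # Small random weights, zero biases (initialization scheme is an assumption).
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

rng = np.random.default_rng(0)

# Decoder: 18 capsules x 18 dims = 324 latent units -> (450, 675, 4096),
# matching the layer sizes quoted in the text.
sizes = [18 * 18, 450, 675, 4096]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

def decode(z):
    h = z.reshape(z.shape[0], -1)  # flatten the (18, 18) capsule grid
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:  # linear final layer (activation choice is an assumption)
            h = relu(h)
    return h

z = rng.normal(size=(2, 18, 18))  # batch of two capsule latents
x = decode(z)
print(x.shape)  # (2, 4096)
```

The encoder would mirror these sizes in reverse; only the decoder widths are given in the text.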


A Comparative Study of Open-Source Libraries for Synthetic Tabular Data Generation: SDV vs. SynthCity

Del Gobbo, Cristian

arXiv.org Artificial Intelligence

High-quality training data is critical to the performance of machine learning models, particularly Large Language Models (LLMs). However, obtaining real, high-quality data can be challenging, especially for smaller organizations and early-stage startups. Synthetic data generators offer a promising solution by replicating the statistical and structural properties of real data while preserving privacy and scalability. This study evaluates the performance of six tabular synthetic data generators from two widely used open-source libraries: SDV (Gaussian Copula, CTGAN, TVAE) and SynthCity (Bayesian Network, CTGAN, TVAE). Using a real-world dataset from the UCI Machine Learning Repository, comprising energy consumption and environmental variables from Belgium, we simulate a low-data regime by training models on only 1,000 rows. Each generator is then tasked with producing synthetic datasets under two conditions: a 1:1 (1,000 rows) and a 1:10 (10,000 rows) input-output ratio. Evaluation is conducted using two criteria: statistical similarity, measured via classical statistics and distributional metrics; and predictive utility, assessed using a "Train on Synthetic, Test on Real" approach with four regression models. While statistical similarity remained consistent across models in both scenarios, predictive utility declined notably in the 1:10 case. The Bayesian Network from SynthCity achieved the highest fidelity in both scenarios, while TVAE from SDV performed best in predictive tasks under the 1:10 setting. Although no significant performance gap was found between the two libraries, SDV stands out for its superior documentation and ease of use, making it more accessible for practitioners.
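The "Train on Synthetic, Test on Real" (TSTR) protocol described above can be sketched as follows. Here a noisy copy of a toy "real" regression dataset stands in for a generator's output, and closed-form ridge regression stands in for the four regression models; every dataset, noise scale, and hyperparameter is an illustrative assumption, not a detail from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge regression with a bias column.
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ y)

def r2(X, y, w):
    # Coefficient of determination on held-out data.
    X1 = np.hstack([X, np.ones((len(X), 1))])
    resid = y - X1 @ w
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# "Real" training data: a toy linear task (stand-in for the UCI energy dataset).
X_real = rng.normal(size=(1000, 5))
y_real = X_real @ true_w + rng.normal(scale=0.1, size=1000)

# Hypothetical synthetic sample at the 1:1 ratio (1,000 rows): real rows
# perturbed with noise, standing in for the output of a generator such as CTGAN.
X_syn = X_real + rng.normal(scale=0.3, size=X_real.shape)
y_syn = y_real + rng.normal(scale=0.3, size=y_real.shape)

# TSTR: train on synthetic, evaluate on fresh "real" rows; compare with the
# train-on-real baseline.
X_test = rng.normal(size=(200, 5))
y_test = X_test @ true_w + rng.normal(scale=0.1, size=200)

w_real = fit_ridge(X_real, y_real)
w_syn = fit_ridge(X_syn, y_syn)
print(round(r2(X_test, y_test, w_real), 3), round(r2(X_test, y_test, w_syn), 3))
```

The gap between the two R² scores is the utility cost of training on synthetic rather than real data, which is the quantity the study tracks across the 1:1 and 1:10 settings.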


Enhancing Obsolescence Forecasting with Deep Generative Data Augmentation: A Semi-Supervised Framework for Low-Data Industrial Applications

Saad, Elie, Besbes, Mariem, Zolghadri, Marc, Czmil, Victor, Baron, Claude, Bourgeois, Vincent

arXiv.org Artificial Intelligence

The challenge of electronic component obsolescence is particularly critical in systems with long life cycles. Various obsolescence management methods are employed to mitigate its impact, with obsolescence forecasting being a highly sought-after and prominent approach. As a result, numerous machine learning-based forecasting methods have been proposed. However, machine learning models require a substantial amount of relevant data to achieve high precision, and such data is often lacking in the current obsolescence landscape. This work introduces a novel framework for obsolescence forecasting based on deep learning. The proposed framework addresses the lack of available data through deep generative modeling: new obsolescence cases are generated and used to augment the training dataset, and the augmented dataset is then used to train a classical machine learning-based obsolescence forecasting model. To support training on augmented datasets, existing classical supervised-learning classifiers are adapted for semi-supervised learning within this framework. The proposed framework demonstrates state-of-the-art results on benchmarking datasets.
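The generate-then-augment idea above can be sketched minimally as follows. A class-conditional Gaussian sampler stands in for the paper's deep generative model, and a nearest-centroid rule stands in for the classical forecasting classifier; all data, sizes, and model choices are illustrative assumptions rather than the authors' method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small labeled set of "obsolescence cases" (toy stand-in for component data):
# two classes (e.g., obsolete vs. active) over 3 features, 30 rows per class.
mu = {0: np.zeros(3), 1: np.full(3, 2.0)}
X = np.vstack([rng.normal(mu[c], 1.0, size=(30, 3)) for c in (0, 1)])
y = np.repeat([0, 1], 30)

def augment(X, y, n_per_class):
    # Class-conditional Gaussian sampler: a simple stand-in for the deep
    # generative model that produces new obsolescence cases per class.
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        Xs.append(rng.multivariate_normal(Xc.mean(0), np.cov(Xc.T), size=n_per_class))
        ys.append(np.full(n_per_class, c))
    return np.vstack([X] + Xs), np.concatenate([y] + ys)

X_aug, y_aug = augment(X, y, n_per_class=200)

# Classical classifier (nearest class centroid) trained on the augmented set.
centroids = {c: X_aug[y_aug == c].mean(0) for c in (0, 1)}

def predict(Xq):
    d = np.stack([np.linalg.norm(Xq - centroids[c], axis=1) for c in (0, 1)])
    return d.argmin(0)

X_test = np.vstack([rng.normal(mu[c], 1.0, size=(100, 3)) for c in (0, 1)])
y_test = np.repeat([0, 1], 100)
acc = (predict(X_test) == y_test).mean()
print(round(acc, 2))
```

The point of the augmentation step is that the classifier sees 460 rows instead of 60; in the paper's setting the sampler is a deep generative model and the downstream learner is additionally adapted for semi-supervised training.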