Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder

Clément Chadebec, Elina Thibeau-Sutre, Ninon Burgos, Stéphanie Allassonnière

Apr-30-2021–arXiv.org Machine Learning

In this paper, we propose a new method to perform reliable data augmentation in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme that produces more meaningful samples, especially for small data sets. The proposed method is evaluated in a wide experimental study that assesses its robustness to the choice of data set, classifier and training sample size. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are augmented using the proposed VAE framework. In each case, the proposed method yields a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art CNN classifier trained with 50 MRIs of cognitively normal (CN) subjects and 50 of Alzheimer's disease (AD) patients, and from 77.7% to 86.3% when trained with 243 CN and 210 AD, while also greatly improving sensitivity and specificity.
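
Balanced accuracy, the headline metric above, is the mean of sensitivity and specificity, so a classifier cannot score well simply by favoring the majority class. The sketch below is a minimal illustration, not the authors' evaluation code; it assumes binary labels with AD encoded as the positive class (1) and CN as the negative class (0), and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a binary labeling.

    Assumption: `positive` marks the AD class; every other label counts as CN.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # AD correctly detected
            else:
                fn += 1  # AD missed
        elif p == positive:
            fp += 1  # CN wrongly flagged as AD
        else:
            tn += 1  # CN correctly recognized
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> float:
    """Mean of sensitivity (recall on AD) and specificity (recall on CN)."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return (sens + spec) / 2


if __name__ == "__main__":
    # Toy imbalanced set: 2 AD (1) and 8 CN (0); a trivial model predicts CN for everyone.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0] * 10
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(plain)                              # 0.8
    print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the trivial majority-class predictor with 0.8, whereas balanced accuracy places it at chance level (0.5), which makes it a natural choice when CN and AD counts differ, as in the 243 CN / 210 AD setting.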

classifier, data augmentation, latent space

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - Canada (0.04)
  - United States
    - California (0.14)
    - Massachusetts > Suffolk County
      - Boston (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
    - Cambridgeshire > Cambridge (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia > Middle East
  - Lebanon (0.04)
  - Jordan (0.04)

Genre:
- Research Report
  - New Finding (0.65)
  - Experimental Study (0.47)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Health Care Technology (1.00)
  - Diagnostic Medicine > Imaging (1.00)
  - Therapeutic Area > Neurology
    - Alzheimer's Disease (0.66)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (0.93)
