Sliced-Wasserstein Distance-based Data Selection
Pallage, Julien; Lesage-Landry, Antoine
arXiv.org Artificial Intelligence
We propose a new unsupervised anomaly detection method based on the sliced-Wasserstein distance for training data selection in machine learning approaches. Our filtering technique is well suited to decision-making pipelines that deploy machine learning models in critical sectors, e.g., power systems, because it offers conservative data selection and an optimal transport interpretation. To ensure the scalability of our method, we provide two efficient approximations. The first processes reduced-cardinality representations of the datasets concurrently. The second uses a computationally light Euclidean distance approximation. Additionally, we release the first open dataset showcasing localized critical peak rebate demand response in a northern climate. We present the filtering patterns of our method on synthetic datasets and numerically benchmark it for training data selection. Finally, we employ our method as part of a first forecasting benchmark for our open-source dataset.
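For readers unfamiliar with the sliced-Wasserstein distance named in the abstract, the following is a minimal sketch of its standard Monte Carlo estimate, not the authors' filtering method or either of their approximations. It assumes two samples of equal size, projects both onto random unit directions, and uses the fact that the one-dimensional Wasserstein distance between equal-size empirical measures is obtained by matching sorted values. The function name `sliced_wasserstein` and the parameters `n_projections`, `p`, and `seed` are hypothetical choices made for illustration.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    Illustrative sketch only: assumes x and y are (n, d) arrays with the
    same number of rows, each row being one sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch requires samples of identical shape")

    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Draw directions uniformly on the unit sphere by normalizing Gaussians.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project every sample onto every direction, then sort each projection.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)

    # In 1-D, optimal transport between equal-size samples pairs sorted values;
    # averaging over samples and directions estimates SW_p^p.
    return float(np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p))
```

An anomaly filter could, under further assumptions, compare such distances with and without a candidate point, but the selection rule itself is specified in the paper and is not reproduced here.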
Apr-18-2025
- Country:
  - Asia > Middle East
    - Republic of Türkiye > Istanbul Province > Istanbul (0.04)
  - Europe > Middle East
    - Republic of Türkiye > Istanbul Province > Istanbul (0.04)
  - North America
    - Canada (0.04)
    - United States (0.04)
- Genre:
  - Research Report (0.82)
- Industry:
  - Automobiles & Trucks (1.00)
  - Energy > Power Industry (1.00)
  - Transportation > Ground
    - Road (0.46)