Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation

Internò, Christian, Castellani, Andrea, Schmitt, Sebastian, Stella, Fabio, Hammer, Barbara

arXiv.org Artificial Intelligence 

Abstract--Industrial Non-Intrusive Load Monitoring (NILM) is limited by the scarcity of high-quality datasets and the complex variability of industrial energy consumption patterns. To address data scarcity and privacy issues, we introduce the Synthetic Industrial Dataset for Energy Disaggregation (SIDED), an open-source dataset generated using Digital Twin simulations. SIDED includes three types of industrial facilities across three different geographic locations, capturing diverse appliance behaviors, weather conditions, and load profiles. We also propose the Appliance-Modulated Data Augmentation (AMDA) method, a computationally efficient technique that enhances NILM model generalization by intelligently scaling appliance power contributions based on their relative impact. We show in experiments that NILM models trained with AMDA-augmented data significantly improve the disaggregation of energy consumption of complex industrial appliances like combined heat and power systems. Specifically, in our out-of-sample scenarios, models trained with AMDA achieved a Normalized Disaggregation Error of 0.167, outperforming models trained without data augmentation (0.451) and those trained with state-of-the-art data augmentation methods (0.290). Data distribution analyses confirm that AMDA effectively aligns training and test data distributions, enhancing model generalization.

Energy management has become increasingly important due to the undeniable reality of climate change and the rising global energy demand [1]. The industrial sector plays a significant role in international energy optimization [2], [3], necessitating heightened awareness of energy consumption to enhance efficiency and sustainability.

C. Internò and B. Hammer are with the Machine Learning Group, Center for Cognitive Interaction Technology (CITEC), University of Bielefeld, Bielefeld, Germany.
C. Internò, A. Castellani and S. Schmitt are with the Honda Research Institute EU, Offenbach am Main, Germany.
F. Stella is with the Models and Algorithms for Data and Text Mining Laboratory (MADLab), Department of Informatics, Systems and Communication (DISCo), University of Milano - Bicocca, Milan, Italy.