Collaborating Authors

 Miller, Clayton


Recommender systems and reinforcement learning for human-building interaction and context-aware support: A text mining-driven review of scientific literature

arXiv.org Artificial Intelligence

The indoor environment significantly impacts human health and well-being; enhancing health and reducing energy consumption in these settings is a central research focus. With the advancement of Information and Communication Technology (ICT), recommendation systems and reinforcement learning (RL) have emerged as promising approaches to induce behavioral changes to improve the indoor environment and energy efficiency of buildings. This study aims to employ text mining and Natural Language Processing (NLP) techniques to thoroughly examine the connections among these approaches in the context of human-building interaction and occupant context-aware support. The study analyzed 27,595 articles from the ScienceDirect database, revealing extensive use of recommendation systems and RL for space optimization, location recommendations, and personalized control suggestions. Furthermore, this review underscores the vast potential for expanding recommender systems and RL applications in buildings and indoor environments. Fields ripe for innovation include predictive maintenance, building-related product recommendation, and optimization of environments tailored for specific needs, such as sleep and productivity enhancements based on user feedback. The study also notes the limitations of the method in capturing subtle academic nuances. Future improvements could involve integrating and fine-tuning pre-trained language models to better interpret complex texts.


What is a Digital Twin Anyway? Deriving the Definition for the Built Environment from over 15,000 Scientific Publications

arXiv.org Artificial Intelligence

The concept of digital twins has attracted significant attention across various domains, particularly within the built environment. However, the sheer volume of definitions means that terminological consensus remains out of reach. The lack of a universally accepted definition leads to ambiguities in their conceptualization and implementation, and may cause miscommunication for both researchers and practitioners. We employed Natural Language Processing (NLP) techniques to systematically extract and analyze definitions of digital twins from a corpus of 15,000 full-text articles spanning diverse disciplines in the built environment. The study compares these findings with insights from a survey of 52 experts. The study identifies concurrence on the components that comprise a 'Digital Twin' from a practical perspective across various domains, contrasting them with those that do not, to identify deviations. We investigate the evolution of digital twin definitions over time and across different scales, including manufacturing, building, and urban/geospatial perspectives. We extracted the main components of Digital Twins using Text Frequency Analysis and N-gram analysis. Subsequently, we identified components that appeared in the literature and conducted a Chi-square test to assess the significance of each component in different domains. Our findings indicate that definitions differ based on the field of research in which they are conceived, but with many similarities across domains. One significant generalizable differentiation is related to whether a digital twin was used for High-Performance Real-Time (HPRT) or Long-Term Decision Support (LTDS) applications. We synthesized and contrasted the most representative definitions in each domain, culminating in a novel, data-driven definition specifically tailored for each context.
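As a rough illustration of the two techniques named in this abstract, the sketch below counts n-grams in a piece of text and computes a Pearson chi-square statistic for a 2x2 table (component mentioned or not, in domain A or domain B). The function names, the whitespace tokenization, and the example counts are assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
from collections import Counter


def ngrams(text, n):
    """Return the list of word n-grams in `text` (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = domain, columns = component mentioned / not."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical definition text, not taken from the corpus.
bigram_counts = Counter(ngrams("a digital twin is a digital replica", 2))

# Hypothetical counts: 10 of 30 definitions in one domain mention a component,
# versus 30 of 70 in another domain.
statistic = chi_square_2x2(10, 20, 30, 40)
```

A larger statistic suggests the component's frequency differs more between the two domains; converting it to a p-value requires the chi-square distribution with one degree of freedom.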


Creating synthetic energy meter data using conditional diffusion and building metadata

arXiv.org Artificial Intelligence

Advances in machine learning and increased computational power have driven progress in energy-related research. However, limited access to private energy data from buildings hinders traditional regression models relying on historical data. While generative models offer a solution, previous studies have primarily focused on short-term generation periods (e.g., daily profiles) and a limited number of meters. Thus, the study proposes a conditional diffusion model for generating high-quality synthetic energy data using relevant metadata. Using a dataset comprising 1,828 power meters from various buildings and countries, this model is compared with traditional methods like Conditional Generative Adversarial Networks (CGAN) and Conditional Variational Auto-Encoders (CVAE). It explicitly handles long-term annual consumption profiles, harnessing metadata such as location, weather, building, and meter type to produce coherent synthetic data that closely resembles real-world energy consumption patterns. The results demonstrate the proposed diffusion model's superior performance, with a 36% reduction in Fréchet Inception Distance (FID) score and a 13% decrease in Kullback-Leibler divergence (KL divergence) compared to the next-best method. The proposed method successfully generates high-quality energy data through metadata, and its code will be open-sourced, establishing a foundation for a broader array of energy data generation models in the future.
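For readers unfamiliar with the KL divergence metric mentioned above, the following is a minimal sketch of the discrete form, D_KL(P || Q) = sum_i p_i log(p_i / q_i), applied to two histograms. How the authors bin the real and synthetic meter data is not stated in the abstract, so the histogram inputs and the `eps` guard are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(P || Q) in nats.

    `p` and `q` are non-negative histogram counts or weights of equal length;
    both are normalized to sum to 1 before comparison. `eps` is an assumed
    guard against division by zero where Q has an empty bin.
    """
    p_total, q_total = sum(p), sum(q)
    divergence = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_total, q_i / q_total
        if p_i > 0:  # bins with p_i = 0 contribute nothing
            divergence += p_i * math.log(p_i / max(q_i, eps))
    return divergence


# Hypothetical two-bin histograms of real versus synthetic consumption.
real_hist = [1, 1]
synthetic_hist = [1, 3]
score = kl_divergence(real_hist, synthetic_hist)
```

A value of zero means the two histograms are identical after normalization; lower is better when comparing synthetic data against real data.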


Opening the Black Box: Towards inherently interpretable energy data imputation models using building physics insight

arXiv.org Machine Learning

Missing data are frequently observed by practitioners and researchers in the building energy modeling community. In this regard, advanced data-driven solutions, such as Deep Learning methods, are typically required to reflect the non-linear behavior of these anomalies. As an ongoing research question related to Deep Learning, a model's applicability to limited data settings can be explored by introducing prior knowledge in the network. This same strategy can also lead to more interpretable predictions, hence facilitating the field application of the approach. For that purpose, the aim of this paper is to propose the use of Physics-informed Denoising Autoencoders (PI-DAE) for missing data imputation in commercial buildings. In particular, the presented method enforces physics-inspired soft constraints to the loss function of a Denoising Autoencoder (DAE). In order to quantify the benefits of the physical component, an ablation study between different DAE configurations is conducted. First, three univariate DAEs are optimized separately on indoor air temperature, heating, and cooling data. Then, two multivariate DAEs are derived from the previous configurations. Eventually, a building thermal balance equation is coupled to the last multivariate configuration to obtain PI-DAE. Additionally, two commonly used benchmarks are employed to support the findings. It is shown how introducing physical knowledge in a multivariate Denoising Autoencoder can enhance the inherent model interpretability through the optimized physics-based coefficients. While no significant improvement is observed in terms of reconstruction error with the proposed PI-DAE, its enhanced robustness to varying rates of missing data and the valuable insights derived from the physics-based coefficients create opportunities for wider applications within building systems and the built environment.
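The idea of adding a physics-inspired soft constraint to a reconstruction loss can be sketched as below. This is not the paper's formulation: the simplified balance (temperature change equals `a` times heating minus `b` times cooling), the coefficient names `a` and `b`, and the weight `lam` are all assumptions chosen only to show how a penalty term sits alongside the data term.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def balance_residuals(temp, heat, cool, a, b):
    """Residuals of an assumed, highly simplified thermal balance:
    temp[t+1] - temp[t] = a * heat[t] - b * cool[t]."""
    return [
        (temp[t + 1] - temp[t]) - (a * heat[t] - b * cool[t])
        for t in range(len(temp) - 1)
    ]


def physics_informed_loss(recon, target, a, b, lam=0.1):
    """Reconstruction error plus a soft physics penalty weighted by `lam`.

    `recon` and `target` are dicts with keys "temp", "heat", "cool".
    In a trained model, `a` and `b` would be optimized with the network
    weights; here they are plain inputs.
    """
    data_term = sum(mse(recon[k], target[k]) for k in ("temp", "heat", "cool"))
    residuals = balance_residuals(recon["temp"], recon["heat"], recon["cool"], a, b)
    physics_term = sum(r * r for r in residuals) / len(residuals)
    return data_term + lam * physics_term


# Hypothetical series that satisfy the assumed balance when a = b = 0.5.
series = {
    "temp": [20.0, 21.0, 21.0, 20.0],
    "heat": [2.0, 0.0, 0.0, 0.0],
    "cool": [0.0, 0.0, 2.0, 0.0],
}
consistent_loss = physics_informed_loss(series, series, a=0.5, b=0.5)
inconsistent_loss = physics_informed_loss(series, series, a=1.0, b=0.5)
```

Because the constraint is soft, a reconstruction that violates the balance is penalized rather than forbidden, and the fitted coefficients remain available for inspection.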


Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation

arXiv.org Artificial Intelligence

Building energy prediction and management has become increasingly important in recent decades, driven by the growth of Internet of Things (IoT) devices and the availability of more energy data. However, energy data is often collected from multiple sources and can be incomplete or inconsistent, which can hinder accurate predictions and management of energy systems and limit the usefulness of the data for decision-making and research. To address this issue, past studies have focused on imputing missing gaps in energy data, including random and continuous gaps. One of the main challenges in this area is the lack of validation on a benchmark dataset with various building and meter types, making it difficult to accurately evaluate the performance of different imputation methods. Another challenge is the lack of application of state-of-the-art imputation methods for missing gaps in energy data. Contemporary image-inpainting methods, such as Partial Convolution (PConv), have been widely used in the computer vision domain and have demonstrated their effectiveness in dealing with complex missing patterns. To study whether energy data imputation can benefit from image-based deep learning methods, this study compared PConv, Convolutional Neural Networks (CNNs), and the weekly persistence method using one of the biggest publicly available whole building energy datasets, consisting of 1479 power meters worldwide, as the benchmark. The results show that, compared to the CNN with the raw time series (1D-CNN) and the weekly persistence method, neural network models using energy data reshaped into two dimensions reduced the Mean Squared Error (MSE) by 10% to 30%. The advanced deep learning method, Partial Convolution (PConv), reduced the MSE by a further 20-30% relative to the 2D-CNN and stands out among all models.
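Two of the ingredients mentioned in this abstract are easy to sketch: a weekly persistence baseline, and reshaping an hourly series into a two-dimensional grid that image-style models can consume. The details below (hourly resolution, `None` as the missing-value marker, looking one week back and then one week forward, a 24-column grid) are assumptions for illustration; the abstract does not specify them.

```python
HOURS_PER_WEEK = 168


def weekly_persistence(series):
    """Fill each missing hourly value (None) with the observed value at the
    same hour one week earlier, or one week later if that is unavailable.
    Gaps with no observed neighbor a week away stay as None."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            for j in (i - HOURS_PER_WEEK, i + HOURS_PER_WEEK):
                if 0 <= j < len(series) and series[j] is not None:
                    filled[i] = series[j]
                    break
    return filled


def reshape_2d(series, width=24):
    """Reshape a 1D series into rows of `width` values (one row per day for
    hourly data), giving the 2D layout an image-based model expects."""
    if len(series) % width != 0:
        raise ValueError("series length must be a multiple of width")
    return [series[i:i + width] for i in range(0, len(series), width)]


# Hypothetical two-week hourly series with two missing readings.
readings = [float(h) for h in range(2 * HOURS_PER_WEEK)]
readings[200] = None  # filled from hour 32, one week earlier
readings[5] = None    # no earlier week exists, so filled from hour 173
imputed = weekly_persistence(readings)
grid = reshape_2d(imputed)
```

In the 2D layout, vertically adjacent cells hold the same hour on consecutive days, which is the kind of neighborhood structure that convolutional and inpainting models exploit.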


Elastic buildings: Calibrated district-scale simulation of occupant-flexible campus operation for hybrid work optimization

arXiv.org Artificial Intelligence

Before 2020, the way occupants utilized the built environment had been changing slowly towards scenarios in which occupants have more choice and flexibility in where and how they work. The global COVID-19 pandemic accelerated this phenomenon rapidly through lockdowns and hybrid work arrangements. Many occupants and employers are considering keeping some of these flexibility-based strategies due to their benefits and cost impacts. This paper simulates various scenarios related to the operational technologies and policies of a real-world campus using a district-scale City Energy Analyst (CEA) model that is calibrated with measured energy and occupancy profiles extracted from WiFi data. These scenarios demonstrate the energy impact of ramping building operations up and down more rapidly and effectively in response to the flex-based work strategies that may become permanent. The scenarios show a 4-12% decrease in space cooling demand due to occupant absenteeism if centralized building system operation is in place, and a decrease of as much as 21-68% if occupancy-driven building controls are implemented. The paper discusses technologies and strategies that are important in this paradigm shift of operations.


ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles

arXiv.org Artificial Intelligence

Data-driven building energy prediction is an integral part of the process for measurement and verification, building benchmarking, and building-to-grid interaction. The ASHRAE Great Energy Predictor III (GEPIII) machine learning competition used an extensive meter data set to crowdsource the most accurate machine learning workflow for whole building energy prediction. A significant component of the winning solutions was the pre-processing phase to remove anomalous training data. Contemporary pre-processing methods focus on filtering statistical threshold values or deep learning methods requiring training data and multiple hyper-parameters. A recent method named ALDI (Automated Load profile Discord Identification) managed to identify these discords using matrix profile, but the technique still requires user-defined parameters. We develop ALDI++, a method based on the previous work that bypasses user-defined parameters and takes advantage of discord similarity. We evaluate ALDI++ against a statistical threshold, variational auto-encoder, and the original ALDI as baselines in classifying discords and energy forecasting scenarios. Our results demonstrate that while the classification performance improvement over the original method is marginal, ALDI++ helps achieve the best forecasting error, a 6% improvement over the winning team's approach, with six times less computation time.


Cohort comfort models -- Using occupants' similarity to predict personal thermal preference with less data

arXiv.org Artificial Intelligence

We introduce Cohort Comfort Models, a new framework for predicting how new occupants would perceive their thermal environment. Cohort Comfort Models leverage historical data collected from a sample population, who have some underlying preference similarity, to predict thermal preference responses of new occupants. Our framework is capable of exploiting available background information such as physical characteristics and one-time on-boarding surveys (satisfaction with life scale, highly sensitive person scale, the Big Five personality traits) from the new occupant as well as physiological and environmental sensor measurements paired with thermal preference responses. We implemented our framework in two publicly available datasets containing longitudinal data from 55 people, comprising more than 6,000 individual thermal comfort surveys. We observed that a Cohort Comfort Model that uses background information provided very little change in thermal preference prediction performance while using no historical data. On the other hand, for half and one third of each dataset's occupant population, Cohort Comfort Models that use less historical data from target occupants improved thermal preference prediction by 8% and 5% on average, and by up to 36% and 46% for some occupants, when compared to general-purpose models trained on the whole population of occupants. The framework is presented in a data- and site-agnostic manner, with its different components easily tailored to the data availability of the occupants and the buildings. Cohort Comfort Models can be an important step towards personalization without the need to develop a personalized model for each new occupant.


Limitations of machine learning for building energy prediction: ASHRAE Great Energy Predictor III Kaggle competition error analysis

arXiv.org Artificial Intelligence

Research is needed to explore the limitations and potential for improvement of machine learning for building energy prediction. With this aim, the ASHRAE Great Energy Predictor III (GEPIII) Kaggle competition was launched in 2019. This effort was the largest building energy meter machine learning competition of its kind, with 4,370 participants who submitted 39,403 predictions. The test data set included two years of hourly whole building readings from 2,380 meters in 1,448 buildings at 16 locations. This paper analyzes the various sources and types of residual model error from an aggregation of the competition's top 50 solutions. This analysis reveals the limitations for machine learning using the standard model inputs of historical meter, weather, and basic building metadata. The errors are classified according to timeframe, behavior, magnitude, and incidence in single buildings or across a campus. The results show machine learning models have errors within a range of acceptability (RMSLE_scaled ≤ 0.1) on 79.1% of the test data. Lower magnitude (in-range) model errors (0.1 < RMSLE_scaled ≤ 0.3) occur in 16.1% of the test data. These errors could be remedied using innovative training data from onsite and web-based sources. Higher magnitude (out-of-range) errors (RMSLE_scaled > 0.3) occur in 4.8% of the test data and are unlikely to be accurately predicted.
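The error bands in this abstract can be expressed as a small helper. The sketch below computes the standard root mean squared logarithmic error (RMSLE) and then buckets a value using the thresholds quoted above. The abstract does not explain how RMSLE_scaled is derived from RMSLE, so the classifier simply takes an already-scaled value as input; the function names and category labels are assumptions.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error for non-negative values:
    sqrt(mean((log(1 + p) - log(1 + a))^2))."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def classify_error(rmsle_scaled):
    """Bucket an already-scaled RMSLE using the thresholds quoted in the abstract."""
    if rmsle_scaled <= 0.1:
        return "acceptable"
    if rmsle_scaled <= 0.3:
        return "in-range"
    return "out-of-range"


# Hypothetical hourly meter readings and predictions.
example_error = rmsle([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
labels = [classify_error(v) for v in (0.05, 0.2, 0.45)]
```

The logarithm makes the metric sensitive to relative rather than absolute differences, which suits meters whose magnitudes span several orders.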


Gradient boosting machines and careful pre-processing work best: ASHRAE Great Energy Predictor III lessons learned

arXiv.org Artificial Intelligence

The ASHRAE Great Energy Predictor III (GEPIII) competition was held in late 2019 as one of the largest machine learning competitions ever held focused on building performance. It was hosted on the Kaggle platform and resulted in 39,402 prediction submissions, with the top five teams splitting $25,000 in prize money. This paper outlines lessons learned from participants, mainly from teams who scored in the top 5% of the competition. Various insights were gained from their experience through an online survey, analysis of publicly shared submissions and notebooks, and the documentation of the winning teams. The top-performing solutions mostly used ensembles of Gradient Boosting Machine (GBM) tree-based models, with the LightGBM package being the most popular. The survey participants indicated that the preprocessing and feature extraction phases were the most important aspects of creating the best modeling approach. All the survey respondents used Python as their primary modeling tool, and it was common to use Jupyter-style Notebooks as development environments. These conclusions are essential to help steer the research and practical implementation of building energy meter prediction in the future.