University of Virginia
Transferring Decomposed Tensors for Scalable Energy Breakdown Across Regions
Batra, Nipun (University of Virginia) | Jia, Yiling (University of Virginia) | Wang, Hongning (University of Virginia) | Whitehouse, Kamin (University of Virginia)
Homes account for roughly one-third of total energy usage worldwide. Providing an energy breakdown – the energy consumption of each appliance – can help save up to 15% of energy. Given the vast differences in energy consumption patterns across regions, existing energy breakdown solutions require instrumentation and model training in each geographical region, which is prohibitively expensive and limits scalability. In this paper, we propose a novel region-independent energy breakdown model via statistical transfer learning. Our key intuition is that heterogeneity in homes and weather most significantly impacts energy consumption across regions; if we can factor out such heterogeneity, we can learn region-independent models, i.e., the homogeneous energy breakdown components for each individual appliance. The model learnt in one region can thus be transferred to another. We evaluate our approach on two U.S. cities with distinct weather from a publicly available dataset, and find that it gives better energy breakdown estimates while requiring the fewest instrumented homes in the target region, compared to the state of the art.
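The transfer idea can be sketched in a few lines: reuse the seasonal factors learnt in the source region and fit only the inexpensive per-home factors in the target region. A minimal sketch with hypothetical names and shapes (the paper's actual model decomposes a tensor with appliance factors as well):

```python
import numpy as np

def fit_home_factors(E_target, M_src, lam=0.01):
    """Transfer sketch: reuse the (months x rank) seasonal factors M_src
    learnt in the source region and fit only the per-home factors for the
    target region by ridge regression, then reconstruct its energy matrix."""
    rank = M_src.shape[1]
    A = M_src.T @ M_src + lam * np.eye(rank)
    H = np.linalg.solve(A, M_src.T @ E_target.T).T  # (homes x rank)
    return H @ M_src.T                              # reconstructed energy

M_src = np.array([[1.0], [2.0], [3.0]])            # factors from source region
E_target = np.outer([2.0, 5.0], [1.0, 2.0, 3.0])   # two toy target homes
est = fit_home_factors(E_target, M_src)
```

Because the expensive seasonal structure is transferred, only one small least-squares problem per target home remains.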
On Convergence of Epanechnikov Mean Shift
Huang, Kejun (University of Minnesota) | Fu, Xiao (Oregon State University) | Sidiropoulos, Nicholas D. (University of Virginia)
Epanechnikov Mean Shift is a simple yet empirically very effective clustering algorithm. It locates the centroids of data clusters by estimating the modes of the probability distribution that generates the data points, using the "optimal" Epanechnikov kernel density estimator. However, since the procedure involves non-smooth kernel density functions, the convergence behavior of Epanechnikov Mean Shift lacks theoretical support as of this writing: most existing analyses are based on smooth functions and thus cannot be applied to Epanechnikov Mean Shift. In this work, we first show that the original Epanechnikov Mean Shift may indeed terminate at a non-critical point, due to this non-smoothness. Based on our analysis, we propose a simple remedy. The modified Epanechnikov Mean Shift is guaranteed to terminate at a local maximum of the estimated density, which corresponds to a cluster centroid, within a finite number of iterations. We also propose a way to avoid running the Mean Shift iterates from every data point while maintaining good clustering accuracy under non-overlapping spherical Gaussian mixture models. This further enables Epanechnikov Mean Shift to handle very large and high-dimensional data sets. Experiments show surprisingly good performance compared to Lloyd's K-means algorithm and the EM algorithm.
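Under the Epanechnikov kernel, the mean-shift update reduces to the mean of all samples within the bandwidth. A minimal sketch of the basic iterate (function name and toy data are illustrative, not the paper's implementation or its proposed fix):

```python
import numpy as np

def epanechnikov_mean_shift(X, x0, h=1.0, max_iter=100, tol=1e-6):
    """Mean-shift iterates under the Epanechnikov kernel: each update
    moves the current point to the mean of all samples within radius h."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        mask = np.linalg.norm(X - x, axis=1) <= h
        if not mask.any():
            break  # no neighbours within the bandwidth: iterate is stuck
        x_new = X[mask].mean(axis=0)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)),   # cluster near the origin
               rng.normal(5, 0.1, (50, 2))])  # cluster near (5, 5)
mode = epanechnikov_mean_shift(X, X[0], h=1.0)
```

Starting from a point in the first cluster, the iterate converges to that cluster's mode; the paper's contribution concerns exactly when such termination is at a genuine local maximum.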
Matrix Factorisation for Scalable Energy Breakdown
Batra, Nipun (IIIT Delhi) | Wang, Hongning (University of Virginia) | Singh, Amarjeet (IIIT Delhi) | Whitehouse, Kamin (University of Virginia)
Homes constitute more than one-third of total energy consumption. Producing an energy breakdown for a home has been shown to reduce household energy consumption by up to 15%, among other benefits. However, existing approaches to producing an energy breakdown require hardware to be installed in each home and are thus prohibitively expensive. In this paper, we propose a novel application of feature-based matrix factorisation that does not require any additional hardware installation. The basic premise of our approach is that common design and construction patterns for homes create a repeating structure in their energy data. Thus, a sparse basis can be used to represent energy data from a broad range of homes. We evaluate our approach on 516 homes from a publicly available data set and find it to be more effective than five baseline approaches that either require sensing in each home, or a very rigorous survey across a large number of homes coupled with complex modelling. We also present a deployment of our system as a live web application that can potentially provide an energy breakdown to millions of homes.
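The core machinery can be illustrated with a plain low-rank factorisation fitted by alternating least squares on the observed entries of a homes-by-months energy matrix; the paper's feature-based variant additionally conditions the factors on home features. A toy sketch (names and the rank-1 example are illustrative):

```python
import numpy as np

def als_factorise(E, mask, rank=2, lam=0.05, n_iter=50, seed=0):
    """Alternating least squares on a (homes x months) energy matrix E,
    fitting only observed entries (mask == 1); missing readings are then
    estimated from the learnt low-rank structure."""
    rng = np.random.default_rng(seed)
    n, m = E.shape
    H = rng.normal(size=(n, rank))  # per-home factors
    M = rng.normal(size=(m, rank))  # per-month factors
    for _ in range(n_iter):
        for i in range(n):          # update home factors, months fixed
            obs = mask[i] == 1
            A = M[obs].T @ M[obs] + lam * np.eye(rank)
            H[i] = np.linalg.solve(A, M[obs].T @ E[i, obs])
        for j in range(m):          # update month factors, homes fixed
            obs = mask[:, j] == 1
            A = H[obs].T @ H[obs] + lam * np.eye(rank)
            M[j] = np.linalg.solve(A, H[obs].T @ E[obs, j])
    return H @ M.T

# toy rank-1 energy matrix with one unobserved entry
true = np.outer([1.0, 2.0, 3.0], [2.0, 4.0])
mask = np.ones_like(true); mask[2, 1] = 0
est = als_factorise(true * mask, mask, rank=1)
```

The unobserved entry is recovered (up to regularisation shrinkage) from the repeating structure shared across homes.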
Resource Constrained Structured Prediction
Bolukbasi, Tolga (Boston University) | Chang, Kai-Wei (University of Virginia) | Wang, Joseph (Boston University) | Saligrama, Venkatesh (Boston University)
We study the problem of structured prediction under test-time budget constraints. We propose a novel approach based on selectively acquiring computationally costly features during test time in order to reduce the computational cost of prediction with minimal performance degradation. We formulate a novel empirical risk minimization (ERM) framework for policy learning. We show that policy learning can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition and dependency parsing, and show a significant reduction in feature costs without degrading accuracy.
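One of the heuristics this framework formalizes can be sketched as a confidence-gated acquisition rule: pay for the expensive feature only when a cheap model is unsure. A simplified illustration (the model and cost interfaces here are hypothetical; the paper learns the policy via ERM rather than fixing a threshold):

```python
import numpy as np

def adaptive_predict(x_cheap, cheap_model, expensive_model,
                     get_expensive, margin=0.8):
    """Budget-aware prediction: use the cheap model alone when it is
    confident; otherwise pay the cost of acquiring the expensive feature.
    Both models return class probabilities; returns (label, cost paid)."""
    p = cheap_model(x_cheap)
    if max(p) >= margin:                  # confident: stop early, save cost
        return int(np.argmax(p)), 0.0
    x_full = get_expensive(x_cheap)       # acquire costly feature at test time
    q = expensive_model(x_full)
    return int(np.argmax(q)), 1.0

# toy models: cheap model is confident only for positive inputs
cheap = lambda x: np.array([0.95, 0.05]) if x > 0 else np.array([0.5, 0.5])
expensive = lambda x: np.array([0.1, 0.9])
label1, cost1 = adaptive_predict(1.0, cheap, expensive, lambda x: x)
label2, cost2 = adaptive_predict(-1.0, cheap, expensive, lambda x: x)
```

Averaged over a test set, such a rule trades a tunable fraction of feature cost against accuracy, which is the trade-off the learnt policy optimizes.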
Factorization Bandits for Interactive Recommendation
Wang, Huazheng (University of Virginia) | Wu, Qingyun (University of Virginia) | Wang, Hongning (University of Virginia)
We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, where an upper confidence bound based item selection strategy is developed to balance the explore/exploit trade-off during online learning. Observable contextual features and dependency among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and to mitigate cold-start in recommendation. A high-probability sublinear upper regret bound is proved for the developed algorithm, where considerable regret reduction is achieved on both the user and item sides. Extensive experiments on both simulations and large-scale real-world datasets confirm the advantages of the proposed algorithm compared with several state-of-the-art factorization-based and bandit-based collaborative filtering methods.
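The upper-confidence-bound selection step can be illustrated in the simpler linear-bandit setting, where item factors are fixed and only the user's preference vector is estimated online; the factorization-bandit setting additionally learns the item factors. A sketch (class name and toy data are illustrative):

```python
import numpy as np

class LinUCBUser:
    """Simplified UCB item selection for one user: a ridge-regression
    estimate of the user's preference vector over item features, plus a
    confidence bonus that shrinks as evidence accumulates."""
    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(dim)   # regularised Gram matrix
        self.b = np.zeros(dim)
        self.alpha = alpha           # exploration weight

    def select(self, items):
        Ainv = np.linalg.inv(self.A)
        theta = Ainv @ self.b        # current preference estimate
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', items, Ainv, items))
        return int(np.argmax(items @ theta + self.alpha * bonus))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(1)
items = rng.normal(size=(20, 4))
theta_star = np.array([1.0, -1.0, 0.5, 0.0])   # hidden true preferences
bandit = LinUCBUser(dim=4, alpha=0.5)
for t in range(200):
    i = bandit.select(items)
    r = items[i] @ theta_star + 0.1 * rng.normal()
    bandit.update(items[i], r)
```

The bonus term drives exploration of under-observed items early on; as the Gram matrix grows, selection concentrates on the estimated best items.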
Learning to Suggest Phrases
Arnold, Kenneth Charles (Harvard University) | Chang, Kai-Wei (University of Virginia) | Kalai, Adam T. (Microsoft Research)
Intelligent keyboards can support writing by suggesting content. Certain types of phrases, when offered as suggestions, may be systematically chosen more often than their frequency in a corpus of text would predict. In order to generate those types of suggestions, we collected a dataset of how human authors responded to suggestions offered to them during open-ended writing tasks. We present an offline strategy for evaluating suggestions that enables us to learn the parameters of an improved suggestion generation policy without the expense of collecting additional data under that policy. We validate the approach by simulation and on human data, demonstrating an improvement in held-out suggestion acceptance rate. Our approach can be applied to other scenarios where what is typical is not necessarily what is desirable.
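Offline evaluation of a new suggestion policy from logs of an old one is commonly done with inverse propensity scoring, which reweights logged outcomes by how likely the new policy was to make the same suggestion. A minimal sketch of that standard estimator (the paper's exact strategy may differ; names and the toy log are illustrative):

```python
def ips_value(logged, new_policy_prob):
    """Inverse-propensity estimate of a new policy's acceptance rate from
    data logged under an old policy. Each record is
    (suggestion, prob_under_logging_policy, accepted)."""
    est = 0.0
    for suggestion, p_log, accepted in logged:
        w = new_policy_prob(suggestion) / p_log   # importance weight
        est += w * accepted
    return est / len(logged)

# toy log: the logging policy offered 'a' and 'b' uniformly at random
logged = [('a', 0.5, 1), ('a', 0.5, 1), ('b', 0.5, 0), ('b', 0.5, 0)]
# candidate policy that always offers 'a'
always_a = lambda s: 1.0 if s == 'a' else 0.0
value = ips_value(logged, always_a)
```

Here every acceptance came from 'a', so the estimator credits the always-'a' policy with an acceptance rate of 1.0 without collecting any new data.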
A Dempster-Shafer Approach for Corrupted Electrocardiogram Signals
Napoli, Nicholas J. (University of Virginia) | Barnes, Laura E. (University of Virginia)
Continuous streaming electrocardiogram (ECG) data in the Intensive Care Unit (ICU) is highly susceptible to noise artifacts and signal corruption. Currently, published algorithms for QRS detection do not account for unreliable lead information; waveform detection is typically contingent upon information from a single lead; and uncertainty metrics are not provided regarding detection accuracy. We propose a cross-correlation fusion method for multi-component ECG templates using Dempster-Shafer (DS) theory. In our experiments on clinical data, detection accuracy was comparable to benchmark non-syntactic detection algorithms at high signal-to-noise ratio (SNR); however, the fusion approach demonstrated a superior increase in accuracy as the SNR degraded. Addressing these shortcomings in the detection of QRS complexes and other waveforms has the potential to improve patient risk prediction in the ICU.
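The fusion step rests on Dempster's rule of combination, which merges evidence from multiple leads while normalising away conflicting mass. A self-contained sketch of the rule itself (the mass values and hypothesis labels are a made-up two-lead example, not the paper's templates):

```python
def dempster_combine(m1, m2):
    """Dempster's rule: fuse two mass functions, keyed by frozensets of
    hypotheses, normalising away the mass assigned to conflicting pairs."""
    combined, conflict = {}, 0.0
    for A, p in m1.items():
        for B, q in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q     # incompatible evidence
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {A: p / (1.0 - conflict) for A, p in combined.items()}

QRS, NOISE = frozenset({'qrs'}), frozenset({'noise'})
THETA = QRS | NOISE                        # frame of discernment: undecided
lead1 = {QRS: 0.6, THETA: 0.4}             # clean lead: strong QRS evidence
lead2 = {QRS: 0.3, NOISE: 0.4, THETA: 0.3} # noisy lead: weak, conflicted
fused = dempster_combine(lead1, lead2)
```

The fused belief in a QRS complex exceeds either lead's alone, while the mass still left on the full frame quantifies residual uncertainty, which is what single-lead detectors fail to report.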
MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-Based Protein Structure Prediction
Lin, Zeming (University of Virginia) | Lanchantin, Jack (University of Virginia) | Qi, Yanjun (University of Virginia)
Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window-based multilayer perceptron. Taking inspiration from the image classification domain, we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position predictions on protein sequences. Our model is significantly simpler than the state-of-the-art, yet achieves better results. By combining MUST and the efficient convolution operation, we can consider far more parameters while retaining very fast prediction speeds. We surpass state-of-the-art performance on two large protein property prediction datasets.
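The shift-and-stitch idea can be seen on a 1-D toy: a strided layer halves the sequence length, so running it on every shifted copy of the input and interleaving the outputs restores one prediction per position. A sketch with a trivial stand-in for the convolution layer (the real MUST technique applies this across multiple network layers):

```python
import numpy as np

def strided_feature(x, stride=2):
    """Stand-in for a stride-2 convolution layer: keeps every second
    position, so the output is half the input length."""
    return x[::stride]

def shift_and_stitch(x, stride=2):
    """Run the strided layer on each shifted copy of the input, then
    interleave ('stitch') the outputs to recover one output per input
    position despite the layer's downsampling."""
    outs = [strided_feature(x[s:], stride) for s in range(stride)]
    dense = np.empty(len(x), dtype=x.dtype)
    for s, o in enumerate(outs):
        dense[s::stride] = o
    return dense

x = np.arange(8)          # toy "sequence" whose length divides the stride
dense = shift_and_stitch(x)
```

With the identity stand-in, stitching exactly reconstructs the input, confirming that every position receives an output; with a real convolution, each position instead receives its own prediction.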
Automatic Formalization of Clinical Practice Guidelines
Gerber, Matthew (University of Virginia) | Brown, Donald (University of Virginia) | Harrison, James (University of Virginia)
Current efforts aim to incorporate knowledge from clinical practice guidelines (CPGs) into computer systems using sophisticated interchange formats. Due to their complexity, such formats require expensive manual formalization work. This paper presents a preliminary study of using natural language processing (NLP) to automatically formalize CPG recommendations. We developed a CPG representation using concepts from the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT), and manually applied this representation to a sample of CPG recommendations that is representative of multiple medical domains and recommendation types. Using this resource, we trained and evaluated a supervised classification model that formalizes new CPG recommendations according to the SNOMED CT representation, achieving a precision of 75% and a recall of 42% (F1 = 54%). We have identified two important lines of future investigation: (1) feature engineering to address the unique linguistic properties of CPG recommendations, and (2) alternative model formulations that are more robust to processing errors. A third line of investigation – creating additional training data for the NLP model – is shown to be of little utility.