Collaborating Authors

University of Virginia


"Is This an STD? Please Help!": Online Information Seeking for Sexually Transmitted Diseases on Reddit

AAAI Conferences

The increasing incidence of sexually transmitted diseases (STDs) has prompted the public health and technology communities to develop new measures for understanding how individuals use Internet resources to obtain relevant information, particularly for sensitive or stigmatized conditions. The purpose of this study is to examine recent health information seeking and needs of the r/STD community, a subreddit focused exclusively on STDs. We found that the majority of posts crowdsource information about intermediate, non-reportable STDs such as human papillomavirus (HPV). Crowdsourced information in this community focused on symptoms and treatment, as well as the social and emotional aspects of sexual health, such as fear of misdiagnosis. Our analysis shows that online communities focused on discussion of health symptoms have strong potential to influence information-seeking behavior and consumer action.


Policy Issues Regarding Implementations of Cyber Attack Resilience Solutions for Cyber Physical Systems

AAAI Conferences

The Internet of Things (IoT) is dramatically increasing complexity in cities, commerce and homes. This complexity is increasing the risk of cyber threats. To reduce these risks, resilient cyberphysical systems must be able to respond to different types of disturbances (errors or cyberattacks). Organizational, system and infrastructure security pose new challenges for policies that aim to reduce cyber risks rather than simply react to cyberattacks. Indeed, policies must be crafted to require anticipatory responses able to discriminate between anomalies caused by errors and those driven by cyberattackers for malicious purposes, which may result in obvious damage (e.g., equipment destruction, injury or death) or subtle control (e.g., Stuxnet). We conclude that anticipatory resilience solutions for cyberphysical systems will require teams of government and commercial organizations to address the consequences of cyberattacks, to detect them and to defend against them.


On Convergence of Epanechnikov Mean Shift

AAAI Conferences

Epanechnikov Mean Shift is a simple yet empirically very effective algorithm for clustering. It localizes the centroids of data clusters by estimating modes of the probability distribution that generates the data points, using the "optimal" Epanechnikov kernel density estimator. However, since the procedure involves non-smooth kernel density functions, the convergence behavior of Epanechnikov Mean Shift lacks theoretical support as of this writing; most of the existing analyses are based on smooth functions and thus cannot be applied to Epanechnikov Mean Shift. In this work, we first show that the original Epanechnikov Mean Shift may indeed terminate at a non-critical point, due to its non-smooth nature. Based on our analysis, we propose a simple remedy to fix it. The modified Epanechnikov Mean Shift is guaranteed to terminate at a local maximum of the estimated density, which corresponds to a cluster centroid, within a finite number of iterations. We also propose a way to avoid running the Mean Shift iterates from every data point, while maintaining good clustering accuracy under non-overlapping spherical Gaussian mixture models. This further pushes Epanechnikov Mean Shift to handle very large and high-dimensional data sets. Experiments show surprisingly good performance compared to Lloyd's K-means algorithm and the EM algorithm.
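For readers unfamiliar with the basic iteration, here is a minimal sketch of plain (unmodified) Epanechnikov mean shift. With the Epanechnikov kernel, each update reduces to averaging the data points that lie within one bandwidth of the current estimate. The function name, parameters and stopping rule are illustrative assumptions, and the sketch does not include the remedy proposed in the paper, so it may stop at a non-critical point as the abstract describes.

```python
import numpy as np

def epanechnikov_mean_shift(points, start, bandwidth, max_iter=100):
    """Run the plain mean shift iteration from `start` (hypothetical helper).

    With the Epanechnikov kernel, the update is the unweighted mean of all
    points within `bandwidth` of the current estimate.
    """
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(points - x, axis=1)
        neighbors = points[dist <= bandwidth]
        if len(neighbors) == 0:
            break  # no data inside the window; nothing to average
        new_x = neighbors.mean(axis=0)
        if np.allclose(new_x, x):
            break  # the estimate has stopped moving
        x = new_x
    return x

# Two well-separated toy clusters; start near the first one.
data = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
        [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
mode = epanechnikov_mean_shift(data, start=[0.5, 0.5], bandwidth=1.0)
```

In this toy example the iterate settles on the mean of the three points in the first cluster, since those are the only points that ever fall inside the window.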


Transferring Decomposed Tensors for Scalable Energy Breakdown Across Regions

AAAI Conferences

Homes constitute roughly one-third of the total energy usage worldwide. Providing an energy breakdown, that is, the energy consumption per appliance, can help save up to 15% of energy. Given the vast differences in energy consumption patterns across different regions, existing energy breakdown solutions require instrumentation and model training for each geographical region, which is prohibitively expensive and limits scalability. In this paper, we propose a novel region-independent energy breakdown model via statistical transfer learning. Our key intuition is that the heterogeneity in homes and weather across different regions most significantly impacts the energy consumption across regions; if we can factor out such heterogeneity, we can learn region-independent models, or the homogeneous energy breakdown components, for each individual appliance. Thus, the model learnt in one region can be transferred to another region. We evaluate our approach on two U.S. cities with distinct weather from a publicly available dataset. We find that our approach gives better energy breakdown estimates than the state of the art while requiring the fewest instrumented homes from the target region.


Wang

AAAI Conferences

We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, where an upper confidence bound based item selection strategy is developed to balance the exploit/explore trade-off during online learning. Observable contextual features and dependency among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and help conquer cold-start in recommendation. A high-probability sublinear upper regret bound is proved for the developed algorithm, where considerable regret reduction is achieved on both user and item sides. Extensive experiments on both simulations and large-scale real-world datasets confirm the advantages of the proposed algorithm compared with several state-of-the-art factorization-based and bandit-based collaborative filtering methods.
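The upper confidence bound selection mentioned above can be illustrated with a generic LinUCB-style rule over latent item factors. This is only a sketch under stated assumptions, not the algorithm from the paper: the item factors are treated as fixed and known, the user is modelled by a ridge-regression estimate, and the names `ucb_select`, `update`, `A`, `b` and `alpha` are hypothetical.

```python
import numpy as np

def ucb_select(item_factors, A, b, alpha=1.0):
    """Pick the item with the highest optimistic score (hypothetical helper).

    item_factors: (n_items, d) array of latent item vectors (assumed fixed).
    A, b: ridge-regression statistics for one user (d x d matrix, d vector).
    alpha: exploration weight; larger values favour uncertain items.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                      # current estimate of user factors
    mean = item_factors @ theta            # predicted reward per item
    # Confidence width sqrt(x^T A^{-1} x), computed for every item row x.
    width = np.sqrt(np.einsum("ij,jk,ik->i", item_factors, A_inv, item_factors))
    return int(np.argmax(mean + alpha * width))

def update(A, b, x, reward):
    """Fold one observed (item vector, reward) pair into the statistics."""
    return A + np.outer(x, x), b + reward * x

# Toy run: two items with orthogonal factors, identity prior.
items = np.eye(2)
A, b = np.eye(2), np.zeros(2)
first = ucb_select(items, A, b)            # tie is broken toward item 0
A, b = update(A, b, items[first], reward=0.0)
second = ucb_select(items, A, b)           # item 0 is now less uncertain
```

After item 0 has been tried once, its confidence width shrinks, so the rule turns to the still-unexplored item 1; this is the exploit/explore balance in its simplest form.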


Matrix Factorisation for Scalable Energy Breakdown

AAAI Conferences

Homes constitute more than one-third of total energy consumption. Producing an energy breakdown for a home has been shown to reduce household energy consumption by up to 15%, among other benefits. However, existing approaches to producing an energy breakdown require hardware to be installed in each home and are thus prohibitively expensive. In this paper, we propose a novel application of feature-based matrix factorisation that does not require any additional hardware installation. The basic premise of our approach is that common design and construction patterns for homes create a repeating structure in their energy data. Thus, a sparse basis can be used to represent energy data from a broad range of homes. We evaluate our approach on 516 homes from a publicly available data set and find it to be more effective than five baseline approaches that require either sensing in each home or a very rigorous survey across a large number of homes coupled with complex modelling. We also present a deployment of our system as a live web application that can potentially provide energy breakdown to millions of homes.


Resource Constrained Structured Prediction

AAAI Conferences

We study the problem of structured prediction under test-time budget constraints. We propose a novel approach based on selectively acquiring computationally costly features at test time in order to reduce the computational cost of prediction with minimal performance degradation. We formulate a novel empirical risk minimization (ERM) problem for policy learning. We show that policy learning can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition and dependency parsing, and show a significant reduction in feature costs without degrading accuracy.