Regression
Predicting Emotion Perception Across Domains: A Study of Singing and Speaking
Zhang, Biqiao (University of Michigan) | Provost, Emily Mower (University of Michigan) | Swedberg, Robert (University of Michigan) | Essl, Georg (University of Michigan)
Emotion affects our understanding of the opinions and sentiments of others. Research has demonstrated that humans are able to recognize emotions in various domains, including speech and music, and that there are potential shared features that shape the emotion in both domains. In this paper, we investigate acoustic and visual features that are relevant to emotion perception in the domains of singing and speaking. We train regression models using two paradigms: (1) within-domain, in which models are trained and tested on the same domain and (2) cross-domain, in which models are trained on one domain and tested on the other domain. This strategy allows us to analyze the similarities and differences underlying the relationship between audio-visual feature expression and emotion perception and how this relationship is affected by domain of expression. We use kernel density estimation to model emotion as a probability distribution over the perception associated with multiple evaluators on the valence-activation space. This allows us to model the variation inherent in the reported perception. Results suggest that activation can be modeled more accurately across domains, compared to valence. Furthermore, visual features capture cross-domain emotion more accurately than acoustic features. The results provide additional evidence for a shared mechanism underlying spoken and sung emotion perception.
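The kernel density step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the isotropic Gaussian kernel, the bandwidth of 0.5, the 1-5 rating scale, and the four evaluator ratings are all assumptions made for the example.

```python
import math


def gaussian_kde_2d(ratings, bandwidth=0.5):
    """Return a density function over the (valence, activation) plane.

    ratings: list of (valence, activation) pairs, one per evaluator.
    bandwidth: width of the isotropic Gaussian kernel (assumed value).
    """
    n = len(ratings)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(valence, activation):
        total = 0.0
        for rv, ra in ratings:
            sq_dist = (valence - rv) ** 2 + (activation - ra) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical ratings from four evaluators on an assumed 1-5 scale.
ratings = [(2.0, 4.0), (2.5, 4.5), (3.0, 4.0), (2.0, 3.5)]
density = gaussian_kde_2d(ratings, bandwidth=0.5)

# Discretize onto a 17 x 17 grid and normalize so the cells sum to 1,
# giving a probability distribution that a regression model could target.
step = 0.25
grid = [(1.0 + i * step, 1.0 + j * step) for i in range(17) for j in range(17)]
weights = [density(v, a) for v, a in grid]
total_weight = sum(weights)
distribution = [w / total_weight for w in weights]

# Grid cell with the highest probability mass.
mode = grid[max(range(len(grid)), key=lambda k: distribution[k])]
```

Representing the label as a distribution rather than a single averaged point keeps the disagreement among evaluators visible, which is the variation the abstract refers to.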
Efficient Benchmarking of Hyperparameter Optimizers via Surrogates
Eggensperger, Katharina (University of Freiburg) | Hutter, Frank (University of Freiburg) | Hoos, Holger (University of British Columbia) | Leyton-Brown, Kevin (University of British Columbia)
Hyperparameter optimization is crucial for achieving peak performance with many machine learning algorithms; however, the evaluation of new optimization techniques on real-world hyperparameter optimization problems can be very expensive. Therefore, experiments are often performed using cheap synthetic test functions with characteristics rather different from those of real benchmarks of interest. In this work, we introduce another option: cheap-to-evaluate surrogates of real hyperparameter optimization benchmarks that share the same hyperparameter spaces and feature similar response surfaces. Specifically, we train regression models on data describing a machine learning algorithm’s performance depending on its hyperparameter setting, and then cheaply evaluate hyperparameter optimization methods using the model’s performance predictions in lieu of running the real algorithm. We evaluated a wide range of regression techniques, both in terms of how well they predict the performance of new hyperparameter settings and in terms of the quality of surrogate benchmarks obtained. We found that tree-based models capture the performance of several machine learning algorithms well and yield surrogate benchmarks that closely resemble real-world benchmarks, while being much easier to use and orders of magnitude cheaper to evaluate.
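The surrogate idea can be illustrated with a small sketch. Everything named here is hypothetical: `expensive_benchmark` is a synthetic stand-in for a real training run, the two hyperparameters are invented, and a k-nearest-neighbour regressor is swapped in for brevity where the abstract reports that tree-based models worked well.

```python
import math
import random


def fit_knn_surrogate(configs, scores, k=3):
    """Return predict(config), the mean score of the k nearest observed configs."""
    def predict(config):
        order = sorted(range(len(configs)),
                       key=lambda i: math.dist(config, configs[i]))
        nearest = order[:k]
        return sum(scores[i] for i in nearest) / len(nearest)
    return predict


def expensive_benchmark(config):
    """Synthetic stand-in for a costly run: validation error as a function
    of (log10 learning rate, dropout). Not a real benchmark."""
    log_lr, dropout = config
    return (log_lr + 2.0) ** 2 + (dropout - 0.3) ** 2


rng = random.Random(0)


def sample_config():
    """Draw a configuration uniformly from the assumed hyperparameter space."""
    return (rng.uniform(-5.0, 0.0), rng.uniform(0.0, 0.9))


# Offline phase: collect performance data once and fit the surrogate.
train_configs = [sample_config() for _ in range(200)]
train_scores = [expensive_benchmark(c) for c in train_configs]
surrogate = fit_knn_surrogate(train_configs, train_scores, k=3)


def random_search(objective, budget):
    """A simple optimizer to be benchmarked; it only ever calls `objective`."""
    best_config, best_score = None, float("inf")
    for _ in range(budget):
        config = sample_config()
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Benchmark phase: the optimizer queries the cheap surrogate, not the real run.
best_config, best_predicted = random_search(surrogate, budget=50)
```

Because the optimizer only sees a callable objective, the same `random_search` code could be pointed at the real benchmark or at the surrogate without modification.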
Learning Plausible Inferences from Semantic Web Knowledge by Combining Analogical Generalization with Structured Logistic Regression
Liang, Chen (Northwestern University) | Forbus, Kenneth D. (Northwestern University)
Fast and efficient learning over large bodies of commonsense knowledge is a key requirement for cognitive systems. Semantic web knowledge bases provide an important new resource of ground facts from which plausible inferences can be learned. This paper applies structured logistic regression with analogical generalization (SLogAn) to make use of structural as well as statistical information to achieve rapid and robust learning. SLogAn achieves state-of-the-art performance in a standard triplet classification task on two data sets and, in addition, can provide understandable explanations for its answers.
Representation Learning for Aspect Category Detection in Online Reviews
Zhou, Xinjie (Peking University) | Wan, Xiaojun (Peking University) | Xiao, Jianguo (Peking University)
User-generated reviews are valuable resources for decision making. Identifying the aspect categories discussed in a given review sentence (e.g., “food” and “service” in restaurant reviews) is an important task in sentiment analysis and opinion mining. Given a predefined aspect category set, most previous research leverages hand-crafted features and a classification algorithm to accomplish the task. The crucial step for achieving better performance is feature engineering, which consumes much human effort and may be unstable when the product domain changes. In this paper, we propose a representation learning approach to automatically learn useful features for aspect category detection. Specifically, a semi-supervised word embedding algorithm is first proposed to obtain continuous word representations on a large set of reviews with noisy labels. Afterwards, we propose to generate deeper and hybrid features through neural networks stacked on the word vectors. A logistic regression classifier is finally trained with the hybrid features to predict the aspect category. The experiments are carried out on a benchmark dataset released by SemEval-2014. Our approach achieves state-of-the-art performance and outperforms the best participating team as well as a few strong baselines.
Causal Inference via Sparse Additive Models with Application to Online Advertising
Sun, Wei (Purdue University) | Wang, Pengyuan (Yahoo! Labs) | Yin, Dawei (Yahoo! Labs) | Yang, Jian (Yahoo! Labs) | Chang, Yi (Yahoo! Labs)
Advertising effectiveness measurement is a fundamental problem in online advertising. Various causal inference methods have been employed to measure the causal effects of ad treatments. However, existing methods mainly focus on linear logistic regression for univariate and binary treatments and are not well suited to complex, multi-dimensional ad treatments, where each dimension could be discrete or continuous. In this paper we propose a novel two-stage causal inference framework for assessing the impact of complex ad treatments. In the first stage, we estimate the propensity parameter via a sparse additive model; in the second stage, a propensity-adjusted regression model is applied for measuring the treatment effect. Our approach is shown to provide an unbiased estimate of ad effectiveness under regularity conditions. To demonstrate the efficacy of our approach, we apply it to a real online advertising campaign to evaluate the impact of three ad treatments: ad frequency, ad channel, and ad size. We show that ad frequency usually has a treatment effect cap when ads are shown on mobile devices. In addition, the strategies for choosing the best ad size are completely different for mobile ads and online ads.
Estimating Temporal Dynamics of Human Emotions
Kim, Seungyeon (Georgia Institute of Technology) | Lee, Joonseok (Georgia Institute of Technology) | Lebanon, Guy (Amazon) | Park, Haesun (Georgia Institute of Technology)
Sentiment analysis predicts a one-dimensional quantity describing the positive or negative emotion of an author. Mood analysis extends the one-dimensional sentiment response to a multi-dimensional quantity, describing a diverse set of human emotions. In this paper, we extend sentiment and mood analysis temporally and model emotions as a function of time based on temporal streams of blog posts authored by a specific author. The model is useful for constructing predictive models and discovering scientific models of human emotions.
On the Bayes-optimality of F-measure maximizers
Waegeman, Willem | Dembczynski, Krzysztof | Jachnik, Arkadiusz | Cheng, Weiwei | Hullermeier, Eyke
The F-measure, which was originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.
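A toy example shows why a Hamming-loss-optimal prediction can be a poor surrogate for the F-measure. The sketch below is a brute-force illustration over all labelings, which is exponential in the number of labels; it is not the efficient algorithm the abstract describes. The three independent labels with positive probability 0.4 and the convention that the F-measure equals 1 when both vectors are all zeros are assumptions of the example.

```python
from itertools import product


def f_measure(y, h):
    """F1 between true labels y and prediction h (binary tuples).
    Assumed convention: the score is 1 when both vectors are all zeros."""
    true_positives = sum(yi * hi for yi, hi in zip(y, h))
    denom = sum(y) + sum(h)
    return 1.0 if denom == 0 else 2.0 * true_positives / denom


def expected_f(h, joint):
    """Expected F-measure of prediction h under a joint label distribution."""
    return sum(p * f_measure(y, h) for y, p in joint.items())


# Toy joint distribution: three independent labels, each positive w.p. 0.4.
m = 3
p_pos = 0.4
labelings = list(product((0, 1), repeat=m))
joint = {y: p_pos ** sum(y) * (1.0 - p_pos) ** (m - sum(y)) for y in labelings}

# Brute-force F-measure maximizer: try every possible prediction.
f_optimal = max(labelings, key=lambda h: expected_f(h, joint))

# Hamming-loss minimizer: threshold each marginal probability at 0.5.
marginals = [sum(p for y, p in joint.items() if y[i] == 1) for i in range(m)]
hamming_optimal = tuple(int(q > 0.5) for q in marginals)

f_of_f_optimal = expected_f(f_optimal, joint)
f_of_hamming = expected_f(hamming_optimal, joint)
```

Here every marginal is below 0.5, so the Hamming-optimal prediction is all zeros with expected F-measure 0.216, while predicting all three labels achieves 0.5104.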
Sustainable Building Design: A Challenge at the Intersection of Machine Learning and Design Optimization
Gilan, Siamak Safarzadegan (Georgia Institute of Technology) | Dilkina, Bistra (Georgia Institute of Technology)
Residential and commercial buildings are responsible for about 40% of primary energy consumption in the United States; hence, improving their energy efficiency could have important sustainability benefits. The design of a building has a tremendous effect on its energy profile, and recently there has been increased interest in developing optimization methods that support the design of high-performance buildings. Previous approaches are either based on simulation optimization or on training an accurate predictive model that is queried during the optimization. We propose a method that more tightly integrates the machine learning and optimization components, by employing active learning during optimization. In particular, we use a Gaussian Process (GP) model for the prediction and active learning and the multi-objective genetic algorithm NSGA-II for the optimization. We develop a comprehensive and publicly available benchmark for building design optimization. We evaluate 5 machine learning approaches on our dataset, and show that the GP model is competitive, in addition to being well-suited for the active learning setting. We compare our optimization approach against the 2-stage approach and simulation optimization. Our results show that our approach produces solutions at the Pareto frontier compared to the other two approaches, while using only a fraction of the simulations and time.
Social Information Improves Location Prediction in the Wild
Li, Jai (University of Illinois at Chicago) | Brugere, Ivan (University of Illinois at Chicago) | Ziebart, Brian (University of Illinois at Chicago) | Berger-Wolf, Tanya (University of Illinois at Chicago) | Crofoot, Margaret (University of California-Davis) | Farine, Damien (University of California-Davis)
How can knowing the location of my friends be used to more accurately predict my location? This paper explores socially-aware location prediction under a particularly challenging setting where the underlying interactions and social network are unknown and must be inferred over continuous spatiotemporal data. Our method samples inferred network topology using a linear regression model to predict future individual locations. We present an in-depth empirical study comparing different network models and network sampling regimes under a bootstrapped sampling baseline. Furthermore, our qualitative analysis demonstrates the value of social information in population mobility modeling under our application’s challenges.
Forecasting Uncertainty in Electricity Demand
Wijaya, Tri Kurniawan (EPFL) | Sinn, Mathieu (IBM Research) | Chen, Bei (IBM Research)
Generalized Additive Models (GAM) are a widely used class of regression models for forecasting electricity demand, due to their high accuracy, flexibility and interpretability. However, the residuals of the fitted GAM are typically heteroscedastic and leptokurtic, owing to the nature of energy data. In this paper we propose a novel approach to estimate the time-varying conditional variance of the GAM residuals, which we call the GAM2 algorithm. It allows utility companies and network operators to assess the uncertainty of future electricity demand and incorporate it into their planning processes. The basic idea of our algorithm is to apply another GAM to the squared residuals to explain the dependence of uncertainty on exogenous variables. Empirical evidence shows that the residuals rescaled by the estimated conditional variance are approximately normal. We combine our modeling approach with online learning algorithms that adjust for dynamic changes in the distributions of demand. We illustrate our method by a case study on data from RTE, the operator of the French transmission grid.
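The two-model idea, one model for the mean and a second model fitted to the squared residuals, can be sketched as below. This is only an illustration under stated assumptions: a piecewise-constant binned smoother stands in for a GAM, temperature is a hypothetical exogenous variable, the data are synthetic, and dividing each residual by the square root of the estimated variance is one plausible reading of "rescaled".

```python
import math
import random


def fit_binned_mean(x, y, n_bins, lo, hi):
    """Piecewise-constant smoother (stand-in for a GAM): mean of y per x-bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = (hi - lo) / n_bins

    def bin_of(value):
        return min(n_bins - 1, max(0, int((value - lo) / width)))

    for xi, yi in zip(x, y):
        b = bin_of(xi)
        sums[b] += yi
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda value: means[bin_of(value)]


# Synthetic demand whose noise level grows with temperature (heteroscedastic).
rng = random.Random(0)
temps = [rng.uniform(0.0, 30.0) for _ in range(5000)]
demand = [50.0 + 0.05 * (t - 18.0) ** 2 + (0.5 + 0.1 * t) * rng.gauss(0.0, 1.0)
          for t in temps]

# Stage 1: fit the mean model and compute its residuals.
mean_model = fit_binned_mean(temps, demand, n_bins=30, lo=0.0, hi=30.0)
residuals = [d - mean_model(t) for t, d in zip(temps, demand)]

# Stage 2: fit a second model to the squared residuals; its prediction is an
# estimate of the conditional variance given the exogenous variable.
var_model = fit_binned_mean(temps, [r * r for r in residuals],
                            n_bins=30, lo=0.0, hi=30.0)

# Rescale each residual by its estimated conditional standard deviation.
rescaled = [r / math.sqrt(var_model(t)) for t, r in zip(temps, residuals)]
mean_sq_rescaled = sum(r * r for r in rescaled) / len(rescaled)
```

With this smoother the rescaled residuals have unit mean square by construction, and `var_model` recovers the larger variance at high temperatures that was built into the synthetic data.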