Statistical Learning
Scalable and Interpretable Data Representation for High-Dimensional, Complex Data
Kim, Been (Massachusetts Institute of Technology) | Patel, Kayur (Google) | Rostamizadeh, Afshin (Google) | Shah, Julie (Massachusetts Institute of Technology)
The majority of machine learning research has been focused on building models and inference techniques with sound mathematical properties and cutting edge performance. Little attention has been devoted to the development of data representation that can be used to improve a user's ability to interpret the data and machine learning models to solve real-world problems. In this paper, we quantitatively and qualitatively evaluate an efficient, accurate and scalable feature-compression method using latent Dirichlet allocation for discrete data. This representation can effectively communicate the characteristics of high-dimensional, complex data points. We show that the improvement of a user's interpretability through the use of a topic modeling-based compression technique is statistically significant, according to a number of metrics, when compared with other representations. Also, we find that this representation is scalable --- it maintains alignment with human classification accuracy as an increasing number of data points are shown. In addition, the learned topic layer can semantically deliver meaningful information to users that could potentially aid human reasoning about data characteristics in connection with compressed topic space.
Identifying At-Risk Students in Massive Open Online Courses
He, Jiazhen (The University of Melbourne) | Bailey, James (The University of Melbourne) | Rubinstein, Benjamin I. P. (The University of Melbourne) | Zhang, Rui (The University of Melbourne)
Massive Open Online Courses (MOOCs) have received widespread attention for their potential to scale higher education, with multiple platforms such as Coursera, edX and Udacity recently appearing. Despite their successes, a major problem faced by MOOCs is low completion rates. In this paper, we explore the accurate early identification of students who are at risk of not completing courses. We build predictive models weekly, over multiple offerings of a course. Furthermore, we envision student interventions that present meaningful probabilities of failure, enacted only for marginal students.To be effective, predicted probabilities must be both well-calibrated and smoothed across weeks.Based on logistic regression, we propose two transfer learning algorithms to trade-off smoothness and accuracy by adding a regularization term to minimize the difference of failure probabilities between consecutive weeks. Experimental results on two offerings of a Coursera MOOC establish the effectiveness of our algorithms.
Constructing Models of User and Task Characteristics from Eye Gaze Data for User-Adaptive Information Highlighting
Gingerich, Matthew Junghyun (University of British Columbia) | Conati, Cristina (University of British Columbia)
A user-adaptive information visualization system capable of learning models of users and the visualization tasks they perform could provide interventions optimized for helping specific users in specific task contexts. In this paper, we investigate the accuracy of predicting visualization tasks, user performance on tasks, and user traits from gaze data. We show that predictions made with a logistic regression model are significantly better than a baseline classifier, with particularly strong results for predicting task type and user performance. Furthermore, we compare classifiers built with interface-independent and interface-dependent features, and show that the interface-independent features are comparable or superior to interface-dependent ones. Finally, we discuss how the accuracy of predictive models is affected if they are trained with data from trials that had highlighting interventions added to the visualization.
Structured Sparsity with Group-Graph Regularization
Dai, Xin-Yu (State Key Laboratory for Novel Software Technology, Nanjing University) | Zhang, Jian-Bing (State Key Laboratory for Novel Software Technology, Nanjing University) | Huang, Shu-Jian (State Key Laboratory for Novel Software Technology, Nanjing University) | Chen, Jia-Jun (State Key Laboratory for Novel Software Technology, Nanjing University) | Zhou, Zhi-Hua (State Key Laboratory for Novel Software Technology, Nanjing University)
In many learning tasks with structural properties, structural sparsity methods help induce sparse models, usually leading to better interpretability and higher generalization performance. One popular approach is to use group sparsity regularization that enforces sparsity on the clustered groups of features, while another popular approach is to adopt graph sparsity regularization that considers sparsity on the link structure of graph embedded features. Both the group and graph structural properties co-exist in many applications. However, group sparsity and graph sparsity have not been considered simultaneously yet. In this paper, we propose a g 2 -regularization that takes group and graph sparsity into joint consideration, and present an effective approach for its optimization. Experiments on both synthetic and real data show that, enforcing group-graph sparsity lead to better performance than using group sparsity or graph sparsity only.
Variational Inference for Nonparametric Bayesian Quantile Regression
Abeywardana, Sachinthaka (University of Sydney) | Ramos, Fabio (University of Sydney)
Quantile regression deals with the problem of computing robust estimators when the conditional mean and standard deviation of the predicted function are inadequate to capture its variability. The technique has an extensive list of applications, including health sciences, ecology and finance. In this work we present a non-parametric method of inferring quantiles and derive a novel Variational Bayesian (VB) approximation to the marginal likelihood, leading to an elegant Expectation Maximisation algorithm for learning the model. Our method is nonparametric, has strong convergence guarantees, and can deal with nonsymmetric quantiles seamlessly. We compare the method to other parametric and non-parametric Bayesian techniques, and alternative approximations based on expectation propagation demonstrating the benefits of our framework in toy problems and real datasets.
Structured Embedding via Pairwise Relations and Long-Range Interactions in Knowledge Base
Wu, Fei (Zhejiang University) | Song, Jun (Zhejiang University) | Yang, Yi (University of Technology, Sydney) | Li, Xi (Zhejiang University) | Zhang, Zhongfei (Zhejiang University) | Zhuang, Yueting (Zhejiang University)
We consider the problem of embedding entities and relations of knowledge bases into low-dimensional continuous vector spaces (distributed representations). Unlike most existing approaches, which are primarily efficient for modelling pairwise relations between entities, we attempt to explicitly model both pairwise relations and long-range interactions between entities, by interpreting them as linear operators on the low-dimensional embeddings of the entities. Therefore, in this paper we introduces Path-Ranking to capture the long-range interactions of knowledge graph and at the same time preserve the pairwise relations of knowledge graph; we call it 'structured embedding via pairwise relation and long-range interactions' (referred to as SePLi). Comparing with the-state-of-the-art models, SePLi achieves better performances of embeddings.
Pattern-Based Variant-Best-Neighbors Respiratory Motion Prediction Using Orthogonal Polynomials Approximation
Kam, KinMing (The University of Texas at Arlington) | Wang, Shouyi (The University of Texas at Arlington) | Bowen, Stephen R. (University of Washington) | Chaovalitwongse, Wanpracha (University of Washington)
Motion-adaptive radiotherapy techniques are promising to deliver truly ablative radiation doses to tumors with minimal normal tissue exposure by accounting for real-time tumor movement. However, a major challenge of successful applications of these techniques is the real-time prediction of breathing-induced tumor motion to accommodate system delivery latencies. Predicting respiratory motion in real-time is challenging. The current respiratory motion prediction approaches are still not satisfactory in terms of accuracy and interpretability due to the complexity of breathing patterns and the high inter-individual variability across patients. In this paper, we propose a novel respiratory motion prediction framework which integrates four key components: a personalized monitoring window generator, an orthogonal polynomial approximation-based pattern library builder, a variant best neighbor pattern searcher, and a statistical prediction decision maker. The four functional components work together into a real-time prediction system and is capable of performing personalized tumor position prediction during radiotherapy. Based on a study of respiratory motion of 27 patients with lung cancer, the proposed prediction approach generated consistently better prediction performances than the current respiratory motion prediction approaches, particularly for long prediction horizons.
Predicting Emotion Perception Across Domains: A Study of Singing and Speaking
Zhang, Biqiao (University of Michigan) | Provost, Emily Mower (University of Michigan) | Swedberg, Robert (University of Michigan) | Essl, Georg (University of Michigan)
Emotion affects our understanding of the opinions and sentiments of others. Research has demonstrated that humans are able to recognize emotions in various domains, including speech and music, and that there are potential shared features that shape the emotion in both domains. In this paper, we investigate acoustic and visual features that are relevant to emotion perception in the domains of singing and speaking. We train regression models using two paradigms: (1) within-domain, in which models are trained and tested on the same domain and (2) cross-domain, in which models are trained on one domain and tested on the other domain. This strategy allows us to analyze the similarities and differences underlying the relationship between audio-visual feature expression and emotion perception and how this relationship is affected by domain of expression. We use kernel density estimation to model emotion as a probability distribution over the perception associated with multiple evaluators on the valence-activation space. This allows us to model the variation inherent in the reported perception. Results suggest that activation can be modeled more accurately across domains, compared to valence. Furthermore, visual features capture cross-domain emotion more accurately than acoustic features. The results provide additional evidence for a shared mechanism underlying spoken and sung emotion perception.
On the Impossibility of Convex Inference in Human Computation
Shah, Nihar B. (University of California, Berkeley) | Zhou, Dengyong (Microsoft Research)
Human computation or crowdsourcing involves joint inference of the ground-truth-answers and the worker-abilities by optimizing an objective function, for instance, by maximizing the data likelihood based on an assumed underlying model. A variety of methods have been proposed in the literature to address this inference problem. As far as we know, none of the objective functions in existing methods is convex. In machine learning and applied statistics, a convex function such as the objective function of support vector machines (SVMs) is generally preferred, since it can leverage the high-performance algorithms and rigorous guarantees established in the extensive literature on convex optimization. One may thus wonder if there exists a meaningful convex objective function for the inference problem in human computation. In this paper, we investigate this convexity issue for human computation. We take an axiomatic approach by formulating a set of axioms that impose two mild and natural assumptions on the objective function for the inference. Under these axioms, we show that it is unfortunately impossible to ensure convexity of the inference problem. On the other hand, we show that interestingly, in the absence of a requirement to model "spammers", one can construct reasonable objective functions for crowdsourcing that guarantee convex inference.
Exploiting Variable Associations to Configure Efficient Local Search in Large-Scale Set Partitioning Problems
Umetani, Shunji (Osaka University)
We present a data mining approach for reducing the search space of local search algorithms in large-scale set partitioning problems (SPPs). We construct a k-nearest neighbor graph by extracting variable associations from the instance to be solved, in order to identify promising pairs of flipping variables in the large neighborhood search. We incorporate the search space reduction technique into a 2-flip neighborhood local search algorithm with an efficient incremental evaluation of solutions and an adaptive control of penalty weights. We also develop a 4-flip neighborhood local search algorithm that flips four variables alternately along 4-paths or 4-cycles in the k-nearest neighbor graph. According to computational comparison with the latest solvers, our algorithm performs effectively for large-scale SPP instances with up to 2.57 million variables.