Statistical Learning
Signals in the Silence: Models of Implicit Feedback in a Recommendation System for Crowdsourcing
Lin, Christopher H (University of Washington) | Kamar, Ece (Microsoft Research) | Horvitz, Eric (Microsoft Research)
We exploit the absence of signals as informative observations in the context of providing task recommendations in crowdsourcing. Workers on crowdsourcing platforms do not provide explicit ratings of tasks. We present methods that enable a system to leverage implicit signals about task preferences. These signals include the types of tasks that have been available and displayed, and the number of tasks that workers select and complete. In contrast to previous work, we present a general model that can represent both positive and negative implicit signals. We introduce algorithms that can learn these models without exceeding the computational complexity of existing approaches. Finally, using data from a high-throughput crowdsourcing platform, we show that reasoning about both positive and negative implicit feedback can improve the quality of task recommendations.
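To make the idea of negative implicit signals concrete, here is a minimal Python sketch. It is not the authors' model: it assumes hypothetical event tuples `(task_type, displayed, selected, completed)`, treats a completed task as positive evidence and a displayed-but-unselected task as negative evidence, and summarizes both with the posterior mean of a Beta prior. The function name and the `neg_weight` parameter are illustrative assumptions.

```python
from collections import defaultdict

def implicit_preferences(events, prior_pos=1.0, prior_neg=1.0, neg_weight=1.0):
    """Score each task type from implicit feedback only (hypothetical sketch).

    events: iterable of (task_type, displayed, selected, completed) tuples.
    Completing a task is a positive signal; a task that was displayed but not
    selected contributes `neg_weight` of negative evidence. Selected-but-not-
    completed tasks are ignored here.
    """
    pos = defaultdict(float)
    neg = defaultdict(float)
    seen = set()
    for task_type, displayed, selected, completed in events:
        seen.add(task_type)
        if completed:
            pos[task_type] += 1.0
        elif displayed and not selected:
            # The absence of a selection is treated as informative.
            neg[task_type] += neg_weight
    # Posterior mean of a Beta(prior_pos, prior_neg) preference parameter.
    return {
        t: (pos[t] + prior_pos) / (pos[t] + neg[t] + prior_pos + prior_neg)
        for t in seen
    }

events = [
    ("labeling", True, True, True),
    ("labeling", True, False, False),
    ("survey", True, False, False),
    ("survey", True, False, False),
]
scores = implicit_preferences(events)
```

Here "labeling" scores 0.5 and "survey" 0.25: the survey tasks were never completed and were skipped twice, so the absence of a selection pulls their score below the prior mean of 0.5. Dropping the negative branch would leave "survey" at the prior, which is exactly the information a positive-only model discards.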
Sparse Learning for Stochastic Composite Optimization
Zhang, Weizhong (Zhejiang University) | Zhang, Lijun (Michigan State University) | Hu, Yao (Zhejiang University) | Jin, Rong (Michigan State University) | Cai, Deng (Zhejiang University) | He, Xiaofei (Zhejiang University)
In this paper, we focus on Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution. Although many SCO algorithms have been developed for sparse learning with an optimal convergence rate $O(1/T)$, they often fail to deliver sparse final solutions, either because of the limited sparsity regularization applied during stochastic optimization or because of limitations in the online-to-batch conversion. To improve the sparsity of solutions obtained by SCO, we propose a simple but effective stochastic optimization scheme that adds a novel sparse online-to-batch conversion to traditional SCO algorithms. Our theoretical analysis shows that the scheme can find a solution with better sparsity patterns without affecting the convergence rate. Experimental results on both synthetic and real-world data sets show that the proposed methods are more effective at recovering the sparse solution and have convergence rates comparable to those of state-of-the-art SCO algorithms for sparse learning.
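The loss of sparsity in online-to-batch conversion can be seen in a few lines. The sketch below runs a plain stochastic proximal gradient method on a lasso objective (step size $\eta_0/\sqrt{t}$, which gives the slower $O(1/\sqrt{T})$ rate rather than the optimal $O(1/T)$ of the algorithms mentioned above), averages the iterates as in standard online-to-batch conversion, and then applies one extra soft-thresholding step to the average. That final step and its `tau_final` parameter are an assumed simplification for illustration, not the conversion proposed in the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sco_lasso(X, y, lam, T, eta0=0.02, tau_final=0.05, seed=0):
    """Stochastic proximal gradient for mean_i 0.5*(x_i . w - y_i)^2 + lam*||w||_1.

    Returns the plain averaged iterate and a sparsified version obtained by
    one extra shrinkage step (a hypothetical stand-in for a sparse
    online-to-batch conversion).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)
        grad = (X[i] @ w - y[i]) * X[i]                 # stochastic gradient of the smooth loss
        eta = eta0 / np.sqrt(t)
        w = soft_threshold(w - eta * grad, eta * lam)   # proximal step keeps w_t sparse
        w_avg += (w - w_avg) / t                        # averaging tends to densify the output
    w_sparse = soft_threshold(w_avg, tau_final)         # final shrinkage restores zeros
    return w_avg, w_sparse

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_avg, w_sparse = sco_lasso(X, y, lam=0.1, T=5000)
```

Because shrinkage can only zero out or reduce entries, `w_sparse` never has more nonzeros than `w_avg`; how many extra zeros it gains depends on `tau_final`.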
Modeling and Mining Spatiotemporal Patterns of Infection Risk from Heterogeneous Data for Active Surveillance Planning
Yang, Bo (Jilin University) | Guo, Hua (Jilin University) | Yang, Yi (Jilin University) | Shi, Benyun (Hong Kong Baptist University) | Zhou, Xiaonong (Chinese CDC) | Liu, Jiming (Hong Kong Baptist University)
Active surveillance is a desirable way to prevent the spread of infectious diseases because it aims to discover individual cases in a timely manner by actively searching for patients. In practice, however, active surveillance is difficult to implement, especially when the area to be monitored is large but available resources are limited. It is therefore extremely important for public health authorities to know how to allocate their scarce resources to high-priority regions so as to maximize the outcomes of active surveillance. In this paper, we formulate the problem of active surveillance planning and provide an effective method to address it by modeling and mining spatiotemporal patterns of infection risk from heterogeneous data sources. Taking malaria as an example, we perform an empirical study on real-world data to validate our method and report our new findings.
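Once a risk model has produced per-region estimates, the planning step can be framed as allocating a fixed budget. The sketch below is a hypothetical illustration, not the paper's method: it assumes each region has an expected number of cases found by a first surveillance team, and that each further team in that region finds a fixed fraction (`decay`) of the previous team's yield. Region names, numbers, and parameters are made up.

```python
import heapq

def allocate_teams(risk, budget, decay=0.5):
    """Greedily assign `budget` surveillance teams to regions.

    risk: dict mapping region -> expected cases found by the first team sent
    there (e.g. from a fitted spatiotemporal risk model). Each extra team in
    the same region is assumed to find `decay` times as many new cases as the
    previous one, so marginal gains diminish.
    """
    alloc = {r: 0 for r in risk}
    heap = [(-gain, r) for r, gain in risk.items()]   # max-heap via negation
    heapq.heapify(heap)
    expected = 0.0
    for _ in range(budget):
        neg_gain, r = heapq.heappop(heap)             # region with the largest marginal gain
        alloc[r] += 1
        expected += -neg_gain
        heapq.heappush(heap, (neg_gain * decay, r))   # next team's smaller gain
    return alloc, expected

alloc, expected = allocate_teams({"A": 8.0, "B": 5.0, "C": 1.0}, budget=3)
```

With risks 8, 5, and 1 and three teams, the greedy rule sends two teams to A and one to B, for 8 + 5 + 4 = 17 expected cases. Because the gains are separable across regions and diminishing within each region, taking the largest marginal gain at each step gives an optimal allocation under these assumptions.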
Spatial Scan for Disease Mapping on a Mobile Population
Lan, Liang (Temple University) | Malbasa, Vuk (University of Novi Sad) | Vucetic, Slobodan (Temple University)
In disease mapping, the spatial scan statistic is used to detect spatial regions where the population is exposed to a significantly higher disease risk than expected. In this important application, the current residence is typically used to define the location of each individual in the population. Because humans are mobile at various temporal and spatial scales, the current residence alone may be an insufficiently informative proxy: it ignores the many exposures that may occur away from home or that occurred at previous residences. In this paper, we propose a spatial scan statistic that is appropriate for disease mapping on mobile populations. We formulate a computationally efficient algorithm that uses the proposed statistic to find significant high-risk regions from the disease-status data of a mobile population. The algorithm is applicable to large populations and dense spatial grids. The experimental results demonstrate that the proposed algorithm is computationally efficient and outperforms traditional disease clustering approaches at discovering high-risk regions in mobile populations.
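One simple way to account for mobility in a scan statistic is to spread each individual over the grid cells in proportion to the time they spend there, and then compute Kulldorff's Poisson likelihood ratio on the resulting fractional population and case counts. The sketch below does that with a brute-force search over rectangles; it is an assumed illustration, not necessarily the statistic or the efficient algorithm proposed in the paper, and the `exposure` array layout is hypothetical. Significance testing (typically by Monte Carlo permutation of case labels) is omitted.

```python
import numpy as np

def poisson_llr(c, e, C):
    """Kulldorff's Poisson log-likelihood ratio for a region with c observed
    and e expected cases out of C total; zero unless risk is elevated."""
    if c <= e:
        return 0.0
    llr = c * np.log(c / e)
    if C - c > 0:
        llr += (C - c) * np.log((C - c) / (C - e))
    return float(llr)

def mobile_scan(exposure, is_case):
    """exposure: (n, H, W) array; exposure[i] holds the fraction of person i's
    time spent in each grid cell (each slice sums to 1). is_case: (n,) bools.
    Returns the best rectangle (r0, r1, c0, c1), half-open, and its LLR."""
    n, H, W = exposure.shape
    pop = exposure.sum(axis=0)               # time-weighted population per cell
    cases = exposure[is_case].sum(axis=0)    # time-weighted cases per cell
    N, C = pop.sum(), cases.sum()
    best_region, best_llr = None, 0.0
    for r0 in range(H):
        for r1 in range(r0 + 1, H + 1):
            for c0 in range(W):
                for c1 in range(c0 + 1, W + 1):
                    p = pop[r0:r1, c0:c1].sum()
                    c = cases[r0:r1, c0:c1].sum()
                    llr = poisson_llr(c, C * p / N, C)
                    if llr > best_llr:
                        best_region, best_llr = (r0, r1, c0, c1), llr
    return best_region, best_llr

exposure = np.zeros((16, 2, 2))
exposure[0:4, 0, 0] = 0.5      # cases: half their time in cell (0, 0) ...
exposure[0:4, 1, 1] = 0.5      # ... and half at home in cell (1, 1)
exposure[4:8, 1, 1] = 1.0      # controls living in (1, 1)
exposure[8:12, 0, 1] = 1.0
exposure[12:16, 1, 0] = 1.0
is_case = np.zeros(16, dtype=bool)
is_case[:4] = True
region, llr = mobile_scan(exposure, is_case)
```

In the toy data, a residence-only analysis would attribute all four cases to cell (1, 1), whereas the time-weighted statistic singles out cell (0, 0), where few other people spend time, with a log-likelihood ratio of about 1.65.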
A Region-Based Model for Estimating Urban Air Pollution
Jutzeler, Arnaud (Ecole Polytechnique Federale de Lausanne) | Li, Jason Jingshi (The Australian National University) | Faltings, Boi (Ecole Polytechnique Federale de Lausanne)
Air pollution has a direct impact on human health, and data-driven air quality models are useful for evaluating population exposure to air pollutants. In this paper, we propose a novel region-based Gaussian process model for estimating urban air pollution dispersion, and apply it to a large dataset of ultrafine particle (UFP) measurements collected from a network of sensors located on several trams in the city of Zurich. We show that, compared to existing grid-based models, the region-based model produces better predictions for data aggregated at all time scales. The new model is suitable for many practical applications, such as exposure assessment and anomaly detection.
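At the core of any Gaussian process model is the posterior predictive computation. The sketch below shows standard GP regression with a squared-exponential kernel, assuming hypothetical inputs of the form (region centroid x, region centroid y, time slot) and targets equal to the mean UFP reading in that region and slot. The abstract does not specify how regions are defined or which kernel is used, so these choices, and the values, are assumptions.

```python
import numpy as np

def rbf(A, B, length=1.0, var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X_train, y_train, X_test, noise=0.1, length=1.0, var=1.0):
    """Posterior mean and variance of a zero-mean GP at X_test."""
    K = rbf(X_train, X_train, length, var) + noise**2 * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    K_s = rbf(X_test, X_train, length, var)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    return mean, var - (v**2).sum(axis=0)

# Hypothetical region-level aggregates: (centroid x, centroid y, time slot).
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
y = np.array([10.0, 12.0, 11.0, 15.0])   # made-up mean UFP values
mu = y.mean()                            # center targets for the zero-mean GP
mean, post_var = gp_predict(X, y - mu, X, noise=0.01, var=4.0)
pred = mean + mu
new_mean, new_var = gp_predict(X, y - mu, np.array([[0.5, 0.5, 0.5]]), noise=0.01, var=4.0)
```

With a small noise level, the posterior mean nearly interpolates the training aggregates, while the posterior variance stays clearly positive at an unobserved point such as (0.5, 0.5, 0.5).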
Efficient Codes for Inverse Dynamics During Walking
Johnson, Leif (The University of Texas at Austin) | Ballard, Dana H (The University of Texas at Austin)
Efficient codes have been used effectively in both computer science and neuroscience to better understand information processing in visual and auditory encoding and discrimination tasks. In this paper, we explore the use of efficient codes for representing information relevant to human movements during locomotion. Specifically, we apply motion capture data to a physical model of the human skeleton to compute joint angles (inverse kinematics) and joint torques (inverse dynamics); then, treating the resulting paired dataset as a supervised regression problem, we investigate the effect of sparsity on the mapping from angles to torques. Our results suggest that sparse codes can indeed represent salient features of both the kinematic and dynamic views of human locomotion. However, sparsity appears to be only one parameter in building a model of inverse dynamics; we also show that, for this task, the "encoding" process benefits significantly from being integrated with the "regression" process. In addition, we show that simple coding and decoding methods are not sufficient to model the extremely complex inverse dynamics mapping. Finally, we use our results to argue that representations of movement are critical to modeling and understanding these movements.
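The encode-then-regress setup can be sketched as follows: angle vectors are sparse-coded against a fixed dictionary with ISTA (iterative soft-thresholding for the lasso), and a ridge regression then maps the codes to torques. Everything here is an assumption for illustration: the data are synthetic, the dictionary is random rather than learned, and the sizes and the `lam` and `ridge` values are arbitrary. The sketch also keeps encoding and regression separate, which is the kind of pipeline the abstract reports can be improved by integrating the two steps.

```python
import numpy as np

def ista(D, x, lam, n_iter=300):
    """Sparse code a minimizing 0.5*||x - D a||^2 + lam*||a||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L      # gradient step on the squared error
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

rng = np.random.default_rng(0)
n, d_angle, d_torque, k = 200, 6, 6, 24    # hypothetical sizes; k > d_angle (overcomplete)
angles = rng.standard_normal((n, d_angle))
torques = np.tanh(angles @ rng.standard_normal((d_angle, d_torque)))  # synthetic nonlinear map
D = rng.standard_normal((d_angle, k))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
codes = np.array([ista(D, x, lam=0.1) for x in angles])

ridge = 1e-2                               # ridge regression from sparse codes to torques
W = np.linalg.solve(codes.T @ codes + ridge * np.eye(k), codes.T @ torques)
pred = codes @ W
```

Re-running with different `lam` values and comparing `pred` with `torques` is one way to probe how sparsity alone affects the angle-to-torque mapping.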
Forecasting Potential Diabetes Complications
Yang, Yang (Tsinghua University) | Luyten, Walter (Katholieke Universiteit Leuven) | Liu, Lu (Northwestern University) | Moens, Marie-Francine (Katholieke Universiteit Leuven) | Tang, Jie (Tsinghua University) | Li, Juanzi (Tsinghua University)
Diabetes complications seriously afflict diabetes patients: over 68% of diabetes-related mortality is caused by diabetes complications. In this paper, we study the problem of automatically diagnosing diabetes complications from patients' lab test results. The problem poses two main challenges: 1) feature sparseness: on average, a patient undergoes only 1.26% of the lab tests, and 65.5% of lab test types are performed on samples from fewer than 10 patients; 2) knowledge skewness: comprehensive, detailed domain knowledge about the associations between diabetes complications and lab tests is lacking. To address these challenges, we propose a novel probabilistic model called the Sparse Factor Graph Model (SparseFGM). SparseFGM projects sparse features onto a lower-dimensional latent space, which alleviates the sparseness problem. SparseFGM also captures the associations between complications and lab tests, which helps handle the knowledge skewness. We evaluate the proposed model on a large collection of real medical records. SparseFGM significantly outperforms the baselines (+20% in F1) and provides detailed associations between diabetes complications and lab tests.
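SparseFGM itself is a factor graph model, but the general idea of projecting very sparse lab-test features onto a low-dimensional latent space can be illustrated with a truncated SVD. The sketch below is a swapped-in technique, not the paper's method; it assumes a patient-by-test matrix in which untaken tests are stored as 0, with synthetic values and a fill rate chosen to mirror the 1.26% figure above.

```python
import numpy as np

def latent_projection(M, k):
    """Project each patient's sparse lab-test vector onto the k-dim subspace
    spanned by the top-k right singular vectors of M (untaken tests = 0)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]                 # identical to M @ Vt[:k].T

rng = np.random.default_rng(0)
n_patients, n_tests = 50, 200
observed = rng.random((n_patients, n_tests)) < 0.0126   # ~1.26% of tests taken
M = np.where(observed, rng.standard_normal((n_patients, n_tests)), 0.0)
Z = latent_projection(M, k=5)               # dense 5-dim representation per patient
```

A known weakness of this shortcut is that it cannot distinguish "test not taken" from "result equal to zero"; a probabilistic model can treat missing entries explicitly.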
A Machine Learning Approach to Musically Meaningful Homogeneous Style Classification
Herlands, William (Princeton University) | Der, Ricky (University of Pennsylvania) | Greenberg, Yoel (Bar-Ilan University) | Levin, Simon (Princeton University)
Recent literature has demonstrated the difficulty of distinguishing between composers who write in extremely similar styles (homogeneous style). Additionally, machine learning studies in this field have been of purely technical import, with little musicological interpretability or significance. We present a supervised machine learning system that addresses the difficulty of differentiating between stylistically homogeneous composers using foundational elements of music, their complexity, and their interaction. Our work expands on previous style classification studies by developing more complex features and by introducing a new class of musical features that focus on local irregularities within musical scores. We demonstrate the discriminative power of the system on the string quartets of Haydn and Mozart. Our results yield interpretable musicological conclusions about the stylistic differences between Haydn and Mozart while distinguishing between the two composers with higher accuracy than previous studies in this domain.
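The abstract does not define its local-irregularity features, so the following is only a hypothetical example of the kind of feature such a system might compute: the fraction of positions in a per-measure sequence (for instance, note counts per measure) where a single element breaks an otherwise repeating local pattern. The function name and the definition are assumptions.

```python
def local_irregularity(seq):
    """Hypothetical feature: fraction of interior positions i where
    seq[i-1] == seq[i+1] but seq[i] differs, i.e. a one-element break in
    an otherwise locally repeating pattern (e.g. notes per measure)."""
    if len(seq) < 3:
        return 0.0
    breaks = sum(
        1 for i in range(1, len(seq) - 1)
        if seq[i - 1] == seq[i + 1] != seq[i]
    )
    return breaks / (len(seq) - 2)

notes_per_measure = [4, 4, 4, 3, 4, 4, 8, 4]
score = local_irregularity(notes_per_measure)
```

For `[4, 4, 4, 3, 4, 4, 8, 4]`, the 3 and the 8 are single-measure breaks, so 2 of the 6 interior positions count and the score is 1/3. Features like this would then be passed, alongside others, to a supervised classifier.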
Where and Why Users "Check In"
Cho, Yoon-Sik (University of Southern California, Information Sciences Institute) | Ver Steeg, Greg (University of Southern California, Information Sciences Institute) | Galstyan, Aram (University of Southern California, Information Sciences Institute)
The emergence of location-based social network (LBSN) services makes it possible to study individuals’ mobility patterns at a fine-grained level and to see how they are affected by social factors. In this study, we analyze check-in patterns in LBSNs and observe significant temporal clustering of check-in activities. We explore how self-reinforcing behaviors, social factors, and exogenous effects contribute to this clustering, and introduce a framework to distinguish these effects at the level of individual check-ins for both users and venues. Using check-in data from three major cities, we show not only that our model can improve prediction of future check-ins, but also that disentangling the different factors allows us to infer meaningful properties of different venues.
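A common way to separate self-reinforcing, social, and exogenous effects in temporally clustered events is a self-exciting (Hawkes-style) point process, in which the intensity at each check-in is a sum of components and each component's share can be read as the probability that it triggered the event. The sketch below assumes that structure with exponential kernels; the parameters `mu`, `alpha_self`, `alpha_social`, and `beta` are hypothetical, and this is not necessarily the model used in the paper.

```python
import math

def attribute_checkin(t, own_times, friend_times, mu, alpha_self, alpha_social, beta):
    """Split the intensity at a check-in time t into exogenous, self-reinforcing,
    and social components (hypothetical Hawkes-style model with exponential
    kernels) and return each component's share of the total."""
    def kernel_sum(times):
        # Only events strictly before t can excite the process at t.
        return sum(beta * math.exp(-beta * (t - s)) for s in times if s < t)

    parts = {
        "exogenous": mu,
        "self": alpha_self * kernel_sum(own_times),
        "social": alpha_social * kernel_sum(friend_times),
    }
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}

shares = attribute_checkin(
    t=10.0, own_times=[9.0], friend_times=[2.0],
    mu=0.1, alpha_self=0.5, alpha_social=0.5, beta=1.0,
)
```

Here a check-in at t = 10 follows the user's own check-in at t = 9 and a friend's at t = 2, so about 65% of the intensity is attributed to self-reinforcement, about 35% to the exogenous rate, and almost none to the long-past social event. `mu` must be positive so that the total intensity is never zero; the same decomposition can be computed per venue from venue-level event histories.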
A Joint Optimization Model for Image Summarization Based on Image Content and Tags
Yu, Hongliang (Peking University) | Deng, Zhi-Hong (Peking University) | Yang, Yunlun (Peking University) | Xiong, Tao (The Johns Hopkins University)
As an effective technology for navigating large numbers of images, image summarization is becoming an increasingly promising task with the rapid development of image sharing sites and social networks. Most existing summarization approaches use visual features for image representation without considering tag information. In this paper, we propose a novel framework, named JOINT, which employs both image content and tag information to summarize images. Our model produces a summary consisting of the images that can best reconstruct the original collection. Based on the assumption that an image with representative content should also have typical tags, we introduce a similarity-inducing regularizer to our model. Furthermore, we impose a lasso penalty on the objective function to yield a concise summary set. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches.
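The reconstruction objective can be illustrated with a greedy stand-in for the paper's lasso formulation: represent each image by its visual features concatenated with (weighted) tag features, and repeatedly add the image whose inclusion lets the selected set reconstruct the whole collection with the smallest least-squares error. This is a swapped-in subset-selection technique, not JOINT itself; it omits the similarity-inducing regularizer, and the `tag_weight` parameter and toy data are assumptions.

```python
import numpy as np

def summarize(visual, tags, k, tag_weight=1.0):
    """Greedily pick k images whose features best reconstruct the collection.
    visual: (n, dv) and tags: (n, dt) arrays; rows are images."""
    D = np.hstack([visual, tag_weight * tags]).T    # columns = images
    n = D.shape[1]
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            B = D[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(B, D, rcond=None)   # reconstruct all images from B
            err = np.linalg.norm(D - B @ coef)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 4)                    # three hypothetical clusters of 4 images
visual = np.eye(3)[labels] + 0.05 * rng.standard_normal((12, 3))
tags = np.eye(3)[labels]                            # one tag per cluster
sel = summarize(visual, tags, k=3)
```

With three well-separated clusters of four images each, the three selected images come from three different clusters, since a second image from an already covered cluster adds almost nothing to the reconstruction.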