Statistical Learning
Exploiting Task-Feature Co-Clusters in Multi-Task Learning
Xu, Linli (University of Science and Technology of China) | Huang, Aiqing (University of Science and Technology of China) | Chen, Jianhui (Yahoo Labs) | Chen, Enhong (University of Science and Technology of China)
In multi-task learning, multiple related tasks are considered simultaneously, with the goal to improve the generalization performance by utilizing the intrinsic sharing of information across tasks. This paper presents a multi-task learning approach by modeling the task-feature relationships. Specifically, instead of assuming that similar tasks have similar weights on all the features, we start with the motivation that the tasks should be related in terms of subsets of features, which implies a co-cluster structure. We design a novel regularization term to capture this task-feature co-cluster structure. A proximal algorithm is adopted to solve the optimization problem. Convincing experimental results demonstrate the effectiveness of the proposed algorithm and justify the idea of exploiting the task-feature relationships.
Large-Margin Multi-Label Causal Feature Learning
Xu, Chang (Peking University) | Tao, Dacheng (Univiersity of Technology, Sydney) | Xu, Chao (Peking University)
In multi-label learning, an example is represented by a descriptive feature associated with several labels. Simply considering labels as independent or correlated is crude; it would be beneficial to define and exploit the causality between multiple labels. For example, an image label 'lake' implies the label 'water', but not vice versa. Since the original features are a disorderly mixture of the properties originating from different labels, it is intuitive to factorize these raw features to clearly represent each individual label and its causality relationship.Following the large-margin principle, we propose an effective approach to discover the causal features of multiple labels, thus revealing the causality between labels from the perspective of feature. We show theoretically that the proposed approach is a tight approximation of the empirical multi-label classification error, and the causality revealed strengthens the consistency of the algorithm. Extensive experimentations using synthetic and real-world data demonstrate that the proposed algorithm effectively discovers label causality, generates causal features, and improves multi-label learning.
Forecasting Collector Road Speeds Under High Percentage of Missing Data
Xin, Xin (Beijing Institute of Technology) | Lu, Chunwei (Autopia Mobile Tech Group Inc.) | Wang, Yashen (Beijing Institute of Technology) | Huang, Heyan (Beijing Institute of Technology)
Accurate road speed predictions can help drivers in smart route planning. Although the issue has been studied previously, most existing work focus on arterial roads only, where sensors are configured closely for collecting complete real-time data. For collector roads where sensors sparsly cover, however, speed predictions are often ignored. With GPS-equipped floating car signals being available nowadays, we aim at forecasting collector road speeds by utilizing these signals. The main challenge compared with arterial roads comes from the missing data. In a time slot of the real case, over 90% of collector roads cannot be covered by enough floating cars. Thus most traditional approaches for arterial roads, relying on complete historical data, cannot be employed directly. Aiming at solving this problem, we propose a multi-view road speed prediction framework. In the first view, temporal patterns are modeled by a layered hidden Markov model; and in the second view, spatial patterns are modeled by a collective matrix factorization model. The two models are learned and inferred simultaneously in a co-regularized manner. Experiments conducted in the Beijing road network, based on 10K taxi signals in 2 years, have demonstrated that the approach outperforms traditional approaches by 10% in MAE and RMSE.
Exploring Social Context for Topic Identification in Short and Noisy Texts
Wang, Xin (Jilin University;Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education) | Wang, Ying (Changchun Institute of Tech) | Zuo, Wanli (Jilin University) | Cai, Guoyong (Jilin University)
With the pervasion of social media, topic identification in short texts attracts increasing attention inย recent years. However, in nature the texts of social media are short and noisy, and the structures are sparse and dynamic, resulting in difficulty to identify topic categories exactly from online social media. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploring social context from the perspective of social sciences. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and conduct feature selection to tackle short and noisy texts in social media, which result in a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of social context in topic identification.
Coupled Interdependent Attribute Analysis on Mixed Data
Wang, Can (Digital Productivity Flagship, CSIRO) | Chi, Chi-Hung (Digital Productivity Flagship, CSIRO) | Zhou, Wei (Digital Productivity Flagship, CSIRO) | Wong, Raymond (University of New South Wales)
In the real-world applications, heterogeneous interdependent attributes that consist of both discrete and numerical variables can be observed ubiquitously. The usual representation of these data sets is an information table, assuming the independence of attributes. However, very often, they are actually interdependent on one another, either explicitly or implicitly. Limited research has been conducted in analyzing such attribute interactions, which causes the analysis results to be more local than global. This paper proposes the coupled heterogeneous attribute analysis to capture the interdependence among mixed data by addressing coupling context and coupling weights in unsupervised learning. Such global couplings integrate the interactions within discrete attributes, within numerical attributes and across them to form the coupled representation for mixed type objects based on dimension conversion and feature selection. This work makes one step forward towards explicitly modeling the interdependence of heterogeneous attributes among mixed data, verified by the applications in data structure analysis, data clustering evaluation, and density comparison. Substantial experiments on 12 UCI data sets show that our approach can effectively capture the global couplings of heterogeneous attributes and outperforms the state-of-the-art methods, supported by statistical analysis.
Learning Hybrid Models with Guarded Transitions
Santana, Pedro (Massachusetts Institute of Technology) | Lane, Spencer (Massachusetts Institute of Technology) | Timmons, Eric (Massachusetts Institute of Technology) | Williams, Brian (Massachusetts Institute of Technology) | Forster, Carlos (Instituto Tecnolรณgico de Aeronรกutica)
Innovative methods have been developed for diagnosis, activity monitoring, and state estimation that achieve high accuracy through the use of stochastic models involving hybrid discrete and continuous behaviors. A key bottleneck is the automated acquisition of these hybrid models, and recent methods have focused predominantly on Jump Markov processes and piecewise autoregressive models. In this paper, we present a novel algorithm capable of performing unsupervised learning of guarded Probabilistic Hybrid Automata (PHA) models, which extends prior work by allowing stochastic discrete mode transitions in a hybrid system to have a functional dependence on its continuous state. Our experiments indicate that guarded PHA models can yield significant performance improvements when used by hybrid state estimators, particularly when diagnosing the true discrete mode of the system, without any noticeable impact on their real-time performance.
Propagating Ranking Functions on a Graph: Algorithms and Applications
Qian, Buyue (IBM T. J. Watson Research) | Wang, Xiang (IBM T. J. Watson Research) | Davidson, Ian (University of California, Davis)
Learning to rank is an emerging learning task that opens up a diverse set of applications. However, most existing work focuses on learning a single ranking function whilst in many real world applications, there can be many ranking functions to fulfill various retrieval tasks on the same data set. How to train many ranking functions is challenging due to the limited availability of training data which is further compounded when plentiful training data is available for a small subset of the ranking functions. This is particularly true in settings, such as personalized ranking/retrieval, where each person requires a unique ranking function according to their preference, but only the functions of the persons who provide sufficient ratings (of objects, such as movies and music) can be well trained. To address this, we propose to construct a graph where each node corresponds to a retrieval task, and then propagate ranking functions on the graph. We illustrate the usefulness of the idea of propagating ranking functions and our method by exploring two real world applications.
Lazier Than Lazy Greedy
Mirzasoleiman, Baharan (ETH Zurich) | Badanidiyuru, Ashwinkumar (Google Research Mountain View) | Karbasi, Amin (Yale University) | Vondrak, Jan (IBM Almaden) | Krause, Andreas (ETH Zurich)
Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a (1 โ 1/e โ ฮต) approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not evaluate the whole fraction of data points even once and still achieves indistinguishable results compared to lazy greedy.
A Regularized Linear Dynamical System Framework for Multivariate Time Series Analysis
Liu, Zitao (University of Pittsburgh) | Hauskrecht, Milos (University of Pittsburgh)
Linear Dynamical System (LDS) is an elegant mathematical framework for modeling and learning Multivariate Time Series (MTS). However, in general, it is difficult to set the dimension of an LDS's hidden state space. A small number of hidden states may not be able to model the complexities of a MTS, while a large number of hidden states can lead to overfitting. In this paper, we study learning methods that impose various regularization penalties on the transition matrix of the LDS model and propose a regularized LDS learning framework (rLDS) which aims to (1) automatically shut down LDSs' spurious and unnecessary dimensions, and consequently, address the problem of choosing the optimal number of hidden states; (2) prevent the overfitting problem given a small amount of MTS data; and (3) support accurate MTS forecasting. To learn the regularized LDS from data we incorporate a second order cone program and a generalized gradient descent method into the Maximum a Posteriori framework and use Expectation Maximization to obtain a low-rank transition matrix of the LDS model. We propose two priors for modeling the matrix which lead to two instances of our rLDS. We show that our rLDS is able to recover well the intrinsic dimensionality of the time series dynamics and it improves the predictive performance when compared to baselines on both synthetic and real-world MTS datasets.
Nonstationary Gaussian Process Regression for Evaluating Repeated Clinical Laboratory Tests
Lasko, Thomas A. (Vanderbilt University School of Medicine)
Sampling repeated clinical laboratory tests with appropriate timing is challenging because the latent physiologic function being sampled is in general nonstationary. When ordering repeated tests, clinicians adopt various simple strategies that may or may not be well suited to the behavior of the function. Previous research on this topic has been primarily focused on cost-driven assessments of oversampling. But for monitoring physiologic state or for retrospective analysis, undersampling can be much more problematic than oversampling. In this paper we analyze hundreds of observation sequences of four different clinical laboratory tests to provide principled, data-driven estimates of undersampling and oversampling, and to assess whether the sampling adapts to changing volatility of the latent function. To do this, we developed a new method for fitting a Gaussian process to samples of a nonstationary latent function. Our method includes an explicit estimate of the latent function's volatility over time, which is deterministically related to its nonstationarity. We find on average that the degree of undersampling is up to an order of magnitude greater than oversampling, and that only a small minority are sampled with an adaptive strategy.