Statistical Learning
Clustering Documents Along Multiple Dimensions
Dasgupta, Sajib (IBM Almaden Research Center) | Golden, Richard M. (University of Texas at Dallas) | Ng, Vincent (University of Texas at Dallas)
Traditional clustering algorithms are designed to search for a single clustering solution despite the fact that multiple alternative solutions might exist for a particular dataset. For example, a set of news articles might be clustered by topic or by the author's gender or age. Similarly, book reviews might be clustered by sentiment or comprehensiveness. In this paper, we address the problem of identifying alternative clustering solutions by developing a Probabilistic Multi-Clustering (PMC) model that discovers multiple, maximally different clusterings of a data sample. Empirical results on six datasets representative of real-world applications show that our PMC model exhibits superior performance to comparable multi-clustering algorithms.
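The abstract does not say how "maximally different" is measured. As an illustration only, one common way to quantify how much two clusterings of the same documents agree is normalized mutual information (NMI). The sketch below computes it in plain Python. It uses the geometric-mean normalization, which is an assumption because other normalizations exist. It is not the PMC model, and the labelings are hypothetical. Two clusterings with NMI near 0, like the topic and author-gender labelings here, are the kind of maximally different pair the paper seeks.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    # Sum over co-occurring label pairs of p(x, y) * log(p(x, y) / (p(x) p(y))).
    return sum((nxy / n) * math.log(nxy * n / (ca[x] * cb[y]))
               for (x, y), nxy in cab.items())


def nmi(a, b):
    """Normalized mutual information: 1.0 for identical partitions, 0.0 for independent ones."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0  # convention chosen here for single-cluster labelings
    return mutual_information(a, b) / math.sqrt(ha * hb)


# Hypothetical labels for four news articles.
by_topic = [0, 0, 1, 1]
by_author_gender = [0, 1, 0, 1]
print(nmi(by_topic, by_author_gender))  # independent partitions
print(nmi(by_topic, [1, 1, 0, 0]))      # same partition with labels renamed
```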
Adaptive Step-Size for Online Temporal Difference Learning
Dabney, William (University of Massachusetts Amherst) | Barto, Andrew G (University of Massachusetts Amherst)
The step-size, often denoted as α, is a key parameter for most incremental learning algorithms. Its importance is especially pronounced when performing online temporal difference (TD) learning with function approximation. Several methods have been developed to adapt the step-size online. These range from straightforward back-off strategies to adaptive algorithms based on gradient descent. We derive an adaptive upper bound on the step-size parameter to guarantee that online TD learning with linear function approximation will not diverge. We then empirically evaluate algorithms using this upper bound as a heuristic for adapting the step-size parameter online. We compare performance with related work including HL(λ) and Autostep. Our results show that this adaptive upper bound heuristic outperforms all existing methods without requiring any meta-parameters. This effectively eliminates the need to tune the learning rate of temporal difference learning with linear function approximation.
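To illustrate what an upper bound on the step size can look like, consider linear TD(0) where the eligibility vector equals the current feature vector φ. After the update w ← w + αδφ, the TD error on the same transition becomes δ(1 + αd) with d = φ·(γφ′ − φ). So when d < 0, any α ≤ 1/|d| avoids overshooting, meaning the update does not flip the sign of the error. The sketch below is derived here under those assumptions. The paper's bound covers the general case and may take a different form, and the function name is hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def capped_td0_step(w, phi, r, phi_next, gamma, alpha):
    """One linear TD(0) update whose step size is capped so it cannot overshoot.

    Returns (new_w, alpha_used, td_error_before, td_error_after). Passing
    alpha_used into the next call keeps the step size non-increasing.
    """
    delta = r + gamma * dot(w, phi_next) - dot(w, phi)
    # After w += a * delta * phi, this transition's TD error becomes delta * (1 + a * d).
    d = dot(phi, [gamma * xn - x for xn, x in zip(phi_next, phi)])
    if d < 0:
        alpha = min(alpha, 1.0 / -d)  # keeps 1 + alpha * d >= 0: no sign flip
    new_w = [wi + alpha * delta * pi for wi, pi in zip(w, phi)]
    delta_after = r + gamma * dot(new_w, phi_next) - dot(new_w, phi)
    return new_w, alpha, delta, delta_after


# Hypothetical transition into a state with all-zero features (e.g. a terminal state).
w, a, before, after = capped_td0_step([0.0, 0.0], [1.0, 1.0], 1.0, [0.0, 0.0], 0.9, 1.0)
print(a, before, after)
```

Here d = −2, so the requested α = 1.0 is capped at 0.5 and the TD error on this transition drops from 1 to 0. With α = 1.0 the update would overshoot to an error of −1.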
Weighted Clustering
Ackerman, Margareta (University of Waterloo) | Ben-David, Shai (University of Waterloo) | Brânzei, Simina (Aarhus University) | Loker, David (University of Waterloo)
We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis of the influence of weighted data on standard clustering algorithms in both the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify algorithms accordingly.
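We chose k-means as a concrete example of an algorithm whose output depends on instance weights. The sketch below runs Lloyd-style k-means in which each point pulls its cluster center in proportion to its weight, so each center is the weighted mean of its members. The function names and data are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def weighted_kmeans(points, weights, centers, iters=20):
    """Lloyd-style k-means in which each point counts in proportion to its weight."""
    centers = [list(c) for c in centers]
    dim = len(points[0])

    def assign_all():
        # Index of the nearest center for every point.
        return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        assign = assign_all()
        for j in range(len(centers)):
            members = [(p, w) for p, w, a in zip(points, weights, assign) if a == j]
            total = sum(w for _, w in members)
            if total > 0:  # leave empty clusters where they are
                centers[j] = [sum(w * p[d] for p, w in members) / total for d in range(dim)]
    assign = assign_all()
    cost = sum(w * sq_dist(p, centers[a]) for p, w, a in zip(points, weights, assign))
    return centers, assign, cost


print(weighted_kmeans([(0.0,), (1.0,), (10.0,)], [1, 1, 1], [(0.0,)])[0])
print(weighted_kmeans([(0.0,), (1.0,), (10.0,)], [1, 1, 8], [(0.0,)])[0])
```

With uniform weights, the single center for the points 0, 1, and 10 sits at 11/3. Raising the weight of the point at 10 to 8 moves the center to 8.1. This is the kind of reaction to weights that the paper characterizes.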
Towards Population Scale Activity Recognition: A Framework for Handling Data Diversity
Abdullah, Saeed (Cornell University) | Lane, Nicholas D. (Microsoft Research Asia) | Choudhury, Tanzeem (Cornell University)
The rising popularity of the sensor-equipped smartphone is changing the possible scale and scope of human activity inference. The diversity in user population seen in large user bases can overwhelm conventional one-size-fits-all classification approaches. Although personalized models are better able to handle population diversity, they often require increased effort from the end user during training and are computationally expensive. In this paper, we propose an activity classification framework that is scalable and can tractably handle an increasing number of users. Scalability is achieved by maintaining distinct groups of similar users during the training process, which makes it possible to account for the differences between users without resorting to training individualized classifiers. The proposed framework keeps user burden low by leveraging crowd-sourced data labels, where simple natural language processing techniques in combination with multi-instance learning are used to handle labeling errors introduced by low-commitment everyday users. Experimental results on a large public dataset demonstrate that the framework can cope with population diversity irrespective of population size.
Far Out: Predicting Long-Term Human Mobility
Sadilek, Adam (University of Rochester) | Krumm, John (Microsoft Research)
Much work has been done on predicting where one is going to be in the immediate future, typically within the next hour. By contrast, we address the open problem of predicting human mobility far into the future, on a scale of months and years. We propose an efficient nonparametric method that extracts significant and robust patterns in location data, learns their associations with contextual features (such as day of week), and subsequently leverages this information to predict the most likely location at any given time in the future. The entire process is formulated in a principled way as an eigendecomposition problem. Evaluation on a massive dataset with more than 32,000 days' worth of GPS data across 703 diverse subjects shows that our model predicts the correct location with high accuracy, even years into the future. This result opens a number of interesting avenues for future research and applications.
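The abstract formulates prediction as an eigendecomposition. One way to set this up, assumed here for illustration, is to describe each day by a vector of location features and take the leading eigenvectors of their covariance. Any day can then be written as the mean day plus a weighted sum of those components. Predicting the weights from context such as day of week (not shown) would yield a predicted day. The sketch extracts only the top component by power iteration. The data layout, names, and example rows are hypothetical.

```python
import math
import random


def top_principal_component(rows, iters=200, seed=0):
    """Mean vector, leading eigenvector, and eigenvalue of the sample covariance of rows.

    Uses power iteration from a random unit start vector.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # covariance is zero, or v lies in its null space
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return mean, v, eigenvalue


def coefficient(mean, component, day):
    """Projection of a day (minus the mean day) onto the component."""
    return sum((x - m) * c for x, m, c in zip(day, mean, component))


def reconstruct(mean, component, coef):
    """Approximate a day as the mean day plus a multiple of the component."""
    return [m + coef * c for m, c in zip(mean, component)]


# Hypothetical days: share of one time block spent at home, at work, and elsewhere.
rows = [[0.5 + a, 0.5 - a, 0.0] for a in (-0.25, -0.125, 0.125, 0.25)]
mean, v, lam = top_principal_component(rows)
print(v, lam, reconstruct(mean, v, coefficient(mean, v, rows[0])))
```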
Content Recommendation for Attention Management in Unified Social Messaging
With the growing popularity of social networks and collaboration systems, people are increasingly working with or socially connected to each other. A unified messaging system provides a single interface for users to receive and process information from multiple sources. It is highly desirable to design an attention management solution that can help users easily navigate and process dozens of unread messages from a unified messaging system. Moreover, with the proliferation of mobile devices, people now selectively consume the most important messages on the go, between different activities in their daily lives. The information overload problem is especially acute for mobile users with small screens. In this paper, we present PAM, an intelligent end-to-end Personalized Attention Management solution that employs analytical techniques to learn user interests and to organize and prioritize incoming messages based on those interests. For a list of unread messages, PAM generates a concise attention report that allows users to quickly scan the important new messages from their important social connections, as well as messages about the most important tasks they are involved in. Our solution can also be applied to other applications, such as news filtering and alerts on mobile devices. Our evaluation results demonstrate the effectiveness of PAM.
Table Header Detection and Classification
Fang, Jing (Peking University) | Mitra, Prasenjit (The Pennsylvania State University) | Tang, Zhi (Peking University) | Giles, C. Lee (The Pennsylvania State University)
In digital libraries, a table, as a specific document component as well as a condensed way to present structured and relational data, contains rich information and is often the only source of that information. In order to explore, retrieve, and reuse that data, tables should be identified and the data extracted. Table recognition is an old field of research. However, due to the diversity of table styles, the results are still far from satisfactory, and no single algorithm performs well on all different types of tables. In this paper, we randomly take samples from CiteSeerX to investigate diverse table styles for automatic table extraction. We find that table headers are one of the main characteristics of complex table styles. We identify a set of features that can be used to segregate headers from tabular data and build a classifier to detect table headers. Our empirical evaluation on PDF documents shows that a Random Forest classifier achieves an accuracy of 92%.
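The abstract does not list the features used. As a hypothetical illustration of the kind of per-row features a header classifier might consume, the sketch below computes the share of numeric, alphabetic, and empty cells in a row. It then applies a hand-set threshold in place of a trained classifier, whereas the paper trains a Random Forest on its features. The feature names and thresholds are assumptions.

```python
import re

# Plain numbers with optional sign, decimals, and trailing percent sign.
NUMERIC = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)%?$")


def row_features(cells):
    """Hypothetical per-row features: shares of numeric, alphabetic, and empty cells."""
    nonempty = [c.strip() for c in cells if c.strip()]
    n = len(nonempty) or 1  # avoid dividing by zero on an all-empty row
    numeric = sum(1 for c in nonempty if NUMERIC.match(c.replace(",", "")))
    alphabetic = sum(1 for c in nonempty if any(ch.isalpha() for ch in c))
    return {
        "numeric_ratio": numeric / n,
        "alpha_ratio": alphabetic / n,
        "empty_ratio": 1 - len(nonempty) / max(len(cells), 1),
    }


def looks_like_header(cells, max_numeric=0.2, min_alpha=0.5):
    """Hand-set rule standing in for a trained classifier (thresholds are assumptions)."""
    f = row_features(cells)
    return f["numeric_ratio"] <= max_numeric and f["alpha_ratio"] >= min_alpha


print(looks_like_header(["Method", "Precision", "Recall"]))
print(looks_like_header(["SVM", "0.91", "0.88"]))
```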
Pre-Symptomatic Prediction of Plant Drought Stress Using Dirichlet-Aggregation Regression on Hyperspectral Images
Kersting, Kristian (Fraunhofer IAIS and University of Bonn) | Xu, Zhao (Fraunhofer IAIS) | Wahabzada, Mirwaes (Fraunhofer IAIS) | Bauckhage, Christian (Fraunhofer IAIS and University of Bonn) | Thurau, Christian (Game Analytics ApS) | Römer, Christoph (University of Bonn) | Ballvora, Agim (University of Bonn) | Rascher, Uwe (Forschungszentrum Juelich) | Leon, Jen (University of Bonn) | Plümer, Lutz (University of Bonn)
Pre-symptomatic drought stress prediction is of great relevance in precision plant protection, ultimately helping to meet the challenge of "How to feed a hungry world?". Unfortunately, it also presents unique computational problems in scale and interpretability: it is a temporal, large-scale prediction task, e.g., when monitoring plants over time using hyperspectral imaging, and features are 'things' with a 'biological' meaning and interpretation and not just mathematical abstractions computable for any data. In this paper we propose Dirichlet-aggregation regression (DAR) to meet the challenge. DAR represents all data by means of convex combinations of only a few extreme ones, which are computable in linear time and easy to interpret. Then, it puts a Gaussian process prior on the Dirichlet distributions induced on the simplex spanned by the extremes. The prior can be a function of any observed meta feature such as time, location, type of fertilization, and plant species. We evaluated DAR on two hyperspectral image series of plants over time with about 2 and 5.8 billion matrix entries, respectively. The results demonstrate that DAR can be learned efficiently and predicts stress well before it becomes visible to the human eye.
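DAR's first step expresses each data point as a convex combination of a few extreme points. Assuming the extremes are already selected, the weights for one point can be found by least squares over the probability simplex. The sketch below does this with projected gradient descent and the standard sort-based simplex projection. We chose this solver for illustration, and it is not necessarily the paper's. Selecting the extremes and the Gaussian-process prior over the Dirichlet distributions are not shown, and the function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index where this holds determines the shift
    return [max(x - theta, 0.0) for x in v]


def convex_weights(x, extremes, steps=500):
    """Weights w on the simplex minimizing ||x - sum_i w_i * extremes[i]||^2."""
    k, dim = len(extremes), len(x)
    # Step size 1/L, with L bounded by the trace of the Gram matrix (sum of squared norms).
    lipschitz = sum(sum(e_d * e_d for e_d in e) for e in extremes) or 1.0
    lr = 1.0 / lipschitz
    w = [1.0 / k] * k
    for _ in range(steps):
        recon = [sum(w[i] * extremes[i][d] for i in range(k)) for d in range(dim)]
        resid = [x[d] - recon[d] for d in range(dim)]
        grad = [-sum(extremes[i][d] * resid[d] for d in range(dim)) for i in range(k)]
        w = project_to_simplex([w[i] - lr * grad[i] for i in range(k)])
    return w


# Hypothetical extremes spanning a triangle, and a point inside it.
extremes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(convex_weights((0.2, 0.3), extremes))
```

Points outside the hull receive the weights of their nearest point in the hull. For example, (1, 1) maps to an equal mix of the second and third extremes.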
Fine-Grained Photovoltaic Output Prediction Using a Bayesian Ensemble
Chakraborty, Prithwish (Virginia Tech) | Marwah, Manish (HP Labs) | Arlitt, Martin (HP Labs) | Ramakrishnan, Naren (Virginia Tech)
Local and distributed power generation is increasingly reliant on renewable power sources, e.g., solar (photovoltaic or PV) and wind energy. The integration of such sources into the power grid is challenging, however, due to their variable and intermittent energy output. To effectively use them on a large scale, it is essential to be able to predict power generation at a fine-grained level. We describe a novel Bayesian ensemble methodology involving three diverse predictors. Each predictor estimates mixing coefficients for integrating PV generation output profiles but captures fundamentally different characteristics. Two of them employ classical parameterized (naive Bayes) and non-parametric (nearest neighbor) methods to model the relationship between weather forecasts and PV output. The third predictor captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology. We demonstrate the success and superiority of our methods on real PV data from two locations that exhibit diverse weather conditions. Predictions from our model can be harnessed to optimize scheduling of delay tolerant workloads, e.g., in a data center.
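To illustrate the nearest-neighbor predictor from the abstract, the sketch below estimates the mixing coefficients for a day by averaging those of the k historical days with the most similar weather forecast. It then mixes stored PV output profiles with those coefficients. The single cloud-cover feature, the two profiles, and the history are hypothetical. The naive Bayes predictor, the motif-based predictor, and the Bayesian combination of the three are not shown.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_mixing_weights(history, forecast, k=3):
    """Average the mixing weights of the k historical days whose forecasts are closest.

    history: list of (forecast_features, mixing_weights) pairs.
    """
    nearest = sorted(history, key=lambda h: sq_dist(h[0], forecast))[:k]
    dim = len(nearest[0][1])
    avg = [sum(w[j] for _, w in nearest) / len(nearest) for j in range(dim)]
    total = sum(avg)
    return [a / total for a in avg]  # renormalize so the weights sum to 1


def mix_profiles(profiles, weights):
    """Predicted output per time slot: weighted sum of the stored output profiles."""
    return [sum(w * p[t] for w, p in zip(weights, profiles)) for t in range(len(profiles[0]))]


# Hypothetical data: two output profiles (kW over five time slots) and four past days,
# each described by a forecast cloud-cover fraction and its fitted mixing weights.
profiles = [[0.0, 2.0, 4.0, 2.0, 0.0],   # clear-sky shape
            [0.0, 1.0, 1.0, 1.0, 0.0]]   # overcast shape
history = [((0.1,), [0.9, 0.1]), ((0.2,), [0.8, 0.2]),
           ((0.8,), [0.1, 0.9]), ((0.9,), [0.2, 0.8])]
weights = knn_mixing_weights(history, (0.15,), k=2)
print(weights, mix_profiles(profiles, weights))
```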
Using Expectations to Drive Cognitive Behavior
Kurup, Unmesh (Carnegie Mellon University) | Lebiere, Christian (Carnegie Mellon University) | Stentz, Anthony (Carnegie Mellon University) | Hebert, Martial (Carnegie Mellon University)
Generating future states of the world is an essential component of high-level cognitive tasks such as planning. We explore the notion that such future-state generation is more widespread and forms an integral part of cognition. We call these generated states expectations, and propose that cognitive systems constantly generate expectations, match them to observed behavior and react when a difference exists between the two. We describe an ACT-R model that performs expectation-driven cognition on two tasks: pedestrian tracking and behavior classification. The model generates expectations of pedestrian movements to track them. The model also uses differences in expectations to identify distinctive features that differentiate these tracks. During learning, the model learns the association between these features and the various behaviors. During testing, it classifies pedestrian tracks by recalling the behavior associated with the features of each track. We tested the model on both single and multiple behavior datasets and compared the results against a k-NN classifier. The k-NN classifier outperformed the model in correct classifications, but the model had fewer incorrect classifications in the multiple behavior case, and both systems had about equal incorrect classifications in the single behavior case.
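The ACT-R model itself is not reproduced here. This is a minimal sketch of the generate, match, and react cycle, assuming a constant-velocity expectation and a fixed distance threshold, both of which are our choices. The code predicts each next pedestrian position from the previous two and records the time steps where the observation departs from the expectation. Such mismatches play the role of the differences the model uses to find distinctive track features. The function names and the track are hypothetical.

```python
import math


def expect_next(track):
    """Constant-velocity expectation of the next (x, y) position from the last two."""
    (x1, y1), (x2, y2) = track[-2], track[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def surprises(track, threshold=1.0):
    """Time steps where the observed position is farther than threshold from the expectation."""
    out = []
    for t in range(2, len(track)):
        ex, ey = expect_next(track[:t])
        ox, oy = track[t]
        if math.hypot(ox - ex, oy - ey) > threshold:
            out.append(t)  # react: record the mismatch as a candidate feature
    return out


# Hypothetical track: walks straight along x, then turns sharply and walks along y.
track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 2), (3, 4)]
print(surprises(track))
```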