Performance Analysis
A Trust Prediction Approach Capturing Agents' Dynamic Behavior
Liu, Xin (Nanyang Technological University) | Datta, Anwitaman (Nanyang Technological University)
Predicting trust among the agents is of great importance to various open distributed settings (e.g., e-market, peer-to-peer networks, etc.) in that dishonest agents can easily join the system and achieve their goals by circumventing agreed rules, or gaining unfair advantages, etc. Most existing trust mechanisms derive trust by statistically investigating the target agent's historical information. However, even if rich historical information is available, it is challenging to model an agent's behavior since an intelligent agent may strategically change its behavior to maximize its profits. We therefore propose a trust prediction approach to capture dynamic behavior of the target agent. Specifically, we first identify features which are capable of describing/representing context of a transaction. Then we use these features to measure similarity between context of the potential transaction and that of previous transactions to estimate trustworthiness of the potential transaction based on previous similar transactions' outcomes. Evaluation using real auction data and synthetic data demonstrates efficacy of our approach in comparison with an existing representative trust mechanism.
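The context-similarity idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the cosine similarity, the similarity-weighted average, the neutral `prior`, and the names `cosine_similarity` and `predict_trust` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_trust(candidate, history, prior=0.5):
    """Estimate trust for a potential transaction.

    candidate: context feature vector of the potential transaction.
    history:   list of (context_vector, outcome) pairs, outcome in [0, 1].
    prior:     hypothetical fallback when no past context is similar.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for context, outcome in history:
        # Ignore negatively correlated contexts rather than letting them subtract.
        weight = max(0.0, cosine_similarity(candidate, context))
        weighted_sum += weight * outcome
        weight_total += weight
    if weight_total == 0:
        return prior
    return weighted_sum / weight_total
```

With this weighting, past transactions whose context resembles the potential one dominate the estimate, so an agent that behaves honestly only in low-value contexts would not automatically earn trust in a high-value one.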
Aesthetic Guideline Driven Photography by Robots
Gadde, Raghudeep (International Institute of Information Technology - Hyderabad) | Karlapalem, Kamalakar (International Institute of Information Technology - Hyderabad)
Robots depend on captured images for perceiving the environment. A robot can replace a human in capturing quality photographs for publishing. In this paper, we employ iterative photo capture, in which the robot repositions itself between shots, to capture good-quality photographs. Our image quality assessment approach is based on a few high-level features of the image combined with some of the aesthetic guidelines of professional photography. Our system can also be used in web image search applications to rank images. We test our quality assessment approach on a large and diversified dataset, and our system achieves a classification accuracy of 79%. We assess the aesthetic error in the captured image and estimate the change required in the orientation of the robot to retake an aesthetically better photograph. Our experiments are conducted on a NAO robot with no stereo vision. The results demonstrate that our system can be used to capture professional photographs that accord with human professional photography.
Semi-Supervised Learning for Imbalanced Sentiment Classification
Li, Shoushan (Soochow University) | Wang, Zhongqing (Soochow University) | Zhou, Guodong (Soochow University) | Lee, Sophia Yat Mei (The Hong Kong Polytechnic University)
Various semi-supervised learning methods have been proposed recently to solve the longstanding shortage problem of manually labeled data in sentiment classification. However, most existing studies assume the balance between negative and positive samples in both the labeled and unlabeled data, which may not be true in reality. In this paper, we investigate a more common case of semi-supervised learning for imbalanced sentiment classification. Trained on the imbalanced labeled data, most classification algorithms tend to predict test samples as the majority class and may ignore the minority class. Although many methods, such as re-sampling [Chawla et al., 2002], one-class classification [Juszczak and Duin, 2003], and cost-sensitive learning [Zhou and Liu, 2006], have been proposed to solve this issue, it is still unclear as to which method is more suitable to handle the imbalanced problem in sentiment classification and whether the method is extendable to ...
Using Multiple Models to Understand Data
Patel, Kayur (University of Washington) | Drucker, Steven M. (Microsoft Research) | Fogarty, James (University of Washington) | Kapoor, Ashish (Microsoft Research) | Tan, Desney S. (Microsoft Research)
A human's ability to diagnose errors, gather data, and generate features in order to build better models is largely untapped. We hypothesize that analyzing results from multiple models can help people diagnose errors by understanding relationships among data, features, and algorithms. In our first experiment, we show that using multiple models to identify potential label noise can provide a threefold reduction in the number of spurious examples a practitioner examines. In our second experiment, we show that analyses of multiple models can identify examples that are significantly more likely to respond to additional ...
Active Online Classification Via Information Maximization
Slonim, Noam (IBM Haifa Research Lab) | Yom-Tov, Elad (IBM Haifa Research Lab) | Crammer, Koby (The Technion)
We propose an online classification approach for co-occurrence data which is based on a simple information theoretic principle. We further show how to properly estimate the uncertainty associated with each prediction of our scheme and demonstrate how to exploit these uncertainty estimates in two ways: first, to abstain from highly uncertain predictions; and second, within an active learning framework, to preserve classification accuracy while substantially reducing training set size. Our method is highly efficient in terms of run-time and memory footprint requirements. Experimental results in the domain of text classification demonstrate that the classification accuracy of our method is superior or comparable to other state-of-the-art online classification algorithms.
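As a rough illustration of abstaining on uncertain predictions, the sketch below thresholds the Shannon entropy of a predicted class distribution. The abstract does not say how the paper estimates uncertainty, so the entropy criterion, the `max_entropy_bits` threshold, and the function names are assumptions.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def predict_or_abstain(probs, max_entropy_bits=0.5):
    """Return the index of the most probable class, or None to abstain.

    probs:            predicted class probabilities (assumed to sum to 1).
    max_entropy_bits: hypothetical uncertainty threshold; predictions whose
                      entropy exceeds it are withheld.
    """
    if entropy_bits(probs) > max_entropy_bits:
        return None
    return max(range(len(probs)), key=lambda i: probs[i])
```

In an active learning loop, the same uncertainty score could be used in the opposite direction: only the examples on which the classifier would abstain are sent for labeling.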
Consistency Measures for Feature Selection: A Formal Definition, Relative Sensitivity Comparison and a Fast Algorithm
Shin, Kilho (University of Hyogo) | Fernandes, Danny (University of Hyogo) | Miyazaki, Seiya (Panasonic Corporation)
Consistency-based feature selection is an important category of feature selection research yet is defined only intuitively in the literature. First, we formally define a consistency measure, and then using this definition, evaluate 19 feature selection measures from the literature. While only 5 of these were labeled as consistency measures by their original authors, by our definition, an additional 9 measures should be classified as consistency measures. To compare these 14 consistency measures in terms of sensitivity, we introduce the concept of quasilinear compatibility order, and partially determine the order among the measures. Next, we propose a new fast algorithm for consistency-based feature selection. We ran experiments using eleven large datasets to compare the performance of our algorithm against INTERACT and LCC, the only two instances of consistency-based algorithms with potential real world application. Our algorithm shows vast improvement in time efficiency, while its performance in accuracy is comparable with that of INTERACT and LCC.
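For readers unfamiliar with consistency measures, one classic example from the feature selection literature is the inconsistency rate: group instances by their values on the selected features, and count every instance that does not belong to the majority class of its group. The sketch below is illustrative only; the abstract does not list the 19 measures studied, and the name `inconsistency_rate` and its signature are assumptions.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, feature_idx):
    """Fraction of instances that disagree with the majority label of their group.

    rows:        sequence of feature tuples.
    labels:      class label for each row.
    feature_idx: indices of the selected features.
    A value of 0.0 means the selected features fully determine the class
    on this data set.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in feature_idx)
        groups[key][label] += 1
    # In each group, everything outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)
```

On XOR-style data, for instance, either feature alone gives a rate of 0.5 while the pair gives 0.0, which shows why such measures can detect interacting features that univariate scores miss.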
Classification of Emerging Extreme Event Tracks in Multivariate Spatio-Temporal Physical Systems Using Dynamic Network Structures: Application to Hurricane Track Prediction
Sencan, Huseyin (North Carolina State University) | Chen, Zhengzhang (North Carolina State University) | Hendrix, William (Northwestern University) | Pansombut, Tatdow (North Carolina State University) | Semazzi, Frederick (North Carolina State University) | Choudhary, Alok (North Carolina State University) | Kumar, Vipin (University of Minnesota) | Melechko, Anatoli V. (North Carolina State University) | Samatova, Nagiza F. (Oak Ridge National Laboratory)
Understanding extreme events, such as hurricanes or forest fires, is of paramount importance because of their adverse impacts on human beings. Such events often propagate in space and time. Predicting, even a few days in advance, what locations will be affected by the event tracks could benefit our society in many ways. Arguably, simulations from "first principles," where underlying physics-based models are described by a system of equations, provide least reliable predictions for variables characterizing the dynamics of these extreme events. Data-driven model building has been recently emerging as a complementary approach that could learn the relationships between historically observed or simulated multiple, spatio-temporal ancillary variables and the dynamic behavior of extreme events of interest. While promising, the methodology for predictive learning from such complex data is still in its infancy. In this paper, we propose a dynamic networks-based methodology for in-advance prediction of the dynamic tracks of emerging extreme events. By associating a network model of the system with the known tracks, our method is capable of learning the recurrent network motifs that could be used as discriminatory signatures for the event's behavioral class. When applied to classifying the behavior of the hurricane tracks at their early formation stages in the Western Africa region, our method is able to predict whether hurricane tracks will hit the land of the North Atlantic region with a lead time of at least 10-15 days and with more than 90% accuracy using 10-fold cross-validation. To the best of our knowledge, no comparable methodology exists for solving this problem using data-driven models.
Discovering Deformable Motifs in Continuous Time Series Data
Saria, Suchi (Stanford University) | Duchi, Andrew (Stanford University) | Koller, Daphne (Stanford University)
Continuous time series data often comprise or contain repeated motifs: patterns that have similar shape, and yet exhibit nontrivial variability. Identifying these motifs, even in the presence of variation, is an important subtask in both unsupervised knowledge discovery and constructing useful features for discriminative tasks. This paper addresses this task using a probabilistic framework that models generation of data as switching between a random walk state and states that generate motifs. A motif is generated from a continuous shape template that can undergo non-linear transformations such as temporal warping and additive noise. We propose an unsupervised algorithm that simultaneously discovers both the set of canonical shape templates and a template-specific model of variability manifested in the data. Experimental results on three real-world data sets demonstrate that our model is able to recover templates in data where repeated instances show large variability. The recovered templates provide higher classification accuracy and coverage when compared to those from alternatives such as random projection based methods and simpler generative models that do not model variability. Moreover, in analyzing physiological signals from infants in the ICU, we discover both known signatures as well as novel physiomarkers.
Domain Adaptation with Ensemble of Feature Groups
Samdani, Rajhans (University of Illinois at Urbana-Champaign) | Yih, Wen-tau (Microsoft Research)
We present a novel approach for domain adaptation based on feature grouping and re-weighting. Our algorithm operates by creating an ensemble of multiple classifiers, where each classifier is trained on one particular feature group. Faced with the distribution change involved in domain change, different feature groups exhibit different cross-domain prediction abilities. Herein, ensemble models provide us the flexibility of tuning the weights of corresponding classifiers in order to adapt to the new domain. Our approach is supported by a solid theoretical analysis based on the expressiveness of ensemble classifiers, which allows trading-off errors across source and target domains. Moreover, experimental results on sentiment classification and spam detection show that our approach not only outperforms the baseline method, but is also superior to other state-of-the-art methods.
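The ensemble-of-feature-groups idea can be pictured with a small sketch: each feature group contributes a score from its own classifier, and adapting to a new domain amounts to changing the weights on those scores. The group names, the normalization step, and the sign-based decision rule below are assumptions for illustration, not the paper's re-weighting procedure.

```python
def normalize(weights):
    """Scale non-negative group weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights.values())
    if total == 0:
        return {group: 1.0 / len(weights) for group in weights}
    return {group: w / total for group, w in weights.items()}


def ensemble_predict(group_scores, weights):
    """Combine per-group classifier scores into a single +1 / -1 decision.

    group_scores: dict mapping feature-group name to that classifier's
                  signed score for one example (hypothetical group names).
    weights:      dict mapping the same group names to ensemble weights.
    """
    combined = sum(weights[group] * score for group, score in group_scores.items())
    return 1 if combined >= 0 else -1
```

For example, a group of domain-independent features might keep a high weight in the target domain, while a group of source-specific features is down-weighted, flipping the decision on examples where the two groups disagree.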
Concept Labeling: Building Text Classifiers with Minimal Supervision
Chenthamarakshan, Vijil (IBM T J Watson Research Center Yorktown Heights) | Melville, Prem (IBM T J Watson Research Center Yorktown Heights) | Sindhwani, Vikas (IBM T J Watson Research Center Yorktown Heights) | Lawrence, Richard D (IBM T J Watson Research Center Yorktown Heights)
The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.