Performance Analysis
Detecting and Tracking Concept Class Drift and Emergence in Non-Stationary Fast Data Streams
Parker, Brandon Shane (University of Texas at Dallas) | Khan, Latifur (University of Texas at Dallas)
As the proliferation of constant data feeds increases from social media, embedded sensors, and other sources, the capability to provide predictive concept labels to these data streams will become ever more important and lucrative. However, the dynamic, non-stationary nature, and effectively infinite length of data streams pose additional challenges for stream data mining algorithms. The sparse quantity of training data also limits the use of algorithms that are heavily dependent on supervised training. To address all these issues, we propose an incremental semi-supervised method that provides general concept class label predictions, but it also tracks concept clusters within the feature space using an innovative new online clustering algorithm. Each concept cluster contains an embedded stream classifier, creating a diverse ensemble for data instance classification within the generative model used for detecting emerging concepts in the stream. Unlike other recent novel class detection methods, our method goes beyond detecting, and continues to differentiate and track the emerging concepts. We show the effectiveness of our method on several synthetic and real world data sets, and we compare the results against other leading baseline methods.
Learning Relational Sum-Product Networks
Nath, Aniruddh (University of Washington) | Domingos, Pedro M. (University of Washington)
Sum-product networks (SPNs) are a recently-proposed deep architecture that guarantees tractable inference, even on certain high-treewidth models. SPNs are a propositional architecture, treating the instances as independent and identically distributed. In this paper, we introduce Relational Sum-Product Networks (RSPNs), a new tractable first-order probabilistic architecture. RSPNs generalize SPNs by modeling a set of instances jointly, allowing them to influence each other's probability distributions, as well as modeling probabilities of relations between objects. We also present LearnRSPN, the first algorithm for learning high-treewidth tractable statistical relational models. LearnRSPN is a recursive top-down structure learning algorithm for RSPNs, based on Gens and Domingos' LearnSPN algorithm for propositional SPN learning. We evaluate the algorithm on three datasets; the RSPN learning algorithm outperforms Markov Logic Networks in both running time and predictive accuracy.
Kernelized Online Imbalanced Learning with Fixed Budgets
Hu, Junjie (The Chinese University of Hong Kong) | Yang, Haiqin (The Chinese University of Hong Kong) | King, Irwin (The Chinese University of Hong Kong) | Lyu, Michael R. (The Chinese University of Hong Kong) | So, Anthony Man-Cho (The Chinese University of Hong Kong)
Online learning from imbalanced streaming data to capture the nonlinearity and heterogeneity of the data is significant in machine learning and data mining. To tackle this problem, we propose a kernelized online imbalanced learning (KOIL) algorithm to directly maximize the area under the ROC curve (AUC). We address two more challenges: 1) How to control the number of support vectors without sacrificing model performance; and 2) how to restrict the fluctuation of the learned decision function to attain smooth updating. To this end, we introduce two buffers with fixed budgets (buffer sizes) for positive class and negative class, respectively, to store the learned support vectors, which can allow us to capture the global information of the decision boundary. When determining the weight of a new support vector, we confine its influence only to its $k$-nearest opposite support vectors. This can restrict the effect of new instances and prevent the harm of outliers. More importantly, we design a sophisticated scheme to compensate the model after replacement is conducted when either buffer is full. With this compensation, the learned model approaches the one learned with infinite budgets. We present both theoretical analysis and extensive experimental comparison to demonstrate the effectiveness of our proposed KOIL.
Collaborative Filtering with Localised Ranking
Dhanjal, Charanpal (Télécom ParisTech) | Gaudel, Romaric (University of Lille) | Clémençon, Stéphan (Télécom ParisTech)
In recommendation systems, one is interested in the ranking of the predicted items as opposed to other losses such as the mean squared error. Although a variety of ways to evaluate rankings exist in the literature, here we focus on the Area Under the ROC Curve (AUC) as it widely used and has a strong theoretical underpinning. In practical recommendation, only items at the top of the ranked list are presented to the users. With this in mind we propose a class of objective functions which primarily represent a smooth surrogate for the real AUC, and in a special case we show how to prioritise the top of the list. This loss is differentiable and is optimised through a carefully designed stochastic gradient-descent-based algorithm which scales linearly with the size of the data. We mitigate sample bias present in the data by sampling observations according to a certain power-law based distribution. In addition, we provide computation results as to the efficacy of the proposed method using synthetic and real data.
Extracting Adverse Drug Reactions from Social Media
Yates, Andrew (Georgetown University) | Goharian, Nazli (Georgetown University) | Frieder, Ophir (Georgetown University)
The potential benefits of mining social media to learn about adverse drug reactions (ADRs) are rapidly increasing with the increasing popularity of social media. Unknown ADRs have traditionally been discovered by expensive post-marketing trials, but recent work has suggested that some unknown ADRs may be discovered by analyzing social media. We propose three methods for extracting ADRs from forum posts and tweets, and compare our methods with several existing methods. Our methods outperform the existing methods in several scenarios; our filtering method achieves the highest F1 and precision on forum posts, and our CRF method achieves the highest precision on tweets. Furthermore, we address the difficulty of annotating social media on a large scale with an alternate evaluation scheme that takes advantage of the ADRs listed on drug labels. We investigate how well this alternate evaluation approximates a traditional evaluation using human annotations.
Dataless Text Classification with Descriptive LDA
Chen, Xingyuan (Leshan Normal University) | Xia, Yunqing (Tsinghua University) | Jin, Peng (Leshan Normal University) | Carroll, John (University of Sussex)
Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.
Scalable and Interpretable Data Representation for High-Dimensional, Complex Data
Kim, Been (Massachusetts Institute of Technology) | Patel, Kayur (Google) | Rostamizadeh, Afshin (Google) | Shah, Julie (Massachusetts Institute of Technology)
The majority of machine learning research has been focused on building models and inference techniques with sound mathematical properties and cutting edge performance. Little attention has been devoted to the development of data representation that can be used to improve a user's ability to interpret the data and machine learning models to solve real-world problems. In this paper, we quantitatively and qualitatively evaluate an efficient, accurate and scalable feature-compression method using latent Dirichlet allocation for discrete data. This representation can effectively communicate the characteristics of high-dimensional, complex data points. We show that the improvement of a user's interpretability through the use of a topic modeling-based compression technique is statistically significant, according to a number of metrics, when compared with other representations. Also, we find that this representation is scalable --- it maintains alignment with human classification accuracy as an increasing number of data points are shown. In addition, the learned topic layer can semantically deliver meaningful information to users that could potentially aid human reasoning about data characteristics in connection with compressed topic space.
How Many Diagnoses Do We Need?
Stern, Roni Tzvi (Ben Gurion University of the Negev) | Kalech, Meir (Ben Gurion University of the Negev) | Rogov, Shelly (Ben Gurion University of the Negev) | Feldman, Alexander (PARC Inc.)
A known limitation of many diagnosis algorithms is that the number of diagnoses they return can be very large. This raises the question of how to use such a large set of diagnoses. For example, presenting hundreds of diagnoses to a human operator (charged with repairing the system) is meaningless. In various settings, including decision support for a human operator and automated troubleshooting processes, it is sufficient to be able to answer a basic diagnostic question: is a given component faulty? We propose a way to aggregate an arbitrarily large set of diagnoses to return an estimate of the likelihood of a given component to be faulty. The resulting mapping of components to their likelihood of being faulty is called the system's health state. We propose two metrics for evaluating the accuracy of a health state and show that an accurate health state can be found without finding all diagnoses. An empirical study explores the question of how many diagnoses are needed to obtain an accurate enough health state, and a simple online stopping criterion is proposed.
Solving and Explaining Analogy Questions Using Semantic Networks
Boteanu, Adrian (Worcester Polytechnic Institute) | Chernova, Sonia (Worcester Polytechnic Institute)
Analogies are a fundamental human reasoning pattern that relies on relational similarity. Understanding how analogies are formed facilitates the transfer of knowledge between contexts. The approach presented in this work focuses on obtaining precise interpretations of analogies. We leverage noisy semantic networks to answer and explain a wide spectrum of analogy questions. The core of our contribution, the Semantic Similarity Engine, consists of methods for extracting and comparing graph-contexts that reveal the relational parallelism that analogies are based on, while mitigating uncertainty in the semantic network. We demonstrate these methods in two tasks: answering multiple choice analogy questions and generating human readable analogy explanations. We evaluate our approach on two datasets totaling 600 analogy questions. Our results show reliable performance and low false-positive rate in question answering; human evaluators agreed with 96% of our analogy explanations.
Toward Mobile Robots Reasoning Like Humans
Oh, Jean H (Carnegie Mellon University) | Suppé, Arne (Carnegie Mellon University) | Duvallet, Felix (Carnegie Mellon University) | Boularias, Abdeslam (Carnegie Mellon University) | Navarro-Serment, Luis (Carnegie Mellon University) | Hebert, Martial (Carnegie Mellon University) | Stentz, Anthony (Carnegie Mellon University) | Vinokurov, Jerry (Carnegie Mellon University) | Romero, Oscar (Carnegie Mellon University) | Lebiere, Christian (Carnegie Mellon University) | Dean, Robert (General Dynamics Robotic Systems)
Robots are increasingly becoming key players in human-robot teams. To become effective teammates, robots must possess profound understanding of an environment, be able to reason about the desired commands and goals within a specific context, and be able to communicate with human teammates in a clear and natural way. To address these challenges, we have developed an intelligence architecture that combines cognitive components to carry out high-level cognitive tasks, semantic perception to label regions in the world, and a natural language component to reason about the command and its relationship to the objects in the world. This paper describes recent developments using this architecture on a fielded mobile robot platform operating in unknown urban environments. We report a summary of extensive outdoor experiments; the results suggest that a multidisciplinary approach to robotics has the potential to create competent human-robot teams.