Inductive Learning
Valid Explanations for Learning to Rank Models
Singh, Jaspreet, Wang, Zhenye, Khosla, Megha, Anand, Avishek
Learning-to-rank (LTR) is a class of supervised learning techniques that apply to ranking problems dealing with a large number of features. The popularity and widespread application of LTR models in prioritizing information in a variety of domains makes their scrutability vital in today's landscape of fair and transparent learning systems. However, limited work exists that deals with interpreting the decisions of learning systems that output rankings. In this paper we propose a model agnostic local explanation method that seeks to identify a small subset of input features as explanation to a ranking decision. We introduce new notions of validity and completeness of explanations specifically for rankings, based on the presence or absence of selected features, as a way of measuring goodness. We devise a novel optimization problem to maximize validity directly and propose greedy algorithms as solutions. In extensive quantitative experiments we show that our approach outperforms other model agnostic explanation approaches across pointwise, pairwise and listwise LTR models in validity while not compromising on completeness.
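The greedy selection idea can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: validity is approximated here by Kendall's tau between the original ranking and the ranking produced when only the selected features are kept (all others set to zero), and the names `kendall_tau`, `mask`, and `greedy_explanation` are hypothetical.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau between two score lists over the same documents.

    Tied pairs contribute 0, so this is the simple tau-a variant.
    """
    n = len(a)
    total = 0
    for i, j in combinations(range(n), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        total += (prod > 0) - (prod < 0)
    pairs = n * (n - 1) // 2
    return total / pairs if pairs else 1.0


def mask(docs, keep):
    """Zero out every feature whose index is not in `keep` (assumed masking rule)."""
    return [[v if idx in keep else 0.0 for idx, v in enumerate(d)] for d in docs]


def greedy_explanation(score_fn, docs, k):
    """Greedily pick up to k features that best preserve the original ranking."""
    base = [score_fn(d) for d in docs]
    n_features = len(docs[0])
    selected = set()
    best_val = float("-inf")
    for _ in range(min(k, n_features)):
        best_f, best_val = None, float("-inf")
        for f in range(n_features):
            if f in selected:
                continue
            scores = [score_fn(d) for d in mask(docs, selected | {f})]
            val = kendall_tau(base, scores)
            if val > best_val:
                best_f, best_val = f, val
        selected.add(best_f)
    return sorted(selected), best_val


# Toy model-agnostic usage: the ranker is only accessed through score_fn.
def toy_ranker(doc):
    return 3.0 * doc[0] + 0.1 * doc[1] + 0.0 * doc[2]


toy_docs = [[1, 5, 9], [2, 1, 3], [3, 2, 7], [4, 9, 1]]
explanation, validity = greedy_explanation(toy_ranker, toy_docs, k=1)
```

The ranker is treated as a black box, which is what makes the procedure model agnostic; the masking rule and the validity proxy are the parts most likely to differ from the paper's definitions.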
Mining Environment Assumptions for Cyber-Physical System Models
Mohammadinejad, Sara, Deshmukh, Jyotirmoy V., Puranic, Aniruddh G.
Many complex cyber-physical systems can be modeled as heterogeneous components interacting with each other in real-time. We assume that the correctness of each component can be specified as a requirement satisfied by the output signals produced by the component, and that such an output guarantee is expressed in a real-time temporal logic such as Signal Temporal Logic (STL). In this paper, we hypothesize that a large subset of input signals for which the corresponding output signals satisfy the output requirement can also be compactly described using an STL formula that we call the environment assumption. We propose an algorithm to mine such an environment assumption using a supervised learning technique. Essentially, our algorithm treats the environment assumption as a classifier that labels input signals as good if the corresponding output signal satisfies the output requirement, and as bad otherwise. Our learning method simultaneously learns the structure of the STL formula as well as the values of the numeric constants appearing in the formula. To achieve this, we combine a procedure to systematically enumerate candidate Parametric STL (PSTL) formulas, with a decision-tree based approach to learn parameter values. We demonstrate experimental results on real world data from several domains including transportation and health care.
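The classifier view of an environment assumption can be sketched for a single fixed template. This is an illustration under stated assumptions, not the paper's algorithm: the template is "always (x < c)", a depth-one decision stump stands in for the decision-tree step, and `fit_always_below` is a hypothetical name. A signal satisfies "always (x < c)" exactly when its maximum is below c, so the maximum serves as the only feature.

```python
def fit_always_below(signals, labels):
    """Fit c in the template G(x < c).

    signals: list of sampled input signals (lists of floats).
    labels:  True if the corresponding output satisfied the requirement.
    Returns the threshold c with the fewest misclassifications and that error count.
    """
    peaks = sorted(set(max(s) for s in signals))
    # Candidate thresholds: one below all peaks, midpoints, one above all peaks.
    candidates = (
        [peaks[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(peaks, peaks[1:])]
        + [peaks[-1] + 1.0]
    )
    best_c, best_err = None, None
    for c in candidates:
        err = sum((max(s) < c) != good for s, good in zip(signals, labels))
        if best_err is None or err < best_err:
            best_c, best_err = c, err
    return best_c, best_err


# Made-up data: the two low-amplitude inputs are labelled good.
toy_signals = [[0.1, 0.4, 0.3], [0.2, 0.5, 0.1], [0.9, 1.2, 0.8], [0.3, 1.5, 0.2]]
toy_labels = [True, True, False, False]
c, errors = fit_always_below(toy_signals, toy_labels)
```

In the full method described above, many candidate templates would be enumerated and each scored this way; the sketch covers only the parameter-fitting half for one template.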
Machine Learning for Exploring Spatial Affordance Patterns
This dissertation uses supervised and unsupervised data mining techniques to analyse office floor plans in an attempt to gain a better understanding of their geometry-to-function relationship. This question was deemed relevant after a background review of the state-of-the-art in automated floor-plan generation tools showed that such tools have been prototyped since the 1960s, but their search space is ill-informed because there are few formalisms to describe spatial affordance. To show and evaluate the relationship of geometry and use, data from visual graph analysis (VGA) were used to train three supervised learners and compare these to a baseline accuracy established with a ZeroR classifier. This showed that for the office dataset examined, visual mean depth and integration are most tightly linked to usage and that the supervised learning algorithm J48 can correctly predict the class of unseen examples with up to 79.5% accuracy. The thesis also includes an evaluation of the layout case studies with unsupervised learners, which showed that use could not be immediately reverse-engineered based solely on the VGA information to achieve a strong cluster-to-class evaluation.
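For readers unfamiliar with the baseline, ZeroR ignores all input features and always predicts the most frequent training class. A minimal sketch follows; the usage labels are invented for illustration and are not from the dissertation's dataset.

```python
from collections import Counter


def zero_r(train_labels):
    """Return a predictor that always outputs the majority training class."""
    majority, _count = Counter(train_labels).most_common(1)[0]
    return lambda _example: majority


def accuracy(predict, examples, labels):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(x) == y for x, y in zip(examples, labels))
    return hits / len(labels)


# Hypothetical room-usage labels.
train_y = ["office", "office", "meeting", "office", "corridor"]
test_x = [None, None, None, None]  # features are irrelevant to ZeroR
test_y = ["office", "meeting", "office", "office"]

baseline = zero_r(train_y)
baseline_acc = accuracy(baseline, test_x, test_y)
```

Any learner that uses the geometry features is only informative to the extent that it beats this number.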
Towards Real-Time and Unsupervised Campaign Detection in Social Media
Assenmacher, Dennis (University of Münster) | Adam, Lena (University of Münster) | Trautmann, Heike (University of Münster) | Grimme, Christian (University of Münster)
The detection of orchestrated and potentially manipulative campaigns in social media is far more meaningful than analyzing single-account behaviour but also more challenging in terms of pattern recognition, data processing, and computational complexity. While supervised learning methods need an enormous amount of reliable ground truth data to find rather inflexible patterns, classical unsupervised learning techniques need a lot of computational power to handle large amounts of data. This makes them infeasible for real-time analysis. In this work, we demonstrate the applicability of text stream clustering for the real-time detection of coordinated campaigns.
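A generic single-pass sketch shows why stream clustering suits real-time use: each post is processed once and then discarded. This is not the authors' algorithm; the bag-of-words representation, the cosine threshold of 0.5, and the class name `StreamClusterer` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class StreamClusterer:
    """Assign each incoming text to the nearest cluster or open a new one."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.clusters = []  # one Counter of summed term counts per cluster
        self.sizes = []     # number of texts absorbed by each cluster

    def add(self, text):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, centre in enumerate(self.clusters):
            sim = cosine(vec, centre)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.clusters[best].update(vec)
            self.sizes[best] += 1
            return best
        self.clusters.append(vec)
        self.sizes.append(1)
        return len(self.clusters) - 1


# Made-up posts: three near-duplicates and one unrelated message.
posts = [
    "vote for candidate x now",
    "vote for candidate x today",
    "cute cat video",
    "vote now for candidate x",
]
clusterer = StreamClusterer(threshold=0.5)
assignments = [clusterer.add(p) for p in posts]
```

A cluster that grows unusually fast from many distinct accounts would be a candidate campaign; practical stream clusterers typically also fade or remove stale clusters, which is omitted here.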
Pre-Training A Neural Language Model Improves the Sample Efficiency of an Emergency Room Classification Model
Xu, Binbin (University of Bordeaux) | Gil-Jardiné, Cédric (University Hospital of Bordeaux) | Thiessard, Frantz (Université de Bordeaux) | Tellier, Eric (University Hospital of Bordeaux) | Avalos-Fernandez, Marta (Université de Bordeaux) | Lagarde, Emmanuel (Université de Bordeaux)
To build a French national electronic injury surveillance system based on emergency room visits, we aim to develop a coding system to classify their causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require a large expert-annotated dataset, which is time consuming and costly to obtain. We hypothesize that a Natural Language Processing Transformer model incorporating a generative self-supervised pre-training step can significantly reduce the required number of annotated samples for supervised fine-tuning. In this preliminary study, we test our hypothesis in the simplified problem of predicting whether a visit is the consequence of a traumatic event or not from free-text clinical notes. Using fully re-trained GPT-2 models (without OpenAI pre-trained weights), we assess the gain of applying a self-supervised pre-training phase with unlabeled notes prior to the supervised learning task. Results show that the amount of data required to achieve a given level of performance (AUC>0.95) was reduced by a factor of 10 when applying pre-training. Namely, for 16 times more data, the fully-supervised model achieved an improvement <1% in AUC. To conclude, it is possible to adapt a multi-purpose neural language model such as the GPT-2 to create a powerful tool for classification of free-text notes with only a small number of labeled samples.
Information-Theoretic Generalization Bounds for Meta-Learning and Applications
Jose, Sharu Theresa, Simeone, Osvaldo
Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like MAML, or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed, under given technical conditions, for the two classes via novel Individual Task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.
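For orientation, the well-known MI bound for conventional (single-task) learning, of the kind the abstract says is being extended, can be written as below. The notation is an assumption of this note rather than the paper's: S is a training set of n i.i.d. samples, W is the output of the learning algorithm, L_S(W) is the training loss, L_mu(W) is the population loss, and the loss is assumed sigma-sub-Gaussian. The paper's meta-learning bounds themselves are not reproduced here.

```latex
\[
\left| \mathbb{E}\!\left[ L_{\mu}(W) - L_{S}(W) \right] \right|
\;\le\;
\sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}
\]
```

The bound shrinks as the algorithm's output reveals less about its training data (smaller I(W; S)) or as n grows; the meta-learning results described above play the analogous role with the meta-training data in place of S.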
[Links of the Day] 12/05/2020 : Learning From Unlabeled Data, Fast Dataset Classifier, Azure Bad Rollout guardian
Thang presents a novel method for learning from unlabeled data, and more specifically semi-supervised learning methods. These methods were used to generate the Google Meena chatbot model. Like Snorkel, this is used to quickly build classifiers for datasets that would otherwise be extremely time-consuming (and expensive) to label by hand for training purposes. Gandalf: Azure machine learning system trained to catch bad rollout deployments. The aim of this system is to catch bad deployments before they can have ripple effects across the whole system.
Multi-Level Generative Models for Partial Label Learning with Non-random Label Noise
Partial label (PL) learning tackles the problem where each training instance is associated with a set of candidate labels that include both the true label and irrelevant noise labels. In this paper, we propose a novel multi-level generative model for partial label learning (MGPLL), which tackles the problem by learning both a label level adversarial generator and a feature level adversarial generator under a bi-directional mapping framework between the label vectors and the data samples. Specifically, MGPLL uses a conditional noise label generation network to model the non-random noise labels and perform label denoising, and uses a multi-class predictor to map the training instances to the denoised label vectors, while a conditional data feature generator is used to form an inverse mapping from the denoised label vectors to data samples. Both the noise label generator and the data feature generator are learned in an adversarial manner to match the observed candidate labels and data features respectively. Extensive experiments are conducted on synthesized and real-world partial label datasets. The proposed approach demonstrates state-of-the-art performance for partial label learning.
Multi-Instance Multi-Label Learning for Gene Mutation Prediction in Hepatocellular Carcinoma
Xu, Kaixin, Zhao, Ziyuan, Gu, Jiapan, Zeng, Zeng, Ying, Chan Wan, Choon, Lim Kheng, Hua, Thng Choon, Chow, Pierce KH
Gene mutation prediction in hepatocellular carcinoma (HCC) is of great diagnostic and prognostic value for personalized treatments and precision medicine. In this paper, we tackle this problem with multi-instance multi-label learning to address the difficulties with label correlations, label representations, etc. Furthermore, an effective oversampling strategy is applied for data imbalance. Experimental results have shown the superiority of the proposed approach.
Towards Knowledgeable Supervised Lifelong Learning Systems
Benavides-Prado, Diana (The University of Auckland) | Koh, Yun Sing | Riddle, Patricia
Learning a sequence of tasks is a long-standing challenge in machine learning. This setting applies to learning systems that observe examples of a range of tasks at different points in time. A learning system should become more knowledgeable as more related tasks are learned. Although the problem of learning sequentially was acknowledged for the first time decades ago, the research in this area has been rather limited. Research in transfer learning, multitask learning, metalearning and deep learning has studied some challenges of these kinds of systems. Recent research in lifelong machine learning and continual learning has revived interest in this problem. We propose Proficiente, a full framework for long-term learning systems. Proficiente relies on knowledge transferred between hypotheses learned with Support Vector Machines. The first component of the framework is focused on transferring forward selectively from a set of existing hypotheses or functions representing knowledge acquired during previous tasks to a new target task. A second component of Proficiente is focused on transferring backward, a novel ability of long-term learning systems that aim to exploit knowledge derived from recent tasks to encourage refinement of existing knowledge. We propose a method that transfers selectively from a task learned recently to existing hypotheses representing previous tasks. The method encourages retention of existing knowledge whilst refining. We analyse the theoretical properties of the proposed framework. Proficiente is accompanied by an agnostic metric that can be used to determine if a long-term learning system is becoming more knowledgeable. We evaluate Proficiente on both synthetic and real-world datasets, and demonstrate scenarios where knowledgeable supervised learning systems can be achieved by means of transfer.