weasel
Appendix A Posterior Reparameterization
In this section we motivate the design choices and inductive biases that we encode into our neural encoder network e, which models the relative accuracies of the weak supervision sources $\lambda$. Recall that we model the probability of a particular sample $x \in \mathcal{X}$ having the class label $y \in \mathcal{Y} = \{1, \dots, C\}$ as

$$P_\theta(y \mid \lambda) = \mathrm{softmax}(s)_y \, P(y), \qquad s = \theta(\lambda, x)^T \lambda \in \mathbb{R}^C. \tag{4}$$

Connection to prior PGM models. We now motivate this choice by deriving a less expressive variant of it from the standard Markov Random Field (MRF) used in related work. Suppose we treat the attention scores $\theta(\lambda, x) \in \mathbb{R}^m$, which assign sample-dependent accuracies to each labeling function (LF), as sample-independent parameters $\theta_1$ and thereby drop the features from the equation, as is done in related work [30, 32, 19, 11]. Eq. 4 then takes the form

$$P_\theta(y \mid \lambda) \propto \exp\left(\theta_1^T \mathbb{1}\{\lambda = y\}\right) P(y).$$

We can recognize $P_\theta$ as a distribution from the exponential family, and more specifically as a pairwise MRF, or factor graph, with canonical parameters $\theta = (\theta_1, \theta_2)$ and corresponding sufficient statistics, or factors, $\phi(\lambda, y) = (\phi_1(\lambda, y), \phi_2(\lambda))$, as well as the log partition function $Z_\theta$. The accuracy factors and parameters $\phi_1, \theta_1$ are the core component of this model and sometimes take the form $\phi_1(\lambda, y) = \lambda y$ in binary models, as in [30, 19, 11]. The label-independent factors $\phi_2(\lambda)$ have no direct influence on the latent label posterior, since they do not depend on $y$ and cancel when normalizing over $y$. They are nonetheless often used to model labeling propensities $\mathbb{1}\{\lambda \neq 0\}$ and correlation dependencies $\mathbb{1}\{\lambda_i = \lambda_j\}$, which can be important for PGM parameter learning but are susceptible to misspecification [39, 11, 8].

Our parameterization is therefore a more expressive variant of these latent-variable PGM models, in which we are able to assign LF accuracies on a sample-by-sample basis. Furthermore, our neural encoder network outputs them as a function of the LF outputs and features, and is expected to learn the easy-to-misspecify dependencies and label-independent statistics implicitly.

The top 2 performance scores are highlighted as First, Second. Triplet-median [11] is not listed as it only converged for IMDB with 12 LFs (F1 = 73.0
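The posterior in Eq. 4 can be illustrated with a minimal sketch for a single sample. It assumes that LF outputs are integers in {0, 1, ..., C} with 0 meaning "abstain", that the class score is s_y = sum_i theta_i * 1{lambda_i = y}, and that the product softmax(s)_y P(y) is renormalized over y. The function name `label_posterior` and these conventions are illustrative assumptions, not the authors' implementation; in the paper the scores theta come from the encoder network, whereas here they are passed in directly.

```python
import math


def label_posterior(theta, lf_votes, class_prior):
    """Return a list with P(y | lambda) for y = 1..C for one sample.

    theta       : m accuracy scores, one per labeling function (LF)
    lf_votes    : m LF outputs in {0, 1, ..., C}; 0 = abstain (assumed convention)
    class_prior : C prior probabilities P(y), for classes 1..C
    """
    num_classes = len(class_prior)
    # s_y = sum_i theta_i * 1{lambda_i = y}: each LF adds its score to the class it votes for.
    scores = [
        sum(t for t, vote in zip(theta, lf_votes) if vote == y)
        for y in range(1, num_classes + 1)
    ]
    # softmax(s)_y * P(y), computed stably and renormalized (assumption of this sketch).
    max_score = max(scores)
    unnormalized = [math.exp(s - max_score) * p for s, p in zip(scores, class_prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two LFs vote for class 1, one votes for class 2, under a uniform prior.
posterior = label_posterior([1.0, 1.0, 0.5], [1, 1, 2], [0.5, 0.5])
```

With sample-independent `theta` this corresponds to the PGM-style special case discussed above; making `theta` a function of the LF outputs and features recovers the more expressive variant.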
End-to-End Weak Supervision, Carnegie Mellon University
Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications by replacing the tedious manual collection of ground truth labels. Current state-of-the-art approaches that do not use any labeled training data, however, require two separate modeling steps: learning a probabilistic latent variable model based on the WS sources, which relies on assumptions that rarely hold in practice, followed by downstream model training. Importantly, the first modeling step does not consider the performance of the downstream model. To address these caveats we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing prior probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as improved robustness to dependencies among weak supervision sources.
FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering
Ye, Qichen, Cao, Bowen, Chen, Nuo, Xu, Weiyuan, Zou, Yuexian
Knowledge-aware question answering (KAQA) requires the model to answer questions over a knowledge base, which is essential for both open-domain QA and domain-specific QA, especially when language models alone cannot provide all the knowledge needed. Despite the promising results of recent KAQA systems, which tend to integrate linguistic knowledge from pre-trained language models (PLM) and factual knowledge from knowledge graphs (KG) to answer complex questions, a bottleneck exists in effectively fusing the representations from PLMs and KGs because of (i) the semantic and distributional gaps between them, and (ii) the difficulties in joint reasoning over the provided knowledge from both modalities. To address these two problems, we propose a Fine-grained Two-stage training framework (FiTs) to boost KAQA system performance. The first stage, named knowledge adaptive post-training, aims at aligning representations from the PLM and the KG, thus bridging the modality gaps between them. The second stage, called knowledge-aware fine-tuning, aims to improve the model's joint reasoning ability based on the aligned representations. In detail, we fine-tune the post-trained model via two auxiliary self-supervised tasks in addition to the QA supervision. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains.
Autonomous Swarming AI Munitions for USAF
The US Air Force's AFLCMC Armament Directorate has awarded a one-year contract to Liteye Systems and Unmanned Experts to build Web Weasels (WW) autonomous swarming artificially intelligent munitions. WW is part of Unmanned Experts' parent program Air Commons - Swarm, which allows commanders to plan, task, and manage multiple swarming assets through a Swarm ATO and Swarm Engine. According to Liteye, squadrons of autonomous collaborative munitions operating at range, and at risk, need the training, Tactics, Techniques and Procedures (TTPs) to handle the speed-of-datalink environment that occurs in modern combat. Teamwork, communication, shared mental models, and a robust set of tried and tested strategies are needed to survive and dominate. WW aims to overlay Artificial Intelligence and Machine Learning (AI/ML)-trained algorithms onto Air Commons - Swarm's capabilities to provide pre-launch munitions with a series of TTPs in a 'Playbook' for a given mission set (e.g., SEAD).
End-to-End Weak Supervision
Cachay, Salva Rühling, Boecking, Benedikt, Dubrawski, Artur
Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications by replacing the tedious manual collection of ground truth labels. Current state-of-the-art approaches that do not use any labeled training data, however, require two separate modeling steps: learning a probabilistic latent variable model based on the WS sources, which relies on assumptions that rarely hold in practice, followed by downstream model training. Importantly, the first modeling step does not consider the performance of the downstream model. To address these caveats we propose an end-to-end approach for directly learning the downstream model by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work in terms of end model performance on downstream test sets, as well as improved robustness to dependencies among weak supervision sources.
Plotting time: On the usage of CNNs for time series classification
Rodrigues, Nuno M., Batista, João E., Trujillo, Leonardo, Duarte, Bernardo, Giacobini, Mario, Vanneschi, Leonardo, Silva, Sara
We present a novel approach for time series classification where we represent time series data as plot images and feed them to a simple CNN, outperforming several state-of-the-art methods. We propose a simple and highly replicable way of plotting the time series, and feed these images as input to a non-optimized shallow CNN, without any normalization or residual connections. These representations are no more than default line plots of the time series data, where the only pre-processing applied is to reduce the number of white pixels in the image. We compare our method with different state-of-the-art methods specialized in time series classification on two real-world non-public datasets, as well as 98 datasets of the UCR dataset collection. The results show that our approach is very promising, achieving the best results on both real-world datasets and matching or beating the best state-of-the-art methods on six UCR datasets. We argue that, if a simple naive design like ours can obtain such good results, it is worth further exploring the capabilities of image representations of time series data, along with more powerful CNNs, for classification and other related tasks.
Interpretable Time Series Classification using Linear Models and Multi-resolution Multi-domain Symbolic Representations
Nguyen, Thach Le, Gsponer, Severin, Ilie, Iulia, O'Reilly, Martin, Ifrim, Georgiana
The time series classification literature has expanded rapidly over the last decade, with many new classification approaches published each year. Prior research has mostly focused on improving the accuracy and efficiency of classifiers, with interpretability being somewhat neglected. This aspect of classifiers has become critical for many application domains, and the introduction of the EU GDPR legislation in 2018 is likely to further emphasize the importance of interpretable learning algorithms. Currently, state-of-the-art classification accuracy is achieved with very complex models based on large ensembles (COTE) or deep neural networks (FCN). These approaches are not efficient with regard to either time or space, are difficult to interpret, and cannot be applied to variable-length time series, requiring pre-processing of the original series to a fixed length. In this paper we propose new time series classification algorithms to address these gaps. Our approach is based on symbolic representations of time series, efficient sequence mining algorithms and linear classification models. Our linear models are as accurate as deep learning models but are more efficient regarding running time and memory, can work with variable-length time series and can be interpreted by highlighting the discriminative symbolic features on the original time series. We show that our multi-resolution multi-domain linear classifier (mtSS-SEQL+LR) achieves a similar accuracy to the state-of-the-art COTE ensemble, and to recent deep learning methods (FCN, ResNet), but uses a fraction of the time and memory required by either COTE or deep models. To further analyse the interpretability of our classifier, we present a case study on a human motion dataset collected by the authors. We release all the results, source code and data to encourage reproducibility.
A tale of two toolkits, report the second: bake off redux. Chapter 1. dictionary based classifiers
Bagnall, Anthony, Large, James, Middlehurst, Matthew
Time series classification (TSC) is the problem of learning labels from time dependent data. One class of algorithms is derived from a bag of words approach. A window is run along a series, the subseries is shortened and discretised to form a word, then features are formed from the histogram of frequency of occurrence of words. We call this type of approach to TSC dictionary based classification. We compare four dictionary based algorithms in the context of a wider project to update the great time series classification bakeoff, a comparative study published in 2017. We experimentally characterise the algorithms in terms of predictive performance, time complexity and space complexity. We find that we can improve on the previous best in terms of accuracy, but this comes at the cost of time and space. Alternatively, the same performance can be achieved with far less cost. We review the relative merits of the four algorithms before suggesting a path to possible improvement.
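The window, shorten, discretise, histogram pipeline described in this abstract can be sketched as follows. This is a simplified illustration only: it shortens each window with segment means and discretises against fixed breakpoints, which is an assumption of the sketch and not the transform used by any of the four algorithms compared in the report. The names `window_to_word` and `bag_of_words` are hypothetical.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def window_to_word(window, word_length, breakpoints):
    """Shorten a window to `word_length` segment means, then map each mean to a letter."""
    if not 0 < word_length <= len(window):
        raise ValueError("word_length must be between 1 and the window size")
    segment = len(window) / word_length
    letters = []
    for k in range(word_length):
        start, end = int(k * segment), int((k + 1) * segment)
        mean = sum(window[start:end]) / (end - start)
        # Letter index = number of breakpoints at or below the segment mean.
        index = sum(1 for b in breakpoints if mean >= b)
        letters.append(ALPHABET[index])
    return "".join(letters)


def bag_of_words(series, window_size, word_length, breakpoints):
    """Slide a window along the series and count how often each word occurs."""
    words = (
        window_to_word(series[i:i + window_size], word_length, breakpoints)
        for i in range(len(series) - window_size + 1)
    )
    return Counter(words)


# The resulting histogram is the feature vector passed to a classifier.
histogram = bag_of_words([0, 0, 1, 1, 0, 0], window_size=4, word_length=2, breakpoints=[0.5])
```

A single breakpoint yields a two-letter alphabet; more breakpoints give a finer discretisation at the cost of a sparser histogram.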
Scalable Dictionary Classifiers for Time Series Classification
Middlehurst, Matthew, Vickers, William, Bagnall, Anthony
Dictionary based classifiers are a family of algorithms for time series classification (TSC) that focus on capturing the frequency of pattern occurrences in a time series. The ensemble based Bag of Symbolic Fourier Approximation Symbols (BOSS) was found to be a top performing TSC algorithm in a recent evaluation, as well as the best performing dictionary based classifier. A recent addition to the category, Word Extraction for Time Series Classification (WEASEL), claims an improvement on this performance. Both of these algorithms, however, have non-trivial scalability issues, taking a considerable amount of build time and space on larger datasets. We evaluate changes to the way BOSS chooses classifiers for its ensemble, replacing its parameter search with random selection. This change allows for the easy implementation of contracting (setting a build time limit for the classifier) and check-pointing (saving progress during the classifier's build). To differentiate between the two BOSS ensemble methods we refer to our randomised version as RBOSS. Additionally, we test the application of common ensembling techniques to help retain the accuracy otherwise lost by dropping the BOSS parameter search. We achieve a significant reduction in build time without a significant change in accuracy on average when compared to BOSS by creating a size $n$ weighted ensemble selecting the best performers from $k$ randomly chosen parameter sets. Our experiments are conducted on datasets from the recently expanded UCR time series archive. We demonstrate the usability improvements to RBOSS with a case study using a large whale acoustics dataset for which BOSS proved infeasible.
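The selection step described above (draw $k$ random parameter sets, keep the $n$ best, weight the members) can be sketched generically. The function `random_ensemble_selection` and the `evaluate` callback are hypothetical, and weighting each member by its normalised score is an assumption of this sketch; the abstract does not specify the weighting scheme or how each candidate is scored.

```python
import random


def random_ensemble_selection(param_sets, evaluate, k, n, seed=0):
    """Pick k random parameter sets, keep the n best, and return (params, weight) pairs.

    param_sets : list of candidate parameter sets
    evaluate   : hypothetical callback returning an accuracy estimate for one parameter set
    """
    rng = random.Random(seed)
    candidates = rng.sample(param_sets, min(k, len(param_sets)))
    # Score every sampled candidate and sort from best to worst.
    scored = sorted(
        ((evaluate(p), p) for p in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best = scored[:n]
    total = sum(score for score, _ in best)
    if total == 0:
        # Fall back to uniform weights when every score is zero.
        return [(p, 1.0 / len(best)) for _, p in best]
    # Weight each member by its share of the total score (assumed scheme).
    return [(p, score / total) for score, p in best]


# Toy usage: integer "parameter sets" scored by a stand-in accuracy function.
ensemble = random_ensemble_selection(list(range(10)), lambda p: p / 10, k=10, n=3)
```

Because candidates are drawn independently of one another, the loop over them can be stopped at any point, which is what makes a build time limit and check-pointing straightforward to add.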