Goto

Collaborating Authors

 Inductive Learning


Multi-Modal Pre-Training for Automated Speech Recognition

arXiv.org Artificial Intelligence

Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance. Unfortunately, approaches relying on such hyper-local information tend to be vulnerable to both local-level corruption (such as audio-frame drops, or loud noises) and global-level noise (such as environmental noise, or background noise) that has not been seen during training. In this work, we introduce a novel approach which leverages a self-supervised learning technique based on masked language modeling to compute a global, multi-modal encoding of the environment in which the utterance occurs. We then use a new deep-fusion framework to integrate this global context into a traditional ASR method, and demonstrate that the resulting method can outperform baseline methods by up to 7% on Librispeech; gains on internal datasets range from 6% (on larger models) to 45% (on smaller models).


Rethinking Data Augmentation for Robust Visual Question Answering

arXiv.org Artificial Intelligence

Data Augmentation (DA) -- generating extra training samples beyond original training set -- has been widely-used in today's unbiased VQA models to mitigate the language biases. Current mainstream DA strategies are synthetic-based methods, which synthesize new samples by either editing some visual regions/words, or re-generating them from scratch. However, these synthetic samples are always unnatural and error-prone. To avoid this issue, a recent DA work composes new augmented samples by randomly pairing pristine images and other human-written questions. Unfortunately, to guarantee augmented samples have reasonable ground-truth answers, they manually design a set of heuristic rules for several question types, which extremely limits its generalization abilities. To this end, we propose a new Knowledge Distillation based Data Augmentation for VQA, dubbed KDDAug. Specifically, we first relax the requirements of reasonable image-question pairs, which can be easily applied to any question types. Then, we design a knowledge distillation (KD) based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust to both in-domain and out-of-distribution settings. Since KDDAug is a model-agnostic DA strategy, it can be seamlessly incorporated into any VQA architectures. Extensive ablation studies on multiple backbones and benchmarks have demonstrated the effectiveness and generalization abilities of KDDAug.


Landmark-free Statistical Shape Modeling via Neural Flow Deformations

arXiv.org Artificial Intelligence

Statistical shape modeling aims at capturing shape variations of an anatomical structure that occur within a given population. Shape models are employed in many tasks, such as shape reconstruction and image segmentation, but also shape generation and classification. Existing shape priors either require dense correspondence between training examples or lack robustness and topological guarantees. We present FlowSSM, a novel shape modeling approach that learns shape variability without requiring dense correspondence between training instances. It relies on a hierarchy of continuous deformation flows, which are parametrized by a neural network. Our model outperforms state-of-the-art methods in providing an expressive and robust shape prior for distal femur and liver. We show that the emerging latent representation is discriminative by separating healthy from pathological shapes. Ultimately, we demonstrate its effectiveness on two shape reconstruction tasks from partial data. Our source code is publicly available (https://github.com/davecasp/flowssm).


Inductive Knowledge Graph Reasoning for Multi-batch Emerging Entities

arXiv.org Artificial Intelligence

Over the years, reasoning over knowledge graphs (KGs), which aims to infer new conclusions from known facts, has mostly focused on static KGs. The unceasing growth of knowledge in real life raises the necessity to enable the inductive reasoning ability on expanding KGs. Existing inductive work assumes that new entities all emerge once in a batch, which oversimplifies the real scenario that new entities continually appear. This study dives into a more realistic and challenging setting where new entities emerge in multiple batches. We propose a walk-based inductive reasoning model to tackle the new setting. Specifically, a graph convolutional network with adaptive relation aggregation is designed to encode and update entities using their neighboring relations. To capture the varying neighbor importance, we employ a query-aware feedback attention mechanism during the aggregation. Furthermore, to alleviate the sparse link problem of new entities, we propose a link augmentation strategy to add trustworthy facts into KGs. We construct three new datasets for simulating this multi-batch emergence scenario. The experimental results show that our proposed model outperforms state-of-the-art embedding-based, walk-based and rule-based models on inductive KG reasoning.


Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction

arXiv.org Artificial Intelligence

Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels mentioned either explicitly or merely described implicitly and the lack of finely annotated training corpora. Previous work on skill extraction overly simplifies the task to an explicit entity detection task or builds on manually annotated training data that would be infeasible if applied to a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and combining three different strategies in one model further increases the performance, up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes to stimulate further research on the task.


MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction

arXiv.org Artificial Intelligence

Relation extraction in the biomedical domain is challenging due to the lack of labeled data and high annotation costs, needing domain experts. Distant supervision is commonly used to tackle the scarcity of annotated data by automatically pairing knowledge graph relationships with raw texts. Such a pipeline is prone to noise and has added challenges to scale for covering a large number of biomedical concepts. We investigated existing broad-coverage distantly supervised biomedical relation extraction benchmarks and found a significant overlap between training and test relationships ranging from 26% to 86%. Furthermore, we noticed several inconsistencies in the data construction process of these benchmarks, and where there is no train-test leakage, the focus is on interactions between narrower entity types. This work presents a more accurate benchmark MedDistant19 for broad-coverage distantly supervised biomedical relation extraction that addresses these shortcomings and is obtained by aligning the MEDLINE abstracts with the widely used SNOMED Clinical Terms knowledge base. Lacking thorough evaluation with domain-specific language models, we also conduct experiments validating general domain relation extraction findings to biomedical relation extraction.


Identifying epidemic related Tweets using noisy learning

arXiv.org Artificial Intelligence

Supervised learning algorithms are heavily reliant on annotated datasets to train machine learning models. However, the curation of the annotated datasets is laborious and time consuming due to the manual effort involved and has become a huge bottleneck in supervised learning. In this work, we apply the theory of noisy learning to generate weak supervision signals instead of manual annotation. We curate a noisy labeled dataset using a labeling heuristic to identify epidemic related tweets. We evaluated the performance using a large epidemic corpus and our results demonstrate that models trained with noisy data in a class imbalanced and multi-classification weak supervision setting achieved performance greater than 90%.


Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath

arXiv.org Artificial Intelligence

graph neural networks (GNNs) are the dominant paradigm for modeling and handling graph structure data by learning universal node representation. The traditional way of training GNNs depends on a great many labeled data, which results in high requirements on cost and time. In some special scene, it is even unavailable and impracticable. Self-supervised representation learning, which can generate labels by graph structure data itself, is a potential approach to tackle this problem. And turning to research on self-supervised learning problem for heterogeneous graphs is more challenging than dealing with homogeneous graphs, also there are fewer studies about it. In this paper, we propose a SElfsupervised learning method for heterogeneous graph via Structure Information based on Metapath (SESIM). The proposed model can construct pretext tasks by predicting jump number between nodes in each metapath to improve the representation ability of primary task. In order to predict jump number, SESIM uses data itself to generate labels, avoiding time-consuming manual labeling. Moreover, predicting jump number in each metapath can effectively utilize graph structure information, which is the essential property between nodes. Therefore, SESIM deepens the understanding of models for graph structure. At last, we train primary task and pretext tasks jointly, and use meta-learning to balance the contribution of pretext tasks for primary task. Empirical results validate the performance of SESIM method and demonstrate that this method can improve the representation ability of traditional neural networks on link prediction task and node classification task.


Patient-specific modelling, simulation and real-time processing for respiratory diseases

arXiv.org Artificial Intelligence

Asthma is a common chronic disease of the respiratory system causing significant disability and societal burden. It affects more than 300 million people worldwide, while more than 100 million people will likely have asthma by 2025. The price of asthma varies greatly from nation to nation. Mean yearly cost can be estimated to 1900 EUR in Europe and $3100 in the United States. Managing asthma involves controlling symptoms, preventing exacerbations, and maintaining lung function. Improved asthma control is reduces the risk of exacerbations and lung function impairment while reducing the direct costs of asthma care and indirect costs associated with reduced productivity. Understanding the complex dynamics of the pulmonary system and the lung's response to disease is fundamental to the advancement of Asthma treatment. Computational models of the respiratory system seek to provide a theoretical framework to understand the interaction between structure and function. Their application can improve pulmonary medicine by a patient-specific approach to medicinal methodologies optimizing the delivery given the personalized geometry and personalized ventilation patterns. A three-fold objective is addressed within this dissertation. The first part refers to the comprehension of pulmonary pathophysiology and the mechanics of Asthma and subsequently of constrictive pulmonary conditions in general. The second part refers to the design and implementation of tools that facilitate personalized medicine to improve delivery and effectiveness. Finally, the third part refers to the self-management of the condition, meaning that medical personnel and patients have access to tools and methods that allow the first party to easily track the course of the condition and the second party, i.e. the patient to easily self-manage it alleviating the significant burden from the health system.


SSL-WM: A Black-Box Watermarking Approach for Encoders Pre-trained by Self-supervised Learning

arXiv.org Artificial Intelligence

Recent years have witnessed significant success in Self-Supervised Learning (SSL), which facilitates various downstream tasks. However, attackers may steal such SSL models and commercialize them for profit, making it crucial to protect their Intellectual Property (IP). Most existing IP protection solutions are designed for supervised learning models and cannot be used directly since they require that the models' downstream tasks and target labels be known and available during watermark embedding, which is not always possible in the domain of SSL. To address such a problem especially when downstream tasks are diverse and unknown during watermark embedding, we propose a novel black-box watermarking solution, named SSL-WM, for protecting the ownership of SSL models. SSL-WM maps watermarked inputs by the watermarked encoders into an invariant representation space, which causes any downstream classifiers to produce expected behavior, thus allowing the detection of embedded watermarks. We evaluate SSL-WM on numerous tasks, such as Computer Vision (CV) and Natural Language Processing (NLP), using different SSL models, including contrastive-based and generative-based. Experimental results demonstrate that SSL-WM can effectively verify the ownership of stolen SSL models in various downstream tasks. Furthermore, SSL-WM is robust against model fine-tuning and pruning attacks. Lastly, SSL-WM can also evade detection from evaluated watermark detection approaches, demonstrating its promising application in protecting the IP of SSL models.