Goto

Collaborating Authors

 Transfer Learning


WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments

arXiv.org Artificial Intelligence

At the same time, social media sites have 2020). It is well-known that training large neural increasingly become more prone to offensive content transformer models often result in long processing (Hada et al., 2021; Zhu and Bhat, 2021; Bucur times. As GermEval-2021 features three related et al., 2021). As such, identifying the toxic language tasks, from a performance standpoint, we pose that in social media is a topic that has gained, training a model jointly on three tasks is likely to be and continues to gain traction. Research surrounding computationally more efficient than training three the problem of offensive content has centered models in isolation. Moreover, as GermEval-2021 around the application of computational models provides a single dataset for the three tasks, MTL that can identify various forms of negative content can also be used to help improving performance such as hate speech (Malmasi and Zampieri, 2018; across tasks. As such, we introduce multitask learning Nozza, 2021), abuse (Corazza et al., 2020), aggression whereby one model can predict all three tasks (Kumar et al., 2018, 2020), and cyber-bullying as an alternative approach.


Toward Co-creative Dungeon Generation via Transfer Learning

arXiv.org Artificial Intelligence

Co-creative Procedural Content Generation via Machine Learning However, running user subject studies for every game would be (PCGML) refers to systems where a PCGML agent and a human costly, and it would be difficult to find a user base with relevant work together to produce output content. One of the limitations of design experience for every game since most games do not have co-creative PCGML is that it requires co-creative training data for a their own Game Name Maker level design tool/game. Therefore, PCGML agent to learn to interact with humans. However, acquiring we need a way to develop high quality co-creative agents without this data is a difficult and time-consuming process.


Improving Polyphonic Sound Event Detection on Multichannel Recordings with the S{\o}rensen-Dice Coefficient Loss and Transfer Learning

arXiv.org Artificial Intelligence

The S{\o}rensen--Dice Coefficient has recently seen rising popularity as a loss function (also known as Dice loss) due to its robustness in tasks where the number of negative samples significantly exceeds that of positive samples, such as semantic segmentation, natural language processing, and sound event detection. Conventional training of polyphonic sound event detection systems with binary cross-entropy loss often results in suboptimal detection performance as the training is often overwhelmed by updates from negative samples. In this paper, we investigated the effect of the Dice loss, intra- and inter-modal transfer learning, data augmentation, and recording formats, on the performance of polyphonic sound event detection systems with multichannel inputs. Our analysis showed that polyphonic sound event detection systems trained with Dice loss consistently outperformed those trained with cross-entropy loss across different training settings and recording formats in terms of F1 score and error rate. We achieved further performance gains via the use of transfer learning and an appropriate combination of different data augmentation techniques.


Deep Transfer Learning Based Intrusion Detection System for Electric Vehicular Networks

arXiv.org Artificial Intelligence

The Controller Area Network (CAN) bus works as an important protocol in the real-time In-Vehicle Network (IVN) systems for its simple, suitable, and robust architecture. The risk of IVN devices has still been insecure and vulnerable due to the complex data-intensive architectures which greatly increase the accessibility to unauthorized networks and the possibility of various types of cyberattacks. Therefore, the detection of cyberattacks in IVN devices has become a growing interest. With the rapid development of IVNs and evolving threat types, the traditional machine learning-based IDS has to update to cope with the security requirements of the current environment. Nowadays, the progression of deep learning, deep transfer learning, and its impactful outcome in several areas has guided as an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN along with improved performance in comparison to several other existing models. The unique contributions include effective attribute selection which is best suited to identify malicious CAN messages and accurately detect the normal and abnormal activities, designing a deep transfer learning-based LeNet model, and evaluating considering real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture along with empirical analyses shows that the proposed IDS greatly improves the detection accuracy over the mainstream machine learning, deep learning, and benchmark deep transfer learning models and has demonstrated better performance for real-time IVN security.


Differentiable Architecture Pruning for Transfer Learning

arXiv.org Machine Learning

Transfer learning methods aim to produce machine learning models that are trained on a given problem but perform well also on different new tasks. The interest in transfer learning comes from situations where large data sets can be used for solving a given training task but the data associated with new tasks are too small to train expressive models from scratch. The general transfer-learning strategy is to use the small available new data for adapting a large model that has been previously optimized on the training data. One option consists of keeping the structure of the pre-trained large model intact and fine-tuning its weights to solve the new task. When very few data points are available and the pre-trained network is large, however, customized regularization strategies are needed to mitigate the risk of over-fitting. Fine-tuning only a few parameters is a possible way out but can strongly limit the performance of the final model. Another option is to prune the pre-trained model to reduce its complexity, increase transferability, and prevent overfitting. Existing strategies, however, focus on optimized models and are unable to disentangle the network architecture from the attached weights. As a consequence, the pruned version of the original model can hardly be interpreted as a transferable new architecture and it is difficult to reuse it on new tasks.


Transfer Learning in Information Criteria-based Feature Selection

arXiv.org Machine Learning

This paper investigates the effectiveness of transfer learning based on Mallows' Cp. We propose a procedure that combines transfer learning with Mallows' Cp (TLCp) and prove that it outperforms the conventional Mallows' Cp criterion in terms of accuracy and stability. Our theoretical results indicate that, for any sample size in the target domain, the proposed TLCp estimator performs better than the Cp estimator by the mean squared error (MSE) metric in the case of orthogonal predictors, provided that i) the dissimilarity between the tasks from source domain and target domain is small, and ii) the procedure parameters (complexity penalties) are tuned according to certain explicit rules. Moreover, we show that our transfer learning framework can be extended to other feature selection criteria, such as the Bayesian information criterion. By analyzing the solution of the orthogonalized Cp, we identify an estimator that asymptotically approximates the solution of the Cp criterion in the case of non-orthogonal predictors. Similar results are obtained for the non-orthogonal TLCp. Finally, simulation studies and applications with real data demonstrate the usefulness of the TLCp scheme.


On-edge Multi-task Transfer Learning: Model and Practice with Data-driven Task Allocation

arXiv.org Artificial Intelligence

On edge devices, data scarcity occurs as a common problem where transfer learning serves as a widely-suggested remedy. Nevertheless, transfer learning imposes a heavy computation burden to resource-constrained edge devices. Existing task allocation works usually assume all submitted tasks are equally important, leading to inefficient resource allocation at a task level when directly applied in Multi-task Transfer Learning (MTL). To address these issues, we first reveal that it is crucial to measure the impact of tasks on overall decision performance improvement and quantify \emph{task importance}. We then show that task allocation with task importance for MTL (TATIM) is a variant of the NP-complete Knapsack problem, where the complicated computation to solve this problem needs to be conducted repeatedly under varying contexts. To solve TATIM with high computational efficiency, we propose a Data-driven Cooperative Task Allocation (DCTA) approach. Finally, we evaluate the performance of DCTA by not only a trace-driven simulation, but also a new comprehensive real-world AIOps case study that bridges model and practice via a new architecture and main components design within the AIOps system. Extensive experiments show that our DCTA reduces 3.24 times of processing time, and saves 48.4\% energy consumption compared with the state-of-the-art when solving TATIM.


Memory Efficient Meta-Learning with Large Images

arXiv.org Machine Learning

Meta learning approaches to few-shot classification are computationally efficient at test time requiring just a few optimization steps or single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task's training images. This enables us to perform a forward pass on a task's entire training set but realize significant memory savings by back-propagating only a random subset of these images which we show is an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.


Do sound event representations generalize to other audio tasks? A case study in audio transfer learning

arXiv.org Artificial Intelligence

Transfer learning is critical for efficient information transfer across multiple related learning problems. A simple, yet effective transfer learning approach utilizes deep neural networks trained on a large-scale task for feature extraction. Such representations are then used to learn related downstream tasks. In this paper, we investigate transfer learning capacity of audio representations obtained from neural networks trained on a large-scale sound event detection dataset. We build and evaluate these representations across a wide range of other audio tasks, via a simple linear classifier transfer mechanism. We show that such simple linear transfer is already powerful enough to achieve high performance on the downstream tasks. We also provide insights into the attributes of sound event representations that enable such efficient information transfer.


Pre-Trained Models: Past, Present and Future

arXiv.org Artificial Intelligence

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.