Collaborating Authors

 Woźniak, Michał


Continual Learning with Weight Interpolation

arXiv.org Artificial Intelligence

Continual learning poses a fundamental challenge for modern machine learning systems, requiring models to adapt to new tasks while retaining knowledge from previous ones. Addressing this challenge necessitates the development of efficient algorithms capable of learning from data streams and accumulating knowledge over time. This paper proposes a novel approach to continual learning utilizing the weight consolidation method. Our method, a simple yet powerful technique, enhances robustness against catastrophic forgetting by interpolating between old and new model weights after each novel task, effectively merging two models to facilitate exploration of local minima.

This property is known as mode connectivity. One feature that may be considered when studying this phenomenon is the permutation invariance of neural networks [12]. Neurons or kernels of network layers can be permuted and, if neighboring layers' outputs and inputs are adjusted, one can obtain a solution that has the same properties as the original model but lies in a completely different part of the loss landscape. Considering this fact, one may conclude that the abundance of local minima in the loss landscape of neural networks results from permutation invariance. In follow-up work, Ainsworth et al. [1] showed how to find permutations of weights that allow for a linear interpolation of weights with low or even near-zero barriers.
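
The core mechanism can be sketched in a few lines of plain PyTorch: after training on a new task, the old and new weights are merged by parameter-wise linear interpolation. This is a minimal illustration, assuming a toy two-layer network and a fixed interpolation factor alpha; the permutation alignment discussed above is omitted, so it is not the authors' full method.

    import torch
    import torch.nn as nn

    def interpolate_weights(old_state, new_state, alpha=0.5):
        # Parameter-wise linear interpolation between the model saved
        # before the new task (old) and the one trained on it (new).
        return {k: (1.0 - alpha) * old_state[k] + alpha * new_state[k]
                for k in old_state}

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    old_state = {k: v.clone() for k, v in model.state_dict().items()}

    # ... train `model` on the new task here ...

    merged = interpolate_weights(old_state, model.state_dict(), alpha=0.5)
    model.load_state_dict(merged)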


A Natural Gas Consumption Forecasting System for Continual Learning Scenarios based on Hoeffding Trees with Change Point Detection Mechanism

arXiv.org Artificial Intelligence

Forecasting natural gas consumption, considering seasonality and trends, is crucial in planning its supply and consumption and optimizing the cost of obtaining it, mainly by industrial entities. However, in times of threats to its supply, it is also a critical element that guarantees the supply of this raw material to meet individual consumers' needs, ensuring society's energy security. This article introduces a novel multistep-ahead forecasting method for natural gas consumption that integrates change point detection for model collection selection and offers continual learning capabilities through data stream processing. The performance of the forecasting models based on the proposed approach is evaluated in a complex real-world use case of natural gas consumption forecasting. We employed Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for the change point detection procedure. The change point detection integration enables selecting a different model collection for successive time frames. Thus, three model collection selection procedures (with and without an error feedback loop) are defined and evaluated for forecasting scenarios with various densities of detected change points. These models were compared with change-point-agnostic baseline approaches. Our experiments show that fewer change points result in a lower forecasting error regardless of the model collection selection procedure employed. Also, simpler model collection selection procedures that omit forecasting error feedback lead to more robust forecasting models suitable for continual learning tasks.
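
The pipeline above can be sketched with off-the-shelf components: the ruptures package for PELT and the river library for Hoeffding trees. The penalty value, the single lag feature, and the one-model-per-segment mapping are illustrative assumptions, not the evaluated configuration.

    import numpy as np
    import ruptures as rpt
    from river import tree

    # Toy consumption series standing in for real natural gas data.
    rng = np.random.default_rng(0)
    signal = np.concatenate([rng.normal(10, 1, 200), rng.normal(14, 1, 200)])

    # PELT change point detection; `pen` controls change point density.
    change_points = rpt.Pelt(model="rbf").fit(signal).predict(pen=10)

    # One Hoeffding tree per detected segment: a simple model collection.
    models, start = [], 0
    for end in change_points:
        model = tree.HoeffdingTreeRegressor()
        for t in range(start + 1, end):
            model.learn_one({"lag1": signal[t - 1]}, signal[t])  # stream update
        models.append(model)
        start = end

    # Forecast the next value with the model of the latest segment.
    print(models[-1].predict_one({"lag1": signal[-1]}))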


Increasing Depth of Neural Networks for Life-long Learning

arXiv.org Artificial Intelligence

Purpose: We propose a novel method for continual learning based on increasing the depth of neural networks. This work explores whether extending neural network depth may be beneficial in a life-long learning setting. Methods: We propose a novel approach based on adding new layers on top of existing ones to enable the forward transfer of knowledge and to adapt previously learned representations. We employ a method of determining the most similar tasks for selecting the best location in our network to add new nodes with trainable parameters. This approach allows for creating a tree-like model, where each node is a set of neural network parameters dedicated to a specific task. The proposed method is inspired by the Progressive Neural Network (PNN) concept and therefore benefits from dynamic changes in network structure. However, PNN allocates a lot of memory for the whole network structure during the learning process. The proposed method alleviates this by adding only part of a network for a new task and utilizing a subset of previously trained weights. At the same time, it retains the benefits of PNN, such as no forgetting guaranteed by design, without needing a memory buffer. Results: Experiments on Split CIFAR and Split Tiny ImageNet show that the proposed algorithm is on par with other continual learning methods. In a more challenging setup with a single computer vision dataset as a separate task, our method outperforms Experience Replay. Conclusion: The method is compatible with commonly used computer vision architectures and does not require a custom network structure. As adaptation to changing data distributions is made by expanding the architecture, there is no need for a rehearsal buffer. For this reason, our method can be used for sensitive applications where data privacy must be considered.
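
The growth step can be illustrated with a PyTorch sketch: parameters learned for earlier tasks are frozen and a new trainable block is stacked on top. The layer sizes are assumptions, and the task-similarity mechanism that selects where to attach the new node is omitted.

    import torch.nn as nn

    backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # trained on task 1

    # Freeze previously learned parameters so they cannot be forgotten.
    for p in backbone.parameters():
        p.requires_grad = False

    # New node: extra depth with trainable parameters for the new task.
    new_block = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
    model_task2 = nn.Sequential(backbone, new_block)

    # Only the new block's parameters will be optimized for task 2.
    trainable = [p for p in model_task2.parameters() if p.requires_grad]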


Transfer of knowledge among instruments in automatic music transcription

arXiv.org Artificial Intelligence

Automatic music transcription (AMT) is one of the most challenging tasks in the music information retrieval domain. It is the process of converting an audio recording of music into a symbolic representation containing information about the notes, chords, and rhythm. Current research in this domain focuses on developing new models based on the transformer architecture or on methods for semi-supervised training, which give outstanding results, but the computational cost of training such models is enormous. This work shows how to employ easily generated synthesized audio data, produced by software synthesizers, to train a universal model. It provides a good basis for further transfer learning to quickly adapt the transcription model to other instruments. The achieved results show that synthesized data may be a good basis for pretraining general-purpose models, where the task of transcription is not focused on one instrument.
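
The transfer step amounts to loading the universal model pretrained on synthesized audio and fine-tuning it on recordings of the target instrument. In this minimal PyTorch sketch, the checkpoint name, architecture, and learning rate are hypothetical placeholders, not the configuration from the paper.

    import torch
    import torch.nn as nn

    # Stand-in transcription model; a real AMT model maps spectrogram
    # frames to per-frame pitch activations (88 piano keys here).
    model = nn.Sequential(nn.Linear(229, 512), nn.ReLU(), nn.Linear(512, 88))

    # Load weights pretrained on synthesized audio (hypothetical file).
    model.load_state_dict(torch.load("universal_synth_pretrained.pt"))

    # Fine-tune with a small learning rate so pretrained representations
    # are adapted to the new instrument rather than overwritten.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)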


Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets

arXiv.org Artificial Intelligence

The paper discusses an approach to deciphering large collections of handwritten index cards of historical dictionaries. Our study provides a working solution that reads the cards and links their lemmas to a searchable list of dictionary entries for a large historical dictionary, the Dictionary of the 17th- and 18th-century Polish, which comprises 2.8 million index cards. We apply a tailored handwritten text recognition (HTR) solution that involves (1) an optimized detection model; (2) a recognition model to decipher the handwritten content, designed as a spatial transformer network (STN) followed by a recurrent convolutional neural network (RCNN) with a connectionist temporal classification (CTC) layer, trained using a synthetic set of 500,000 generated Polish words of different lengths; (3) a post-processing step using constrained Word Beam Search (WBS), in which the predictions are matched against a list of dictionary entries known in advance. Our model achieved a word-level accuracy of 0.881, which outperforms the base RCNN model. Within this study, we produced a set of 20,000 manually annotated index cards that can be used for future benchmarks and transfer learning in HTR applications.
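
The recognition stage (CNN features, a recurrent layer, and a CTC loss) can be sketched compactly in PyTorch. Dimensions and the alphabet size are illustrative assumptions, and the STN and Word Beam Search stages are left out.

    import torch
    import torch.nn as nn

    class TinyCRNN(nn.Module):
        def __init__(self, n_classes=40):              # alphabet + CTC blank
            super().__init__()
            self.cnn = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                     nn.MaxPool2d((2, 1)))
            self.rnn = nn.LSTM(32 * 16, 128, bidirectional=True,
                               batch_first=True)
            self.fc = nn.Linear(256, n_classes)

        def forward(self, x):                          # x: (B, 1, 32, W)
            f = self.cnn(x)                            # (B, 32, 16, W)
            f = f.permute(0, 3, 1, 2).flatten(2)       # (B, W, 32*16)
            out, _ = self.rnn(f)
            return self.fc(out).log_softmax(-1)        # (B, W, n_classes)

    model = TinyCRNN()
    log_probs = model(torch.randn(4, 1, 32, 100)).permute(1, 0, 2)  # (T, B, C)
    loss = nn.CTCLoss(blank=0)(log_probs,
                               torch.randint(1, 40, (4, 10)),   # label indices
                               torch.full((4,), 100), torch.full((4,), 10))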


Combining Self-labeling with Selective Sampling

arXiv.org Artificial Intelligence

Since data is the fuel that drives machine learning models, and access to labeled data is generally expensive, semi-supervised methods remain popular. They enable the acquisition of large datasets without requiring many expert labels. This work combines self-labeling techniques with active learning in a selective sampling scenario. We propose a new method that builds an ensemble classifier. Based on an evaluation of the inconsistency of the decisions of the individual base classifiers for a given observation, a decision is made on whether to request a new label or to use self-labeling. In preliminary studies, we show that naive application of self-labeling can harm performance by introducing bias towards selected classes and, consequently, lead to a skewed class distribution. Hence, we also propose mechanisms to reduce this phenomenon. Experimental evaluation shows that the proposed method matches current selective sampling methods or achieves better results.
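
The decision rule can be sketched with a bagged scikit-learn ensemble: when the members' votes agree strongly, the majority vote becomes a self-label; otherwise a true label is requested. The agreement threshold and base learner are illustrative choices, not the exact criterion from the paper.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.utils import resample

    def fit_ensemble(X, y, n_members=5, seed=0):
        members = []
        for i in range(n_members):
            Xb, yb = resample(X, y, random_state=seed + i)  # bootstrap sample
            members.append(DecisionTreeClassifier(random_state=i).fit(Xb, yb))
        return members

    def label_or_query(members, x, threshold=0.8):
        # Integer class labels are assumed so votes can be tallied.
        votes = np.array([m.predict(x.reshape(1, -1))[0] for m in members])
        counts = np.bincount(votes)
        if counts.max() / len(votes) >= threshold:
            return "self-label", counts.argmax()   # consistent ensemble
        return "query-oracle", None                # inconsistent: ask expert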


Combined Cleaning and Resampling Algorithm for Multi-Class Imbalanced Data with Label Noise

arXiv.org Machine Learning

Imbalanced data classification is one of the most crucial tasks facing modern data analysis. Especially when combined with other difficulty factors, such as the presence of noise, overlapping class distributions, and small disjuncts, data imbalance can significantly impact classification performance. Furthermore, some of the data difficulty factors are known to affect the performance of existing oversampling strategies, in particular SMOTE and its derivatives. This effect is especially pronounced in the multi-class setting, in which the mutual imbalance relationships between the classes become even more complicated. Despite that, most of the contemporary research on data imbalance focuses on binary classification problems, while their more difficult multi-class counterparts are relatively unexplored. In this paper, we propose a novel oversampling technique, the Multi-Class Combined Cleaning and Resampling (MC-CCR) algorithm. The proposed method utilizes an energy-based approach to model the regions suitable for oversampling, which is less affected by small disjuncts and outliers than SMOTE. It combines this with a simultaneous cleaning operation, the aim of which is to reduce the effect of overlapping class distributions on the performance of the learning algorithms. Finally, by incorporating a dedicated strategy for handling multi-class problems, MC-CCR is less affected by the loss of information about inter-class relationships than traditional multi-class decomposition strategies. Based on the results of experimental research carried out on many multi-class imbalanced benchmark datasets, we show the high robustness of the proposed approach to noise, as well as its high quality compared to state-of-the-art methods.
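
A deliberately simplified, numpy-only sketch of the two steps for a single minority class: majority samples inside a fixed-radius sphere around each minority sample are pushed to the sphere boundary (cleaning), and synthetic minority samples are drawn inside the spheres (resampling). The fixed radius replaces MC-CCR's energy-based radius and the multi-class handling is omitted, so this is an illustration rather than the algorithm itself.

    import numpy as np

    def simplified_ccr(X_min, X_maj, radius=1.0, seed=0):
        rng = np.random.default_rng(seed)
        X_maj = X_maj.copy()
        # Cleaning: push majority samples out of a sphere around each
        # minority sample (MC-CCR derives the radius from an energy budget).
        for x in X_min:
            d = np.linalg.norm(X_maj - x, axis=1)
            inside = (0 < d) & (d < radius)
            X_maj[inside] = x + (X_maj[inside] - x) * (radius / d[inside, None])
        # Resampling: generate synthetic minority samples inside the spheres.
        n_new = len(X_maj) - len(X_min)
        centers = X_min[rng.integers(0, len(X_min), n_new)]
        offsets = rng.normal(size=centers.shape)
        offsets *= (rng.uniform(0, radius, (n_new, 1))
                    / np.linalg.norm(offsets, axis=1, keepdims=True))
        return np.vstack([X_min, centers + offsets]), X_maj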


Monotonic classification: an overview on algorithms, performance measures and data sets

arXiv.org Artificial Intelligence

Currently, knowledge discovery in databases is an essential step in identifying valid, novel, and useful patterns for decision making. There are many real-world scenarios, such as bankruptcy prediction, option pricing, or medical diagnosis, where the classification models to be learned need to fulfil restrictions of monotonicity (i.e., the target class label should not decrease when input attribute values increase). For instance, it is rational to assume that a higher debt ratio of a company should never result in a lower level of bankruptcy risk. Consequently, there is a growing interest in the data mining research community concerning monotonic predictive models. This paper presents an overview of the literature in the field, analyzing existing techniques and proposing a taxonomy of the algorithms based on the type of model generated. For each method, we review the quality metrics considered in the evaluation and the different data sets and monotonic problems used in the analysis. In this way, this paper serves as an overview of the research on monotonic classification in the specialized literature and can be used as a functional guide to the field.
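
The monotonicity constraint itself is easy to state in code: whenever one example dominates another on every input attribute, its label must not be lower. The small numpy sketch below counts violating pairs, a common ingredient of the non-monotonicity indices reviewed in such surveys.

    import numpy as np

    def monotonicity_violations(X, y):
        # Count pairs (i, j) where X[i] >= X[j] on every attribute
        # but y[i] < y[j], i.e. the monotonicity constraint fails.
        violations = 0
        for i in range(len(X)):
            for j in range(len(X)):
                if i != j and np.all(X[i] >= X[j]) and y[i] < y[j]:
                    violations += 1
        return violations

    # Example: a higher debt ratio should never mean lower bankruptcy risk.
    X = np.array([[0.2], [0.8]])
    y = np.array([1, 0])   # the dominating sample has the lower label
    print(monotonicity_violations(X, y))   # -> 1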