continual
GenCNER: A Generative Framework for Continual Named Entity Recognition
Yang, Yawen, Ma, Fukun, Meng, Shiao, Liu, Aiwei, Wen, Lijie
Abstract--Traditional named entity recognition (NER) aims to identify text mentions and classify them into pre-defined entity types. Continual Named Entity Recognition (CNER) has been introduced because entity categories continuously grow in many real-world scenarios. In this paper, we propose GenCNER, a simple but effective Generative framework for CNER that mitigates the drawbacks of existing approaches. Specifically, we convert the CNER task into a sustained entity triplet sequence generation problem and utilize a powerful pre-trained seq2seq model to solve it. Additionally, we design a type-specific confidence-based pseudo labeling strategy along with knowledge distillation (KD) to preserve learned knowledge and alleviate the impact of label noise at the triplet level. Experimental results on two benchmark datasets show that our framework outperforms previous state-of-the-art methods in multiple CNER settings and achieves the smallest gap to non-continual-learning (non-CL) results.
I. Introduction
Named Entity Recognition (NER) is a fundamental task in NLP due to its wide application in entity linking [1], relation extraction [2] and knowledge graphs [3].
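The triplet linearization at the heart of this framing can be sketched in a few lines. The separator markup and the (mention, start index, type) triplet layout below are illustrative assumptions, not GenCNER's published format:

```python
# Minimal sketch: cast NER annotations as a flat triplet string that a
# pre-trained seq2seq model (e.g., T5/BART) can be trained to generate.
# The "<triplet>" marker and field order are hypothetical choices.

from typing import List, Tuple

def linearize_entities(tokens: List[str],
                       entities: List[Tuple[int, int, str]]) -> str:
    """Turn (start, end, type) span annotations into a generation target."""
    parts = []
    for start, end, etype in sorted(entities):
        mention = " ".join(tokens[start:end + 1])
        parts.append(f"<triplet> {mention} ; {start} ; {etype}")
    return " ".join(parts) if parts else "<none>"

tokens = ["Alice", "visited", "New", "York", "last", "week"]
entities = [(0, 0, "PER"), (2, 3, "LOC")]
print(linearize_entities(tokens, entities))
# <triplet> Alice ; 0 ; PER <triplet> New York ; 2 ; LOC
```

One appeal of this target format is that new entity types at later steps only extend the vocabulary of type names in the output string, rather than the model's classification head.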
Exploring Stability-Plasticity Trade-offs for Continual Named Entity Recognition
Zhang, Duzhen, Li, Chenxing, Dong, Jiahua, Liu, Qi, Yu, Dong
Continual Named Entity Recognition (CNER) is an evolving field that focuses on sequentially updating an existing model to incorporate new entity types. Previous CNER methods primarily utilize Knowledge Distillation (KD) to preserve prior knowledge and overcome catastrophic forgetting, strictly ensuring that the representations of old and new models remain consistent. Consequently, they often endow the model with excessive stability (i.e., retention of old knowledge) but limited plasticity (i.e., acquisition of new knowledge). To address this issue, we propose a Stability-Plasticity Trade-off (SPT) method for CNER that balances these aspects from both representation and weight perspectives. From the representation perspective, we introduce a pooling operation into the original KD, permitting a level of plasticity by consolidating representation dimensions. From the weight perspective, we dynamically merge the weights of old and new models, strengthening old knowledge while maintaining new knowledge. During this fusion, we implement a weight-guided selective mechanism to prioritize significant weights. Moreover, we develop a confidence-based pseudo-labeling approach for the current non-entity type, which predicts entity types using the old model to handle the semantic shift of the non-entity type, a challenge specific to CNER that has largely been ignored by previous methods. Extensive experiments across ten CNER settings on three benchmark datasets demonstrate that our SPT method surpasses previous CNER approaches, highlighting its effectiveness in achieving a suitable stability-plasticity trade-off.
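The weight-perspective fusion can be sketched compactly. The magnitude-based importance score and per-weight interpolation below are illustrative assumptions, not SPT's exact formulation:

```python
# Minimal sketch: after learning step t, fuse old and new model weights,
# letting significant (here: large-magnitude) old weights pull the merged
# parameters more strongly toward the old model. Both the importance score
# and the fusion rule are assumptions for illustration.

import torch

@torch.no_grad()
def merge_weights(old_model: torch.nn.Module,
                  new_model: torch.nn.Module,
                  alpha: float = 0.5) -> None:
    """In-place fusion: theta <- lam * theta_old + (1 - lam) * theta_new,
    with lam scaled per weight by a normalized importance score."""
    for p_old, p_new in zip(old_model.parameters(), new_model.parameters()):
        # Importance from weight magnitude (assumed proxy for significance).
        importance = p_old.abs() / (p_old.abs().max() + 1e-12)
        lam = alpha * importance  # per-weight interpolation coefficient
        p_new.mul_(1.0 - lam).add_(lam * p_old)
```

Scaling the interpolation per weight, rather than using a single global coefficient, is what makes the merge selective: unimportant old weights barely constrain the new model.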
Continual learning: a feature extraction formalization, an efficient algorithm, and fundamental obstructions
Continual learning is an emerging paradigm in machine learning, wherein a model is exposed in an online fashion to data from multiple different distributions (i.e., environments). Precisely, the goal is to perform well in the new environment while simultaneously retaining performance on the previous environments (i.e., avoiding catastrophic forgetting). In this paper, we propose a formalization of continual learning through the lens of feature extraction, namely one in which features, as well as a classifier, are trained with each environment. When the features are linear, we design an efficient gradient-based algorithm, \mathsf{DPGrad}, that is guaranteed to perform well on the current environment and to avoid catastrophic forgetting. In the general case, when the features are non-linear, we show that such an algorithm cannot exist, whether efficient or not.
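The abstract does not spell out \mathsf{DPGrad}, so the snippet below is not that algorithm; it is a generic illustration, in the spirit of projection-based continual learning, of how a gradient update on linear features can avoid disturbing directions that mattered for earlier environments:

```python
# Illustrative only: project the new environment's gradient onto the
# subspace orthogonal to stored directions from past environments, so the
# update cannot move the linear features along protected directions.

import numpy as np

def project_out(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove components of `grad` lying in span of the basis columns."""
    if basis.size == 0:
        return grad
    q, _ = np.linalg.qr(basis)            # orthonormalize stored directions
    return grad - q @ (q.T @ grad)

# Toy linear regression on environment 2, protecting environment-1 directions.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
protected = rng.normal(size=(5, 2))       # directions kept from the past
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
for _ in range(100):
    grad = X.T @ (X @ w - y) / len(y)
    w -= 0.1 * project_out(grad, protected)
```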
Continual learning with task specialist
Solomon, Indu, Aung, Aye Phyu Phyu, Kumar, Uttam, Jayavelu, Senthilnath
Continual learning (CL) adapts deep learning models to continuously updated datasets. However, existing CL models suffer from catastrophic forgetting, where new knowledge overwrites past learning. In this paper, we propose Continual Learning with Task Specialists (CLTS) to address catastrophic forgetting and the scarcity of labelled data in real-world datasets by performing class-incremental learning on an incoming stream of data. The model consists of Task Specialists (TS) and a Task Predictor (TP) with a pre-trained Stable Diffusion (SD) module. We introduce a new specialist to handle each new task sequence, and each TS has three blocks: i) a variational autoencoder (VAE) to learn the task distribution in a low-dimensional latent space, ii) a K-Means block to perform data clustering, and iii) a Bootstrapping Language-Image Pre-training (BLIP) model to generate a small batch of captions from the input data. These captions are fed to the pre-trained SD model to generate task samples. The proposed model does not store any task samples for replay; instead, it uses samples generated by SD to train the TP module. A comparison with four SOTA models on three real-world datasets shows that the proposed model outperforms all selected baselines.
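The flow through one Task Specialist can be sketched structurally. The VAE, BLIP, and Stable Diffusion components are replaced with stand-in functions here, so only the wiring (encode, cluster, caption, generate, train TP on generated samples) follows the abstract; everything inside the stubs is a placeholder:

```python
# Structural sketch of one TS step: latent encoding -> clustering ->
# per-cluster captioning -> caption-conditioned generation for TP training.

import numpy as np
from sklearn.cluster import KMeans

def vae_encode(images: np.ndarray) -> np.ndarray:
    """Stand-in for the TS's VAE encoder (random projection, not a real VAE)."""
    rng = np.random.default_rng(0)
    proj = rng.normal(size=(images.shape[1], 16))
    return images @ proj

def blip_caption(image: np.ndarray) -> str:
    """Stand-in for BLIP captioning."""
    return "a sample image from the current task"

def sd_generate(caption: str, n: int) -> np.ndarray:
    """Stand-in for Stable Diffusion; returns n synthetic samples."""
    return np.random.default_rng(1).normal(size=(n, 64))

def task_specialist_step(images: np.ndarray, n_clusters: int = 4) -> np.ndarray:
    latents = vae_encode(images)                             # i) latent space
    labels = KMeans(n_clusters, n_init=10).fit_predict(latents)  # ii) clusters
    # iii) caption one representative per cluster, then generate replay data
    captions = [blip_caption(images[labels == k][0]) for k in range(n_clusters)]
    replay = np.concatenate([sd_generate(c, n=8) for c in captions])
    return replay  # trains the TP module; no real task samples are stored

replay = task_specialist_step(np.random.default_rng(2).normal(size=(40, 64)))
print(replay.shape)  # (32, 64)
```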
Continual Named Entity Recognition without Catastrophic Forgetting
Zhang, Duzhen, Cong, Wei, Dong, Jiahua, Yu, Yahan, Chen, Xiuyi, Zhang, Yonggang, Fang, Zhen
Continual Named Entity Recognition (CNER) is a burgeoning area that involves updating an existing model by incorporating new entity types sequentially. Nevertheless, continual learning approaches are often severely afflicted by catastrophic forgetting. This issue is intensified in CNER due to the consolidation of old entity types from previous steps into the non-entity type at each step, leading to what is known as the semantic shift problem of the non-entity type. In this paper, we introduce a pooled feature distillation loss that navigates the trade-off between retaining knowledge of old entity types and acquiring new ones, thereby more effectively mitigating the problem of catastrophic forgetting. Additionally, we develop a confidence-based pseudo-labeling strategy for the non-entity type, \emph{i.e.,} predicting entity types using the old model to handle the semantic shift of the non-entity type. Following the pseudo-labeling process, we suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution. We carried out comprehensive experiments on ten CNER settings using three different datasets. The results illustrate that our method significantly outperforms prior state-of-the-art approaches, registering an average improvement of $6.3$\% and $8.0$\% in Micro and Macro F1 scores, respectively.
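Both mechanisms admit a compact sketch. The pooling axis, window size, and confidence threshold below are illustrative assumptions; only the two ideas (pooled feature distillation, confident relabeling of non-entity tokens by the old model) come from the abstract:

```python
# Minimal sketch of (1) pooled feature distillation, which matches pooled
# rather than raw hidden states so the new model keeps some plasticity, and
# (2) confidence-based pseudo-labeling of 'O' tokens with the old model's
# predictions. Shapes and the threshold tau are assumed for illustration.

import torch
import torch.nn.functional as F

def pooled_feature_distillation(feat_new: torch.Tensor,
                                feat_old: torch.Tensor,
                                pool: int = 4) -> torch.Tensor:
    """feat_*: (batch, seq_len, hidden). Average-pool the hidden dimension
    in windows of `pool`, then match the pooled representations."""
    pooled_new = F.avg_pool1d(feat_new, kernel_size=pool)
    pooled_old = F.avg_pool1d(feat_old, kernel_size=pool)
    return F.mse_loss(pooled_new, pooled_old)

def pseudo_label_non_entity(labels: torch.Tensor,
                            old_logits: torch.Tensor,
                            o_id: int = 0,
                            tau: float = 0.7) -> torch.Tensor:
    """labels: (batch, seq_len); old_logits: (batch, seq_len, n_old_types).
    Replace 'O' labels with the old model's confident entity predictions."""
    probs = old_logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    relabel = (labels == o_id) & (conf > tau) & (pred != o_id)
    return torch.where(relabel, pred, labels)
```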
Continual raises $4M for its AI-powered data platform – TechCrunch
Continual, a startup that aims to bring operational AI to the modern data warehouse-centric data stack, today announced that it has raised a $4 million seed round led by Amplify Partners, with Illuminate Ventures, Essence, Wayfinder and Data Community Fund also participating in the round. With this announcement, Continual is also opening up its service as a public beta, after testing it with a number of select customers in recent months. The data warehousing space is vast but also dominated by a small number of players, like Snowflake, Amazon Redshift, BigQuery and Databricks. This makes it easier for startups that want to tap into the data stored in them to build their own innovations on top. For Continual, that means providing businesses with an accessible tool for building predictive models.
Is Data-First AI the Next Big Thing?
We are roughly a decade removed from the beginnings of the modern machine learning (ML) platform, inspired largely by the growing ecosystem of open-source Python-based technologies for data scientists. It's a good time for us to reflect back upon the progress that has been made, highlight the major problems enterprises have with existing ML platforms, and discuss what the next generation of platforms will be like. As we'll discuss, we believe the next disruption in the ML platform market will be the growth of data-first AI platforms. It is sometimes easy to forget now (or, tragically, maybe it's all too real for some), but there was once a time when building machine learning models required a substantial amount of work. In days not too far gone, this would involve implementing your own algorithms, writing tons of code in the process, and hoping you make no crucial errors in translating academic work into a functional library.