SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization
Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Tomoya Iwakura
arXiv.org Artificial Intelligence
Various NLP tasks exploit pairs of raw text and annotation labels for training and evaluating models. For example, in named entity recognition (NER), which is applied to practical technologies such as location detection (Inkpen et al., 2017) and anonymization (Mamede et al., 2016), parts of the text are annotated as named entities (e.g., location names or personal names), and a model is then trained to extract these entities from the raw text. To achieve higher performance in NLP tasks, models should be trained or fine-tuned on a sophisticated training dataset free of annotation errors.

Methods for weighing annotation errors have recently been studied in the NER field. Wang et al. (2019) proposed CrossWeigh, a method that detects annotation errors in the dataset and adjusts their learning priority by weighting loss values so that training is not affected by such errors. However, it has shortcomings in computational efficiency, especially given recent NLP trends toward pre-trained large language models. We consider that more efficient annotation-weighing methods can speed up the development of NLP. In addition, reducing the computational cost contributes to Green AI (Schwartz
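The loss-weighting idea described above can be illustrated with a minimal sketch. This is not the paper's or CrossWeigh's implementation; the predicted distributions, labels, and weight values below are hypothetical, and the weights stand in for the output of an error-detection pass that flags suspect annotations.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the gold label under the predicted distribution.
    return -math.log(probs[label])

# Hypothetical predicted distributions paired with gold labels.
examples = [
    ([0.7, 0.2, 0.1], 0),
    ([0.1, 0.8, 0.1], 1),
    ([0.3, 0.3, 0.4], 0),  # suspected annotation error
]

# Per-example weights: a suspected error gets a lower weight,
# so it contributes less to the training objective.
weights = [1.0, 1.0, 0.3]

losses = [cross_entropy(probs, label) for probs, label in examples]
weighted_loss = sum(w * l for w, l in zip(weights, losses)) / len(losses)
unweighted_loss = sum(losses) / len(losses)
```

Down-weighting the flagged example shrinks its gradient contribution rather than discarding it outright, which is the general trade-off such weighting schemes exploit.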
Sep-10-2024