Improving the NER model with patent texts
In a nutshell, named entity recognition (NER) finds meaningful entities in a text and classifies them into categories. Datasets for the most popular entity categories, such as Organisations or Brands, already exist. For other categories, people usually annotate a dataset manually, which is expensive and slow, and the resulting quality still depends on the initial data. In this article, I propose another approach: I use already categorised text that is highly saturated with relevant terms to train the NER pipeline for a specific domain.
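As a rough illustration of the idea, the sketch below turns a sentence and a list of domain terms into character-offset annotations without manual labelling. It is only one possible reading of the approach, not the article's actual pipeline: the function name `annotate`, the label `TECH`, the term list, and the sample sentence are all hypothetical, and the `(text, {"entities": [...]})` layout is assumed because many NER training tools accept offsets in that shape.

```python
import re


def annotate(text, terms, label):
    """Mark whole-word occurrences of known terms as entity spans.

    Returns (text, {"entities": [(start, end, label), ...]}).
    """
    if not terms:
        return text, {"entities": []}
    # Try longer terms first so a multi-word term wins over a shorter one inside it.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    entities = [(m.start(), m.end(), label) for m in pattern.finditer(text)]
    return text, {"entities": entities}


# Hypothetical data: terms that might be harvested from one patent category.
terms = {"heat exchanger", "exchanger", "compressor"}
sentence = "The heat exchanger is coupled to a compressor."
example = annotate(sentence, terms, "TECH")
matched = [sentence[start:end] for start, end, _ in example[1]["entities"]]
```

Dictionary matching like this is noisy, so annotations produced this way would normally be spot-checked before they are used for training.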
Jun-2-2022, 16:23:33 GMT