Customizing the SentenceDetector in Spark NLP
Many Natural Language Processing (NLP) tasks require text to be split into chunks of varying granularity:

1. Document
2. Sentence
3. Token
4. etc.

This post focuses on splitting text into sentences in order to facilitate downstream tasks such as Named Entity Recognition (NER), Text Classification, or Sentiment Analysis. Splitting sentences correctly can be crucial for the success of those tasks, as the following example shows. Suppose a detector (wrongly) splits a German legal reference like "Schütze ZPO 4. Aufl." at the periods inside it: "4." is an ordinal number and "Aufl." abbreviates "Auflage" (edition), so neither period actually ends a sentence. Now you might say this is special-subject material and there will always be exotic cases. But the same issue occurs in everyday text when you want to extract common things. Consider, for example, an invented German address (with correct syntax for the zip code and so forth) that begins with the title "Dr.": that period alone would already trigger a wrong split.
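To make the failure mode concrete, here is a minimal sketch in plain Python (not Spark NLP's actual implementation): a naive splitter that breaks at every period, versus one that recognizes a small, assumed list of German abbreviations and ordinals such as "4.". The continuation of the legal-reference example is invented for illustration. In Spark NLP itself, this is the kind of behavior the SentenceDetector's customization parameters are intended to control.

```python
import re

# Illustrative, deliberately incomplete abbreviation list (an assumption,
# not a list taken from Spark NLP).
ABBREVIATIONS = {"Dr.", "Aufl.", "Nr.", "Abs."}

def naive_split(text):
    # Break after any period that is followed by whitespace.
    return re.split(r"(?<=\.)\s+", text)

def abbreviation_aware_split(text):
    sentences = []
    for part in re.split(r"(?<=\.)\s+", text):
        if sentences:
            prev_last = sentences[-1].split()[-1]
            # Merge if the previous fragment ended in a known abbreviation
            # or in an ordinal such as "4." (digits followed by a period).
            if prev_last in ABBREVIATIONS or re.fullmatch(r"\d+\.", prev_last):
                sentences[-1] += " " + part
                continue
        sentences.append(part)
    return sentences

text = "Schütze ZPO 4. Aufl. enthält die einschlägige Kommentierung."
print(naive_split(text))               # three wrong fragments
print(abbreviation_aware_split(text))  # one correct sentence
```

The naive splitter returns three fragments for a single sentence, while the abbreviation-aware variant stitches them back together. Real sentence detectors generalize this idea with larger abbreviation inventories and configurable boundary rules.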
Aug-23-2021, 22:35:30 GMT