LexNLP: Natural language processing and information extraction for legal and regulatory texts

Bommarito, Michael J II; Katz, Daniel Martin; Detterman, Eric M

Jun-10-2018–arXiv.org Machine Learning

LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings.

Keywords: natural language processing, legal, regulatory, machine learning, segmentation, extraction, open source, Python

1. Introduction

Over the last two decades, many high-quality, open source packages for natural language processing and machine learning have been released. Researchers and developers can quickly write applications in languages such as Java, Python, and R that stand on the shoulders of comprehensive, well-tested libraries like Stanford NLP ([1]), OpenNLP ([2]), NLTK ([3]), spaCy ([4]), scikit-learn ([5], [6]), Weka ([7]), and gensim ([8]).
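As a rough illustration of items (i) and (iii) in the feature list above, the following is a minimal sketch that uses only the Python standard library. It is not LexNLP's API or implementation: the function names (`split_sentences`, `extract_dates`, `extract_distances`), the regular expressions, and the sample text are all hypothetical, and real legal text needs far more robust handling (abbreviations, citations, varied date formats).

```python
import re
from datetime import date

# NOTE: hypothetical helpers for illustration only; this is NOT LexNLP's API.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches dates written like "June 10, 2018" (one assumed format among many).
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})\b")

# Matches a number followed by a small, assumed set of distance units.
DISTANCE_RE = re.compile(
    r"\b(\d+(?:\.\d+)?)\s+(miles?|kilometers?|meters?|feet|foot)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive segmentation: split on whitespace that follows '.', '!' or '?'
    and precedes an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [part.strip() for part in parts if part.strip()]


def extract_dates(text):
    """Return datetime.date objects for every 'Month D, YYYY' match."""
    found = []
    for month, day, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            # Skip impossible dates such as "February 31, 2018".
            continue
    return found


def extract_distances(text):
    """Return (value, unit) pairs, with the unit lower-cased."""
    return [(float(value), unit.lower())
            for value, unit in DISTANCE_RE.findall(text)]


SAMPLE = ("This Agreement is effective as of June 10, 2018. "
          "The premises are located 2.5 miles from the courthouse.")
```

Calling `split_sentences(SAMPLE)` yields the two sentences, `extract_dates(SAMPLE)` yields a single `date(2018, 6, 10)`, and `extract_distances(SAMPLE)` yields `(2.5, "miles")`. A pattern-based approach like this is easy to unit test against real documents, but it covers only the formats the patterns anticipate.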


