LexNLP: Natural language processing and information extraction for legal and regulatory texts

Bommarito, Michael J II; Katz, Daniel Martin; Detterman, Eric M

arXiv.org Machine Learning 

LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings.

Keywords: natural language processing, legal, regulatory, machine learning, segmentation, extraction, open source, Python

1. Introduction

Over the last two decades, many high-quality, open source packages for natural language processing and machine learning have been released. Researchers and developers can quickly write applications in languages such as Java, Python, and R that stand on the shoulders of comprehensive, well-tested libraries like Stanford NLP ([1]), OpenNLP ([2]), NLTK ([3]), spaCy ([4]), scikit-learn ([5], [6]), Weka ([7]), and gensim ([8]).
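As a rough illustration of what "structured information like distances" in item (iii) of the abstract can mean in practice, the sketch below pulls (value, unit) pairs out of free text using only the Python standard library. It is not LexNLP's implementation or API: the function name `extract_distances`, the regular expression, and the small unit table are all hypothetical and chosen only for this example. A production extractor would need to handle far more surface forms, such as thousands separators and spelled-out numbers.

```python
import re
from typing import Iterator, Tuple

# Hypothetical pattern (not taken from LexNLP): a plain decimal number
# followed by one of a handful of distance units.
_DISTANCE_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>miles?|kilometers?|km|feet|foot|meters?)\b",
    re.IGNORECASE,
)

# Map each accepted surface form to a canonical unit name.
_UNIT_ALIASES = {
    "mile": "mile", "miles": "mile",
    "kilometer": "kilometer", "kilometers": "kilometer", "km": "kilometer",
    "foot": "foot", "feet": "foot",
    "meter": "meter", "meters": "meter",
}


def extract_distances(text: str) -> Iterator[Tuple[float, str]]:
    """Yield (value, canonical_unit) pairs in order of appearance."""
    for match in _DISTANCE_RE.finditer(text):
        unit = _UNIT_ALIASES[match.group("unit").lower()]
        yield float(match.group("value")), unit


if __name__ == "__main__":
    sample = ("The premises are located 2.5 miles from the depot "
              "and within 100 meters of the river.")
    for value, unit in extract_distances(sample):
        print(value, unit)
```

Returning typed tuples instead of matched substrings is the point of the exercise: downstream code can compare or convert the values without parsing the text again.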
