stanford ner
Deep Learning Based Named Entity Recognition Models for Recipes
Mansi Goel, Ayush Agarwal, Shubham Agrawal, Janak Kapuriya, Akhil Vamshi Konam, Rishabh Gupta, Shrey Rastogi, Niharika, Ganesh Bagler
Recipes are cultural capsules transmitted across generations via unstructured text. Automated protocols for recognizing named entities, the building blocks of recipe text, are of immense value for various applications ranging from information extraction to novel recipe generation. Named entity recognition is a technique for extracting information from unstructured or semi-structured data with known labels. Starting with manually-annotated data of 6,611 ingredient phrases, we created an augmented dataset of 26,445 phrases cumulatively. Simultaneously, we systematically cleaned and analyzed ingredient phrases from RecipeDB, the gold-standard recipe data repository, and annotated them using the Stanford NER. Based on the analysis, we sampled a subset of 88,526 phrases using a clustering-based approach while preserving the diversity to create the machine-annotated dataset. A thorough investigation of NER approaches on these three datasets, covering statistical models, fine-tuned deep learning-based language models, and few-shot prompting of large language models (LLMs), provides deep insights. We conclude that few-shot prompting on LLMs has abysmal performance, whereas the fine-tuned spaCy-transformer emerges as the best model with macro-F1 scores of 95.9%, 96.04%, and 95.71% for the manually-annotated, augmented, and machine-annotated datasets, respectively.
StanfordNER - training a new model and deploying a web service
Stanford NER is a named-entity recognizer based on linear-chain Conditional Random Field (CRF) sequence models. This post details some of the experiments I've done with it, using a corpus to train a named-entity recognizer: the features I've explored (some undocumented), how to set up a web service exposing the trained model, and how to call it from a Python script. Once Java is set up, you can run Stanford NER using one of the pre-trained models, which are distributed in the same zip file. Create a file with a sample sentence in English. Then apply the english.all.3class.distsim.crf.ser.gz model to that sentence with a java command. This section describes the basic steps to train your own NER model: pre-processing the corpus (if needed), creating k folds for cross-validation, defining the features to use, and running Stanford NER in evaluation mode.
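As a rough illustration of applying a pre-trained model, the Python sketch below assembles a java invocation of Stanford NER's CRFClassifier and parses its default `word/TAG` output into entity spans. The jar location, the `classifiers/` directory and the `sample.txt` file name are assumptions about how the zip was unpacked, and the helper names `build_ner_command` and `parse_slash_tags` are hypothetical, not taken from the post.

```python
from itertools import groupby


def build_ner_command(jar, model, text_file):
    """Return the argument list for running CRFClassifier on a text file.

    All three paths are assumptions about the local layout of the
    unpacked Stanford NER distribution.
    """
    return [
        "java", "-cp", jar,
        "edu.stanford.nlp.ie.crf.CRFClassifier",
        "-loadClassifier", model,
        "-textFile", text_file,
    ]


def parse_slash_tags(line, background="O"):
    """Group consecutive tokens that share a tag into (text, tag) entities.

    Expects one line of slash-tagged output, e.g. "Paris/LOCATION ./O".
    Tokens without a "/" are ignored; tokens tagged `background` are dropped.
    """
    pairs = [tok.rsplit("/", 1) for tok in line.split() if "/" in tok]
    entities = []
    for tag, group in groupby(pairs, key=lambda pair: pair[1]):
        if tag != background:
            entities.append((" ".join(word for word, _ in group), tag))
    return entities


if __name__ == "__main__":
    cmd = build_ner_command(
        "stanford-ner.jar",
        "classifiers/english.all.3class.distsim.crf.ser.gz",
        "sample.txt",
    )
    print(" ".join(cmd))
    # To actually run it (requires Java and the distribution files):
    #   import subprocess
    #   out = subprocess.run(cmd, capture_output=True, text=True).stdout
    print(parse_slash_tags("Barack/PERSON Obama/PERSON visited/O Paris/LOCATION ./O"))
```

Grouping adjacent tokens with the same tag is a simplification: two distinct entities of the same type that sit next to each other are merged into one span.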
The Stanford Natural Language Processing Group
The original CRF code is by Jenny Finkel. The feature extractors are by Dan Klein, Christopher Manning, and Jenny Finkel. Much of the documentation and usability is due to Anna Rafferty. More recent code development has been done by various Stanford NLP Group members. Stanford NER is available for download, licensed under the GNU General Public License (v2 or later).
ross-spencer/nerlinks
Named entity recognition combining Tika's content extraction capabilities with Stanford's NLP server, written in #Golang. All communication with server-side tools is done via socket connections, rather than by embedding libraries and other complex API bits and pieces in the code. This frees us up to focus on developing the capability to combine results from different services. Using Tika, we get support for a large number of file types for free, so we don't have to worry too much about what files are sent to the server. Send them all! We handle the exceptions as best we can.
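To illustrate the socket-based approach in general terms, here is a minimal Python sketch of a client that sends one line of text to a tagging server over TCP and reads the reply until the server closes the connection. It is not nerlinks code (that project is written in Go); the function name `tag_over_socket`, the default host and port, and the one-line-per-connection exchange are assumptions made for illustration.

```python
import socket


def tag_over_socket(text, host="localhost", port=9000, timeout=10.0):
    """Send `text` as a single line to a TCP tagging server and return its reply.

    Assumes the server reads one newline-terminated line per connection,
    writes the tagged result, and then closes the socket. Host and port
    are placeholders for wherever the server is actually listening.
    """
    # Collapse internal newlines so the whole text travels as one line.
    payload = " ".join(text.split()) + "\n"
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload.encode("utf-8"))
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()
```

Keeping the client this thin means any service that speaks the same line-based exchange can be swapped in without touching the calling code, which is the trade-off the description above is pointing at.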