astrophysics literature
Improving astroBERT using Semantic Textual Similarity
Grezes, Felix; Allen, Thomas; Blanco-Cuaresma, Sergi; Accomazzi, Alberto; Kurtz, Michael J.; Shapurian, Golnaz; Henneken, Edwin; Grant, Carolyn S.; Thompson, Donna M.; Hostetler, Timothy W.; Templeton, Matthew R.; Lockhart, Kelly E.; Chen, Shinyi; Koch, Jennifer; Jacovich, Taylor; Protopapas, Pavlos
The NASA Astrophysics Data System (ADS) is an essential tool that allows researchers to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we: 1. announce the first public release of the astroBERT language model; 2. show how astroBERT improves over existing public language models on astrophysics-specific tasks; and 3. detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.
Detecting Entities in the Astrophysics Literature: A Comparison of Word-based and Span-based Entity Recognition Methods
A large body of scientific literature is published in different domains, making it difficult for researchers in their respective fields to find information or keep up to date. Automatic information extraction, in particular Named Entity Recognition (NER), is one of the core methods from the NLP community to assist researchers. It finds mentions of entities of interest in a given text, such as in medicine (Rybinski et al., 2021), astronomy …
NER refers to the task of identifying mentions of different types of entities in free text. The types of entities of interest depend on the domain of the text: for example, disease names in biomedical text (Islamaj Doğan et al., 2014; Dai, 2021) or numbers in finance (Loukas et al., 2022). Methods to recognise such entities should also handle different types of text, including both formal and informal text such as social media posts (Karimi et al., 2015; Basaldella et al., 2020).
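The word-based and span-based methods named in the title differ mainly in how an entity mention is represented. A word-based tagger assigns one label to each token (the BIO scheme shown here is one common choice, assumed for illustration), whereas a span-based model enumerates candidate token spans and classifies each span directly. The minimal sketch below is a hypothetical illustration, not the paper's implementation: `bio_to_spans`, `enumerate_spans`, the `max_width` limit and the entity type names are all assumptions.

```python
from typing import List, Optional, Tuple

# A span-level entity: (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert word-level BIO tags (hypothetical helper) into span-level entities."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        # A B- tag, or an I- tag whose type differs from the open entity,
        # starts a new entity; close any entity that is still open first.
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if etype is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag == "O":
            # Outside any entity: close the open entity, if there is one.
            if etype is not None:
                spans.append((start, i, etype))
            start, etype = None, None
        # Otherwise this is an I- tag continuing the current entity.
    if etype is not None:
        spans.append((start, len(tags), etype))
    return spans


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """List every candidate span of up to max_width tokens that a span-based model would score."""
    return [
        (s, e)
        for s in range(n_tokens)
        for e in range(s + 1, min(s + max_width, n_tokens) + 1)
    ]


if __name__ == "__main__":
    tokens = ["We", "observed", "the", "Crab", "Nebula", "with", "Hubble"]
    # Illustrative entity types, not taken from any specific annotation scheme.
    tags = ["O", "O", "O", "B-CelestialObject", "I-CelestialObject", "O", "B-Telescope"]
    print(bio_to_spans(tags))  # [(3, 5, 'CelestialObject'), (6, 7, 'Telescope')]
    print(len(enumerate_spans(len(tokens), max_width=3)))  # candidate spans to classify
```

The same gold annotation can therefore be expressed either as one label per word or as a short list of labelled spans; a span-based model pays for its direct span view by scoring many candidates, which is why a width limit such as the assumed `max_width` is typically imposed.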