How Elsevier Accelerated COVID-19 research using Dask on Saturn Cloud -- Elsevier Labs


The version of CORD-19 that we used yielded 3,389,064 paragraphs and 16,952,279 sentences. Each sentence is sent to each model, which yields zero or more entities. Because each sentence can be processed independently of the others, generating entities from sentences is embarrassingly parallel, and processing multiple sentences at once reduces the total processing time. To process the dataset, we used Dask, an open source library for parallel computing in Python. Dask provides several convenient abstractions that mimic familiar APIs such as NumPy arrays and pandas DataFrames, and these can operate on datasets that do not fit in main memory.
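To make the pattern concrete, here is a minimal sketch of a per-sentence, per-model extraction step expressed with Dask. It assumes a `dask.bag` collection, which is only one of several Dask abstractions that could fit; the text above does not say which one was used. The two regex-based "models", the sample sentences, and every function name are hypothetical stand-ins for the real entity recognition models and the CORD-19 data.

```python
import re

import dask.bag as db


# Hypothetical stand-ins for the entity models: each one takes a sentence
# and returns zero or more (label, text) pairs.
def find_capitalized(sentence):
    return [("CAPITALIZED", m) for m in re.findall(r"\b[A-Z][a-z]+\b", sentence)]


def find_numbers(sentence):
    return [("NUMBER", m) for m in re.findall(r"\b\d+\b", sentence)]


MODELS = [find_capitalized, find_numbers]


def extract_entities(record):
    """Send one sentence to every model and collect the entities found."""
    sentence_id, sentence = record
    return [
        (sentence_id, label, text)
        for model in MODELS
        for label, text in model(sentence)
    ]


# Made-up sample sentences, keyed by a sentence id.
sentences = [
    (0, "Remdesivir was tested in 2 trials."),
    (1, "no entities here"),
    (2, "Patients in Wuhan received 400 mg."),
]

# Partitions are processed independently of each other, which is what
# makes the workload embarrassingly parallel.
bag = db.from_sequence(sentences, npartitions=2)

# map() runs the extraction per sentence; flatten() merges the
# per-sentence lists into one flat collection of entities.
entities = bag.map(extract_entities).flatten().compute(scheduler="threads")

for entity in entities:
    print(entity)
```

The threaded scheduler keeps the sketch runnable on a single machine. On a cluster, the same `map` and `flatten` calls would typically run against a `dask.distributed` client instead, with the number of partitions sized to the number of workers.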
