Semiparametric Latent Topic Modeling on Consumer-Generated Corpora

Dayta, Dominic B., Barrios, Erniel B.

arXiv.org Artificial Intelligence 

The fields of natural language processing and information retrieval have seen a productive two decades, owing largely to the emergence and worldwide adoption of two modern technologies: large-scale document indexing and storage facilities, of which perhaps the two most prominent brands are JSTOR and Google Books, and social networking sites that allow individual users to create and distribute various types of content, a considerable fraction of which exists in the form of text (status updates, blog posts, and tweets). All of this has led to relentless growth in information-rich but unstructured collections of text data - referred to as corpora in natural language terminology - in terms of volume, velocity, and frequency, such that manual approaches to document indexing and classification are quickly becoming obsolete. Outside the context of online archives, methods that enable automated classification and analysis of voluminous corpora are proving to be valuable technology: such methods have been applied to legal research [Ravikumar and Raghuveer, 2012] and to analyzing patterns behind railroad accidents [Williams and Betak, 2018]. In the commercial space, companies can take advantage of the thousands of posts contributed daily by users about their products and services on social media and on review aggregator websites such as Yelp and TripAdvisor.
