Semiparametric Latent Topic Modeling on Consumer-Generated Corpora

Dayta, Dominic B., Barrios, Erniel B.

arXiv.org Artificial Intelligence 

The fields of natural language processing and information retrieval have seen a productive two decades, owing largely to the emergence and worldwide adoption of two modern technologies: large-scale document indexing and storage facilities, of which perhaps the two most prominent brands are JSTOR and Google Books, and social networking sites that allow individual users to create and distribute various types of content, a considerable fraction of which exists in the form of text (status updates, blog posts, and tweets). All of this has led to relentless growth in information-rich but unstructured collections of text data - referred to as corpora in natural language terminology - in terms of volume, velocity, and frequency, such that manual approaches to document indexing and classification are quickly becoming obsolete. Outside the context of online archives, methods that enable automated classification and analysis of voluminous corpora are proving to be valuable technology: such methods have been applied to legal research [Ravikumar and Raghuveer, 2012] and to analyzing patterns behind railroad accidents [Williams and Betak, 2018]. In the commercial space, companies can take advantage of the thousands of posts contributed daily by users about their products and services on social media and on review aggregator websites such as Yelp and TripAdvisor.
