DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

Magnusson, Måns, Jonsson, Leif, Villani, Mattias

arXiv.org Machine Learning 

During the last decades, more and more textual data has become available, creating a growing need to statistically analyze large amounts of text. The hugely popular Latent Dirichlet Allocation (LDA) model introduced by Blei et al. (2003) is a generative probabilistic model in which each document is summarized by a set of latent semantic themes, often called topics; formally, a topic is a probability distribution over the vocabulary. An estimated LDA model is therefore a compressed latent representation of the documents. LDA is a mixed membership model: each document is a mixture of topics, and each word (token) in a document belongs to a single topic. The basic LDA model is unsupervised, i.e. the topics are learned solely from the words in the documents without access to document labels. In many situations there is other information we would like to incorporate when modeling a corpus of documents. A common example is labeled documents, such as ratings of movies together with movie descriptions, illness categories in medical journals, or the location of an identified bug together with its bug report. In these situations, one can use a so-called supervised topic model to find the semantic structure in the documents that is related to the class of interest. One of the first approaches to supervised topic models was proposed by McAuliffe and Blei (2008).
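The two LDA concepts described above, topics as probability distributions over the vocabulary and documents as mixtures of topics, can be illustrated with a minimal sketch. This is not the authors' DOLDA implementation; it uses scikit-learn's standard unsupervised LDA on a tiny toy corpus (the documents and topic count are illustrative assumptions):

```python
# Minimal unsupervised LDA sketch (not the DOLDA model from the paper).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus: two loose themes (movies, bug reports), chosen for illustration.
docs = [
    "movie film actor scene plot",
    "bug crash stack trace error",
    "movie review rating film critic",
    "error report bug software crash",
]

# LDA operates on bag-of-words counts per document.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)  # each row: a document's mixture over topics

# Normalizing components_ gives phi: each row is a topic, i.e. a
# probability distribution over the vocabulary.
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Here `theta` is the compressed latent representation of the documents that the abstract refers to; a supervised topic model additionally ties this representation to a document label.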
