Nested Hierarchical Dirichlet Processes
Paisley, John, Wang, Chong, Blei, David M., Jordan, Michael I.
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.
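The per-word path idea in the abstract can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the authors' model or their inference algorithm: it assumes a finite shared tree with a fixed depth and branching factor, draws document-specific child probabilities from a symmetric Dirichlet, and draws a per-node stopping probability from a Beta distribution. The names `draw_document_tree` and `sample_word_path` and all parameter values are hypothetical.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) of dimension k via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def draw_document_tree(depth, branching, rng, alpha=1.0, stop_a=1.0, stop_b=1.0):
    """Draw one document's distribution on a shared, finite tree (toy assumption).

    Nodes are tuples of child indices; the root is the empty tuple.
    Returns (child_probs, stop_probs), both keyed by node.
    """
    child_probs, stop_probs = {}, {}
    frontier = [()]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            # Document-specific probabilities of moving to each child.
            child_probs[node] = sample_dirichlet(alpha, branching, rng)
            # Document-specific probability of stopping at this node.
            stop_probs[node] = rng.betavariate(stop_a, stop_b)
            next_frontier.extend(node + (c,) for c in range(branching))
        frontier = next_frontier
    for leaf in frontier:
        stop_probs[leaf] = 1.0  # the truncated tree forces a stop at the leaves
    return child_probs, stop_probs


def sample_word_path(child_probs, stop_probs, rng):
    """Walk from the root, choosing at each node to stop or descend to a child."""
    node = ()
    while rng.random() >= stop_probs[node]:
        probs = child_probs[node]
        child = rng.choices(range(len(probs)), weights=probs)[0]
        node = node + (child,)
    return node


if __name__ == "__main__":
    rng = random.Random(0)
    child_probs, stop_probs = draw_document_tree(depth=3, branching=2, rng=rng)
    # Each word in the document gets its own path, unlike a single shared path.
    for _ in range(5):
        print(sample_word_path(child_probs, stop_probs, rng))
```

Because every word samples its own path from the same document-level draw, words in one document can land on topic nodes in different branches, which is the contrast with a single-path formulation that the abstract describes.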
May-2-2014
- Country:
- North America > United States
- Indiana (0.04)
- Texas (0.04)
- Missouri (0.04)
- Oregon (0.04)
- Kansas (0.04)
- North Carolina (0.04)
- Michigan (0.04)
- Tennessee (0.04)
- Ohio (0.04)
- Virginia (0.04)
- Kentucky (0.04)
- Mississippi (0.04)
- California > Alameda County
- Berkeley (0.14)
- Illinois > Cook County
- Chicago (0.04)
- New Jersey > Mercer County
- Princeton (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- New York > New York County
- New York City (0.04)
- Europe
- Asia
- Russia (0.14)
- Cambodia (0.14)
- Afghanistan (0.04)
- India (0.04)
- Singapore (0.04)
- Taiwan (0.04)
- Indonesia (0.04)
- Vietnam (0.04)
- North Korea (0.04)
- China
- Middle East
- Jordan (0.05)
- Iraq (0.04)
- Iran (0.04)
- Syria (0.04)
- Saudi Arabia (0.04)
- Israel (0.04)
- Lebanon > Beirut Governorate
- Beirut (0.04)
- Africa > Middle East
- Egypt (0.04)
- Genre:
- Research Report (0.50)
- Industry:
- Leisure & Entertainment > Sports (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- Banking & Finance (1.00)
- Energy (0.67)
- Consumer Products & Services > Restaurants (0.67)
- Government
- Technology: