 topic coherence measure


Understanding Topic Coherence Measures

#artificialintelligence

Topic modeling is a widely used NLP technique. It aims to explain a textual dataset by decomposing it into two kinds of distributions: topics over documents and words over topics. A topic modeling algorithm is thus a mathematical/statistical model used to infer which topics best represent the data. For simplicity, a topic can be described as a collection of words, like ['ball', 'cat', 'house'] and ['airplane', 'clouds'], but in practice the algorithm assigns each word in the vocabulary a 'participation' value in a given topic. The words with the highest values can be considered the true participants of that topic.
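To make the 'participation' idea concrete, here is a minimal sketch in plain Python. The weights are invented for illustration and do not come from any trained model, and the helper name `top_words` is hypothetical. It only shows how a topic, stored as one weight per vocabulary word, is reduced to its highest-weighted words.

```python
# Toy vocabulary taken from the example words in the text.
vocabulary = ["ball", "cat", "house", "airplane", "clouds"]

# Assumed, hand-written weights (not learned): one value per vocabulary word,
# in the same order as `vocabulary`. Each topic's weights sum to 1.
topic_word_weights = {
    "topic_0": [0.40, 0.30, 0.25, 0.03, 0.02],
    "topic_1": [0.02, 0.03, 0.05, 0.50, 0.40],
}


def top_words(weights, vocab, n):
    """Return the n words with the highest weight in a single topic."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]


if __name__ == "__main__":
    print(top_words(topic_word_weights["topic_0"], vocabulary, 3))
    print(top_words(topic_word_weights["topic_1"], vocabulary, 2))
```

Every word has some weight in every topic; the short word lists usually shown for a topic are just the top of this ranking, so the cutoff `n` is a presentation choice rather than part of the model.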


Extracting Topical Phrases from Clinical Documents

He, Yulan (Aston University)

AAAI Conferences

In clinical documents, medical terms are often expressed in multi-word phrases. Traditional topic modelling approaches relying on the "bag-of-words" assumption are not effective in extracting topic themes from clinical documents. This paper proposes to first extract medical phrases using an off-the-shelf tool for medical concept mention extraction, and then train a topic model which takes a hierarchy of Pitman-Yor processes as prior for modelling the generation of phrases of arbitrary length. Experimental results on patients' discharge summaries show that the proposed approach outperforms the state-of-the-art topical phrase extraction model on both perplexity and topic coherence measure and finds more interpretable topics.