AITopics | topic number

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Novel Method of Fuzzy Topic Modeling based on Transformer Processing

Tseng, Ching-Hsun, Lee, Shin-Jye, Cheng, Po-Wei, Lee, Chien, Hung, Chih-Chieh

arXiv.org Artificial Intelligence, Sep-18-2023

Topic modeling is admittedly a convenient way to monitor market trends. Conventionally, Latent Dirichlet Allocation (LDA) is considered a must-do model for gaining this type of information. Given LDA's merit of deducing keywords from token conditional probabilities, we can identify the most probable or essential topic. However, the results are not intuitive because the resulting topics cannot wholly fit human knowledge. LDA returns the keywords ranked most likely to be relevant, which raises a further question: whether a connection based only on statistical probability is reliable. It is also hard to decide the topic number manually in advance. Following the growing trend of clustering with fuzzy membership and embedding words with transformers, this work presents fuzzy topic modeling based on soft clustering and document embeddings from a state-of-the-art transformer-based model. In our practical application to press release monitoring, fuzzy topic modeling gives a more natural result than the traditional output from LDA.
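The abstract does not spell out how soft cluster membership is computed, so the following is only a minimal sketch under an assumption: it uses the standard fuzzy c-means membership formula, which may differ from the authors' method. The 2-D "embeddings", the two topic centers, and the function name `fuzzy_memberships` are hypothetical stand-ins for transformer document embeddings and learned topic centers.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster center.

    Assumption: standard fuzzy c-means with fuzzifier m > 1; this is not
    taken from the paper, only a common way to do soft clustering.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

# Toy 2-D "document embedding" and two hypothetical topic centers.
centers = [(0.0, 0.0), (4.0, 0.0)]
doc = (1.0, 0.0)
memberships = fuzzy_memberships(doc, centers)
print(memberships)  # one weight per topic; the weights sum to 1
```

Unlike a hard cluster assignment, each document keeps a weight for every topic, which is what lets a press release count partly toward more than one topic.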

dimension, topic number, transformer, (13 more...)

arXiv.org Artificial Intelligence

2309.09658

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

How to perform topic modeling with Top2Vec

#artificialintelligence, Nov-17-2021, 16:26:55 GMT

Topic modeling is a problem in natural language processing that has many real-world applications. Being able to discover topics within large sections of text helps us understand text data in greater detail. For many years, Latent Dirichlet Allocation (LDA) has been the most commonly used algorithm for topic modeling. The algorithm was first introduced in 2003 and treats topics as probability distributions for the occurrence of different words. If you want to see an example of LDA in action, you should check out my article below where I performed LDA on a fake news classification dataset.
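To make "topics as probability distributions for the occurrence of different words" concrete, here is a small illustrative sketch. The two topics, their word probabilities, and the document's topic mixture are invented for the example; a real LDA model would learn them from data.

```python
# Hypothetical topics: each is a probability distribution over a tiny vocabulary.
topics = {
    "sports": {"game": 0.5, "team": 0.4, "vote": 0.1},
    "politics": {"game": 0.1, "team": 0.1, "vote": 0.8},
}

def word_probability(word, doc_topic_mix):
    """P(word | document) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(
        weight * topics[topic].get(word, 0.0)
        for topic, weight in doc_topic_mix.items()
    )

# A document assumed to be 75% about sports and 25% about politics.
doc_mix = {"sports": 0.75, "politics": 0.25}
p_vote = word_probability("vote", doc_mix)
print(p_vote)  # 0.75 * 0.1 + 0.25 * 0.8
```

Under this view a document is a mixture of topics, and the chance of seeing a given word depends on how much of each topic the document contains.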

algorithm, top2vec, vector, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

How Pandemic Spread in News: Text Analysis Using Topic Model

Wang, Minghao, Mengoni, Paolo

arXiv.org Artificial Intelligence, Feb-19-2021

Research on COVID-19 has increased greatly, in biology and in other fields alike. This research conducted a text analysis using the LDA topic model. We first scraped a total of 1,127 articles and 5,563 comments on SCMP covering COVID-19 from Jan 20 to May 19, then trained the LDA model and tuned its parameters using Cv coherence as the model evaluation method. With the optimal model, we analyze the dominant topics, the representative documents of each topic, and the inconsistency between articles and comments. Finally, three possible improvements are discussed.

article and comment, pandemic spread, text analysis, (15 more...)

arXiv.org Artificial Intelligence

2102.04205

Country:

Europe (0.14)
Asia > China > Hubei Province > Wuhan (0.05)
Asia > China > Hong Kong (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)

Unification of HDP and LDA Models for Optimal Topic Clustering of Subject Specific Question Banks

Fernandes, Nikhil, Gkolia, Alexandra, Pizzo, Nicolas, Davenport, James, Nair, Akshar

arXiv.org Machine Learning, Oct-4-2020

There has been an increasingly popular trend in universities toward curriculum transformation to make teaching more interactive and suitable for online courses. An increase in the popularity of online courses would result in an increase in the number of course-related queries for academics. Moreover, if lectures were delivered in a video-on-demand format, there would be no fixed time at which the majority of students could ask questions. When questions are asked in a lecture there is a negligible chance of receiving similar questions repeatedly, but asynchronously this is more likely. To reduce the time spent answering each individual question, clustering them is an ideal choice. There are different unsupervised models fit for text clustering, of which the Latent Dirichlet Allocation model is the most commonly used. We use the Hierarchical Dirichlet Process to determine an optimal topic number input for our LDA model runs. Due to the probabilistic nature of these topic models, their outputs vary between runs. The general trend we found is that not all the topics were being used for clustering on the first run of the LDA model, which results in less effective clustering. To tackle the probabilistic output, we recursively apply the LDA model to the effective topics being used until we obtain an efficiency ratio of 1. Through our experimental results we also establish a reasoning on how Zeno's paradox is avoided.
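The recursive step can be sketched roughly as follows. This is one reading of the abstract, not the authors' implementation: it assumes the "efficiency ratio" is the number of topics actually used divided by the number requested, and `toy_fit` is a hypothetical stand-in for a real LDA run (in the paper, the initial topic number would come from HDP).

```python
def refit_until_all_topics_used(docs, fit, initial_k):
    """Refit with k = number of topics actually used until every topic is used.

    `fit(docs, k)` must return one topic index in range(k) per document.
    """
    k = initial_k
    while True:
        assignments = fit(docs, k)
        used = len(set(assignments))
        if used == k:          # assumed efficiency ratio used / k equals 1
            return k, assignments
        k = used               # strictly smaller integer, so the loop must end

def toy_fit(docs, k):
    """Hypothetical stand-in for an LDA run: buckets documents by word count."""
    return [min(len(d.split()), k) - 1 for d in docs]

docs = ["deadline", "exam date", "exam format", "late submission policy"]
final_k, assignments = refit_until_all_topics_used(docs, toy_fit, initial_k=5)
print(final_k, assignments)
```

Because the requested topic count is an integer that strictly decreases on every retry, the loop cannot shrink forever, which is one simple way to see why an endless sequence of ever-smaller refinements does not arise in this sketch.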

lda model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2011.01035

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Representations of Context in Recognizing the Figurative and Literal Usages of Idioms

Liu, Changsheng (University of Pittsburgh) | Hwa, Rebecca (University of Pittsburgh)

AAAI Conferences, Feb-14-2017

Many idiomatic expressions can be interpreted literally or figuratively, depending on the context in which they occur. Developing an appropriate computational model of the context is crucial for automatic idiom usage recognition. While many existing methods incorporate some elements of context, they have not sufficiently captured the interactions between the linguistic properties of idiomatic expressions and the representations of the context. In this paper we perform an in-depth exploration of the role of representations of the context for idiom usage recognition; we highlight the advantages and limitations of different representation choices in existing methods in terms of known linguistic properties of idioms; we then propose a supervised ensemble method that selects representations adaptively for different idioms. Experimental results suggest that the proposed method performs better for a wider range of idioms than previous methods.

artificial intelligence, machine learning, natural language, (17 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process

Xuan, Junyu, Lu, Jie, Zhang, Guangquan, Da Xu, Richard Yi, Luo, Xiangfeng

arXiv.org Machine Learning, Mar-30-2015

Incorporating the side information of a text corpus, i.e., authors, time stamps, and emotional tags, into traditional text mining models has gained significant interest in the areas of information retrieval, statistical natural language processing, and machine learning. One branch of this work is the so-called Author Topic Model (ATM), which incorporates the authors' interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability on a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. To be specific, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the contributions of multiple authors to this document. An efficient Gibbs sampling inference algorithm, with each conditional distribution in closed form, is developed for the IAT model. Experiments on several real-world datasets show the capability of our IAT model to learn the hidden topics, the authors' interests in these topics, and the number of topics simultaneously.

machine learning, natural language, topic model, (15 more...)

arXiv.org Machine Learning

1503.08535

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Nonparametric Relational Topic Models through Dependent Gamma Processes

Xuan, Junyu, Lu, Jie, Zhang, Guangquan, Da Xu, Richard Yi, Luo, Xiangfeng

arXiv.org Machine Learning, Mar-30-2015

Traditional Relational Topic Models provide a way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, benefit from this revealed knowledge. However, existing relational topic models are based on the assumption that the number of hidden topics is known in advance, which is impractical in many real-world applications. Therefore, in order to relax this assumption, we propose a nonparametric relational topic model in this paper. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document, which represents the topic interest of this document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics are shared by all the documents. In order to resolve these challenges, we use a subsampling strategy to assign each document a different gamma process from the global gamma process, and the subsampling probabilities of documents are assigned with a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capabilities of learning the hidden topics and, more importantly, the number of topics.

data mining, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1503.08542

Country:

Asia > China (0.68)
North America > United States > New York (0.14)

Genre: Research Report (0.82)

Industry: Education (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
(2 more...)

SemRec: A Semantic Enhancement Framework for Tag Based Recommendation

Xu, Guandong (Victoria University) | Gu, Yanhui (University of Tokyo) | Dolog, Peter (Aalborg University) | Zhang, Yanchun (Victoria University) | Kitsuregawa, Masaru (University of Tokyo)

AAAI Conferences, Aug-4-2011

Collaborative tagging services provided by various social web sites have become a popular means to mark web resources for different purposes such as categorization, expression of a preference, and so on. However, tags are syntactic in nature, free in style, and do not reflect semantics, which leads to redundancy, ambiguity, and a lack of semantics. Current tag-based recommender systems mainly take the explicit structural information among users, resources, and tags into consideration, while neglecting the important implicit semantic relationships hidden in tagging data. In this study, we propose a Semantic Enhancement Recommendation strategy (SemRec), based on both structural information and semantic information through a unified fusion model. Extensive experiments conducted on two real datasets demonstrate the effectiveness of our approach.

artificial intelligence, machine learning, natural language, (20 more...)

AAAI Conferences

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Middle East > Jordan (0.04)
(3 more...)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
