Iannacci, Francis
vONTSS: vMF based semi-supervised neural topic modeling with optimal transport
Xu, Weijie, Jiang, Xiaoyu, Sengamedu, Srinivasan H., Iannacci, Francis, Zhao, Jinjin
Recently, Neural Topic Models (NTMs), inspired by variational autoencoders, have attracted significant research interest; however, these methods have limited real-world application because it is challenging to incorporate human knowledge into them. This work presents a semi-supervised neural topic modeling method, vONTSS, which uses von Mises-Fisher (vMF) based variational autoencoders and optimal transport. When a few keywords per topic are provided, vONTSS in the semi-supervised setting generates potential topics and optimizes topic-keyword quality and topic classification. Experiments show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity. vONTSS also supports unsupervised topic modeling. Quantitative and qualitative experiments show that vONTSS in the unsupervised setting outperforms recent NTMs on multiple aspects: it discovers highly clustered and coherent topics on benchmark datasets. It is also much faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance. We further prove the equivalence of the optimal transport loss and the cross-entropy loss at the global minimum.
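The optimal transport component can be made concrete with a short sketch. Below is a minimal, illustrative entropic OT loss computed with Sinkhorn iterations in PyTorch, of the kind such methods use to align a document's predicted topic distribution with a keyword-derived target distribution. The function name `sinkhorn_loss`, the cost matrix, and the hyperparameter values are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of an entropic optimal-transport
# loss between predicted topic distributions and keyword-derived targets.
import torch

def sinkhorn_loss(p, q, cost, eps=0.1, n_iters=50):
    """Entropic-regularized OT cost between batched histograms.

    p:    (B, K) predicted topic distribution per document
    q:    (B, K) target distribution derived from seed keywords
    cost: (K, K) ground cost between topics/classes (illustrative)
    """
    ker = torch.exp(-cost / eps)                  # Gibbs kernel
    v = torch.ones_like(q)
    for _ in range(n_iters):                      # Sinkhorn fixed-point updates
        u = p / (v @ ker.T).clamp_min(1e-9)
        v = q / (u @ ker).clamp_min(1e-9)
    # Transport plan pi[b, i, j] = u[b, i] * ker[i, j] * v[b, j]
    pi = u.unsqueeze(2) * ker.unsqueeze(0) * v.unsqueeze(1)
    return (pi * cost.unsqueeze(0)).sum(dim=(1, 2)).mean()
```

This is the family of OT losses for which the abstract's equivalence with cross-entropy at the global minimum is stated.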
S2vNTM: Semi-supervised vMF Neural Topic Modeling
Xu, Weijie, Desai, Jay, Sengamedu, Srinivasan, Jiang, Xiaoyu, Iannacci, Francis
Language-model-based methods are powerful techniques for text classification; however, these models have several shortcomings. In this paper, we propose Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these difficulties. S2vNTM takes a few seed keywords per topic as input, leverages the pattern of those keywords to identify potential topics, and optimizes the quality of topic keyword sets. Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy when only limited keywords are provided. S2vNTM is also at least twice as fast as baselines.

Language Model (LM) pre-training (Vaswani et al., 2017; Devlin et al., 2018) has proven useful for learning universal language representations, and recent language models (Yang et al., 2019; Sun et al., 2019; Chen et al., 2022; Ding et al., 2021) have achieved impressive results in text classification. Most of these methods, however, need enough high-quality labels to train. To make LM-based methods work well when only limited labels are available, few-shot learning methods (Bianchi et al., 2021; Meng et al., 2020a;b; Mekala and Shang, 2020; Yu et al., 2021; Wang et al., 2021b) have been proposed. However, these methods rely on large pre-training corpora and can be biased when applied to a different environment. Topic modeling methods, in contrast, generate topics based on the patterns of words.
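To make the seed-keyword supervision concrete, here is a minimal sketch (an illustration under assumed names, not code from the paper) of how a few seed keywords per topic can be turned into soft pseudo-labels that guide topic formation:

```python
# A minimal sketch (assumed, not the paper's method) of deriving soft
# pseudo-labels for a document from per-topic seed keywords.
from collections import Counter

def keyword_pseudo_label(tokens, seed_keywords):
    """tokens: list[str]; seed_keywords: dict mapping topic -> set of seeds.
    Returns a topic -> probability dict based on seed-keyword hits."""
    counts = Counter(tokens)
    hits = {t: sum(counts[w] for w in seeds) for t, seeds in seed_keywords.items()}
    total = sum(hits.values())
    if total == 0:  # no seed matched: fall back to a uniform distribution
        return {t: 1 / len(seed_keywords) for t in seed_keywords}
    return {t: h / total for t, h in hits.items()}

seeds = {"sports": {"game", "team"}, "tech": {"software", "chip"}}
print(keyword_pseudo_label("the team won the game".split(), seeds))
# {'sports': 1.0, 'tech': 0.0}
```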
KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation
Xu, Weijie, Jiang, Xiaoyu, Desai, Jay, Han, Bin, Yan, Fuqin, Iannacci, Francis
In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both approaches require pretraining on large text datasets. In contrast, general topic modeling methods can analyze documents and extract meaningful patterns of words without pretraining. To leverage topic modeling's unsupervised insight extraction for text classification tasks, we develop Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings, needs only a few labeled documents, and is efficient to train, making it ideal in resource-constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and achieves performance similar to state-of-the-art weakly supervised text classification methods.
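The distillation ingredient here is the standard soft-label objective of Hinton et al. (2015): the student's topic distribution is pulled toward a teacher's softened predictions. A minimal PyTorch sketch follows; the function name and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch (illustrative, not the paper's code) of the standard
# knowledge-distillation loss with temperature scaling.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from teacher to student at temperature T.

    Both logits tensors have shape (B, K). The T*T factor keeps
    gradient magnitudes comparable across temperatures (Hinton et al., 2015).
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```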
FFPDG: Fast, Fair and Private Data Generation
Xu, Weijie, Zhao, Jinjin, Iannacci, Francis, Wang, Bo
Generative modeling is frequently used for synthetic data generation, and fairness and privacy are two major concerns for synthetic data. Although recent GAN-based methods (Goodfellow et al., 2014) show good results in preserving privacy, the generated data may be more biased, and these methods require substantial computational resources. In this work, we propose FFPDG, a fast, fair, and private data generation method. We show the effectiveness of our method theoretically and empirically, and we show that models trained on data generated by the proposed method perform well (in the inference stage) in real application scenarios.

Synthetic data (Rubin, 1993) is data that is artificially created rather than generated by actual events.
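The claim that models trained on generated data perform well at inference time corresponds to the common train-on-synthetic, test-on-real (TSTR) evaluation protocol. Below is a minimal sketch of that check; the scikit-learn classifier and function name are assumptions for illustration, not the paper's evaluation code.

```python
# A minimal sketch (assumed protocol, not the paper's code) of
# train-on-synthetic, test-on-real (TSTR) evaluation.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def train_synthetic_test_real(X_syn, y_syn, X_real, y_real):
    """Fit a downstream model on synthetic data, score it on real data."""
    clf = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
    return accuracy_score(y_real, clf.predict(X_real))
```

A high TSTR score indicates the synthetic data preserves the feature-label relationships a downstream model needs, which is the property the abstract's empirical claim is about.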