AITopics | Text Classification

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.
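As a rough illustration of the kind of classifier the quote describes, here is a minimal naive Bayes sketch in Python. The toy messages, the `spam`/`ham` label names, whitespace tokenization, and add-one smoothing are all assumptions made for this example; they are not taken from the cited article.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, for illustration only.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the agenda", "ham"),
]
model = train_naive_bayes(training)
print(classify("free prize", model))
print(classify("monday agenda", model))
```

A production filter would use a better tokenizer and far more data; the sketch only shows how word counts turn into a label decision.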

Towards Human-Centred Explainability Benchmarks For Text Classification

Schlegel, Viktor, Mendez-Guzman, Erick, Batista-Navarro, Riza

arXiv.org Artificial Intelligence, Nov-10-2022

Progress on many Natural Language Processing (NLP) tasks, such as text classification, is driven by objective, reproducible and scalable evaluation via publicly available benchmarks. However, these are not always representative of real-world scenarios where text classifiers are employed, such as sentiment analysis or misinformation detection. In this position paper, we put forward two points that aim to alleviate this problem. First, we propose to extend text classification benchmarks to evaluate the explainability of text classifiers. We review challenges associated with objectively evaluating the capability to produce valid explanations, which leads us to the second main point: we propose to ground these benchmarks in human-centred applications, for example by using social media or gamification, or by learning explainability metrics from human judgements.

explanation, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2211.05452

Country:

North America > United States > Hawaii (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)

Genre: Overview (1.00)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Hyperbolic Centroid Calculations for Text Classification

Gerek, Aydın, Ferahlar, Cüneyt, Sert, Bilge Şipal, Yüney, Mehmet Can, Taşdemir, Onur, Kalafat, Zeynep Billur, Kelkit, Mert, Ganiz, Murat Can

arXiv.org Artificial Intelligence, Nov-8-2022

A new development in NLP is the construction of hyperbolic word embeddings. As opposed to their Euclidean counterparts, hyperbolic embeddings are represented not by vectors, but by points in hyperbolic space. This makes the most common basic scheme for constructing document representations, namely the averaging of word vectors, meaningless in the hyperbolic setting. We reinterpret the vector mean as the centroid of the points represented by the vectors, and investigate various hyperbolic centroid schemes and their effectiveness at text classification.
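To make the idea of a hyperbolic centroid concrete, the sketch below computes the Einstein midpoint, one well-known centroid construction, for points in the Poincaré ball by passing through the Klein model. Whether this particular scheme is among those the authors investigate is not stated in the abstract, so treat it as an assumption; the function names are hypothetical.

```python
import math


def poincare_to_klein(p):
    """Map a point of the Poincare ball to the Klein model."""
    norm_sq = sum(c * c for c in p)
    return [2 * c / (1 + norm_sq) for c in p]


def klein_to_poincare(k):
    """Map a point of the Klein model back to the Poincare ball."""
    norm_sq = sum(c * c for c in k)
    return [c / (1 + math.sqrt(1 - norm_sq)) for c in k]


def einstein_midpoint(points):
    """Centroid of Poincare-ball points via the Einstein midpoint.

    Each point is weighted by its Lorentz factor in the Klein model,
    so points near the boundary of the ball pull the centroid harder.
    """
    klein = [poincare_to_klein(p) for p in points]
    gammas = [1 / math.sqrt(1 - sum(c * c for c in k)) for k in klein]
    total = sum(gammas)
    dim = len(points[0])
    mid = [
        sum(g * k[i] for g, k in zip(gammas, klein)) / total
        for i in range(dim)
    ]
    return klein_to_poincare(mid)


# The Euclidean mean of these two points would be [0.4, 0.0];
# the hyperbolic centroid lies further from the origin.
print(einstein_midpoint([[0.0, 0.0], [0.8, 0.0]]))
```

For two points the result coincides with the geodesic midpoint, which is why it differs from the plain coordinate average.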

machine learning, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

2211.04462

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Exploiting Global and Local Hierarchies for Hierarchical Text Classification

Jiang, Ting, Wang, Deqing, Sun, Leilei, Chen, Zhongzhi, Zhuang, Fuzhen, Yang, Qinghong

arXiv.org Artificial Intelligence, Nov-8-2022

Hierarchical text classification aims to leverage label hierarchy in multi-label text classification. Existing methods encode label hierarchy in a global view, where label hierarchy is treated as the static hierarchical structure containing all labels. Because the global hierarchy is static and independent of the text samples, these methods struggle to exploit hierarchical information. In contrast, the local hierarchy is the structured label hierarchy corresponding to each text sample. It is dynamic and relevant to the text samples, yet previous methods ignore it. To exploit global and local hierarchies, we propose Hierarchy-guided BERT with Global and Local hierarchies (HBGL), which utilizes the large-scale parameters and prior language knowledge of BERT to model both global and local hierarchies. Moreover, HBGL avoids the intentional fusion of semantic and hierarchical modules by directly modeling semantic and hierarchical information with BERT. Compared with the state-of-the-art method HGCLR, our method achieves significant improvement on three benchmark datasets.

artificial intelligence, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

2205.02613

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > France (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.83)

Classify Finance Tweets Faster Using Sparsity - Neural Magic

#artificialintelligence, Nov-7-2022, 20:37:12 GMT

The world of finance and stock trading has changed in recent years. As more retail investors enter the market, stories and social sentiment become more important. Think Tesla - one can argue that a lot of the company's value comes from successful social storytelling by its CEO Elon Musk. Social media has the power to turn a bull into a bear and a bear into a bull. Classifying finance tweets using NLP to understand social sentiment is increasingly important.

classification, dataset, tweet, (14 more...)

#artificialintelligence

Industry: Banking & Finance > Trading (0.70)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.33)

Hierarchical Multi-Label Classification of Scientific Documents

Sadat, Mobashir, Caragea, Cornelia

arXiv.org Artificial Intelligence, Nov-5-2022

Automatic topic classification has been studied extensively to assist in managing and indexing scientific documents in a digital collection. With the large number of topics that have become available in recent years, it has become necessary to arrange them in a hierarchy. Therefore, automatic classification systems need to be able to classify documents hierarchically. In addition, each paper is often assigned to more than one relevant topic. For example, a paper can be assigned to several topics in a hierarchy tree. In this paper, we introduce a new dataset for hierarchical multi-label text classification (HMLTC) of scientific papers called SciHTC, which contains 186,160 papers and 1,233 categories from the ACM CCS tree. We establish strong baselines for HMLTC and propose a multi-task learning approach for topic classification with keyword labeling as an auxiliary task. Our best model achieves a Macro-F1 score of 34.57%, which shows that this dataset provides significant research opportunities on hierarchical scientific topic classification. We make our dataset and code available on GitHub.
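For readers unfamiliar with the Macro-F1 metric quoted above, the following sketch shows how it is typically computed in a multi-label setting: F1 is calculated per label and then averaged with equal weight. The label names and predictions are hypothetical, and scoring a label with no true or predicted instances as 0.0 is a convention assumed here, not something the abstract specifies.

```python
def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of per-label F1 over multi-label predictions."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label that never occurs scores 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold and predicted topic sets for three papers.
gold = [{"A", "B"}, {"A"}, {"C"}]
predicted = [{"A"}, {"A", "B"}, {"C"}]
print(macro_f1(gold, predicted, ["A", "B", "C"]))
```

Because every label counts equally, rare categories with poor predictions pull the score down sharply, which helps explain why Macro-F1 can be low on a dataset with over a thousand categories.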

machine learning, natural language, text classification, (20 more...)

arXiv.org Artificial Intelligence

2211.0281

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Transfer learning for TensorFlow text classification models in Amazon SageMaker

#artificialintelligence, Nov-4-2022, 15:24:58 GMT

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He received his PhD from the University of Illinois at Urbana-Champaign and was a postdoctoral researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers at the EMNLP, ICLR, COLT, FOCS and SODA conferences. João Moura is an AI/ML Specialist Solutions Architect at Amazon Web Services. He focuses mostly on NLP use cases and on helping customers optimize deep learning model training and deployment. He is also an active proponent of low-code ML solutions and ML-specialized hardware. Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He received his PhD from the University of Illinois at Urbana-Champaign. He is an active researcher in machine learning and statistical inference and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP.

built-in algorithm, dataset, pre-trained model, (13 more...)

#artificialintelligence

Country: North America > United States > Illinois > Champaign County > Urbana (0.25)

Genre: Press Release (0.31)

Industry:

Retail > Online (0.40)
Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.45)

Polyglot Prompt: Multilingual Multitask PrompTraining

Fu, Jinlan, Ng, See-Kiong, Liu, Pengfei

arXiv.org Artificial Intelligence, Nov-4-2022

This paper aims for a potential architectural improvement for multilingual learning and asks: Can different tasks from different languages be modeled in a monolithic framework, i.e., without any task/language-specific module? Achieving this could open new doors for future multilingual research, including allowing systems trained on low resources to be further assisted by other languages as well as other tasks. We approach this goal by developing a learning framework named Polyglot Prompting to exploit prompting methods for learning a unified semantic space for different languages and tasks with multilingual prompt engineering. We performed a comprehensive evaluation of 6 tasks, namely topic classification, sentiment classification, named entity recognition, question answering, natural language inference, and summarization, covering 24 datasets and 49 languages. The experimental results demonstrated the efficacy of multilingual multitask prompt-based learning and led to inspiring observations. We also present an interpretable multilingual evaluation methodology and show how the proposed framework, multilingual multitask prompt training, works. We release all datasets prompted in the best setting and code.

artificial intelligence, natural language, text classification, (15 more...)

arXiv.org Artificial Intelligence

2204.14264

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
(19 more...)

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)

Automated Classification of Intramedullary Spinal Cord Tumors and Inflammatory Demyelinating Lesions Using Deep Learning

#artificialintelligence, Nov-1-2022, 06:56:23 GMT

Accurate differentiation of intramedullary spinal cord tumors and inflammatory demyelinating lesions and their subtypes is warranted because they have overlapping characteristics at MRI but different treatments and prognoses. The authors aimed to develop a pipeline for spinal cord lesion segmentation and classification using two-dimensional MultiResUNet and DenseNet121 networks based on T2-weighted images. A retrospective cohort of 490 patients (118 patients with astrocytoma, 130 with ependymoma, 101 with multiple sclerosis [MS], and 141 with neuromyelitis optica spectrum disorders [NMOSD]) was used for model development, and a prospective cohort of 157 patients (34 patients with astrocytoma, 45 with ependymoma, 33 with MS, and 45 with NMOSD) was used for model testing. In the test cohort, the model achieved Dice scores of 0.77, 0.80, 0.50, and 0.58 for segmentation of astrocytoma, ependymoma, MS, and NMOSD, respectively, against manual labeling. Accuracies of 96% (area under the receiver operating characteristic curve [AUC], 0.99), 82% (AUC, 0.90), and 79% (AUC, 0.85) were achieved for the classifications of tumor versus demyelinating lesion, astrocytoma versus ependymoma, and MS versus NMOSD, respectively.
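The Dice score reported above measures overlap between a predicted segmentation mask and a manually labeled one. A minimal sketch of the standard formula, 2|A ∩ B| / (|A| + |B|), follows; the tiny binary masks are made up for illustration, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks given as nested lists."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(pred) + sum(true)
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0


# Hypothetical 2x3 masks: three pixels each, two of them shared.
predicted_mask = [[1, 1, 0], [0, 1, 0]]
manual_mask = [[1, 0, 0], [0, 1, 1]]
print(dice_score(predicted_mask, manual_mask))
```

A score of 1.0 means the masks match exactly and 0.0 means they do not overlap at all, so the reported 0.50 for MS indicates considerably weaker overlap than the 0.80 for ependymoma.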

deep learning, intramedullary spinal cord tumor, tumor and inflammatory demyelinating lesion, (8 more...)

#artificialintelligence

Industry:

Health & Medicine > Therapeutic Area > Oncology > Brain Cancer (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours

Shnarch, Eyal, Halfon, Alon, Gera, Ariel, Danilevsky, Marina, Katsis, Yannis, Choshen, Leshem, Cooper, Martin Santillan, Epelboim, Dina, Zhang, Zheng, Wang, Dakuo, Yip, Lucy, Ein-Dor, Liat, Dankin, Lena, Shnayderman, Ilya, Aharonov, Ranit, Li, Yunyao, Liberman, Naftali, Slesarev, Philip Levin, Newton, Gwilym, Ofek-Koifman, Shila, Slonim, Noam, Katz, Yoav

arXiv.org Artificial Intelligence, Oct-31-2022

Text classification can be useful in many real-world scenarios, saving a lot of time for end users. However, building a custom classifier typically requires coding skills and ML knowledge, which poses a significant barrier for many potential users. To lift this barrier, we introduce Label Sleuth, a free open source system for labeling and creating text classifiers. This system is unique for (a) being a no-code system, making NLP accessible to non-experts, (b) guiding users through the entire labeling process until they obtain a custom classifier, making the process efficient -- from cold start to classifier in a few hours, and (c) being open for configuration and extension by developers. By open sourcing Label Sleuth we hope to build a community of users and developers that will broaden the utilization of NLP models.

label sleuth, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2208.01483

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.36)

Conditional Supervised Contrastive Learning for Fair Text Classification

Chi, Jianfeng, Shand, William, Yu, Yaodong, Chang, Kai-Wei, Zhao, Han, Tian, Yuan

arXiv.org Artificial Intelligence, Oct-31-2022

Contrastive representation learning has gained much attention due to its superior performance in learning representations from both image and sequential data. However, the learned representations could potentially lead to performance disparities in downstream tasks, such as increased silencing of underrepresented groups in toxicity comment classification. In light of this challenge, in this work, we study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning. Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives, and then propose to use conditional supervised contrastive objectives to learn fair representations for text classification. We conduct experiments on two text datasets to demonstrate the effectiveness of our approaches in balancing the trade-offs between task performance and bias mitigation among existing baselines for text classification. Furthermore, we also show that the proposed methods are stable in different hyperparameter settings.
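Equalized odds, the fairness notion named above, asks that a classifier's true-positive and false-positive rates be the same across groups. The sketch below measures how far a set of binary predictions is from that ideal. The group names and data are hypothetical, and this is only a way to evaluate the criterion, not the contrastive training method the paper proposes.

```python
def group_rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Assumed convention: a rate with an empty denominator is reported as 0.0.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_gaps(y_true, y_pred, groups):
    """Largest TPR gap and largest FPR gap between any two groups."""
    rates = []
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        rates.append(
            group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    tprs = [r[0] for r in rates]
    fprs = [r[1] for r in rates]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Hypothetical labels and predictions for two groups, "a" and "b".
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 0, 1, 1, 1, 0]
group_ids = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(equalized_odds_gaps(labels, preds, group_ids))
```

Both gaps are zero exactly when equalized odds holds on the evaluated data; larger gaps indicate a larger disparity between groups.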

machine learning, natural language, text classification, (17 more...)

arXiv.org Artificial Intelligence

2205.11485

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
