AITopics | Text Classification


Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.
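
A minimal sketch of the kind of classifier the quote describes, assuming a multinomial naive Bayes model with add-one smoothing. It is not code from the cited article: the class name, the whitespace tokenizer, and the training messages are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; real systems use better tokenizers.
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def train(self, text, label):
        self.class_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data for illustration only.
classifier = NaiveBayesClassifier()
classifier.train("win cash prize now", "spam")
classifier.train("cheap prize click now", "spam")
classifier.train("meeting agenda for monday", "ham")
classifier.train("please review the project report", "ham")

spam_guess = classifier.predict("claim your cash prize")
ham_guess = classifier.predict("agenda for the project meeting")
print(spam_guess, ham_guess)
```

Working in log space avoids numeric underflow on long documents, and the add-one smoothing keeps an unseen word from driving a class probability to zero.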


Fairness Evaluation in Text Classification: Machine Learning Practitioner Perspectives of Individual and Group Fairness

Ashktorab, Zahra, Hoover, Benjamin, Agarwal, Mayank, Dugan, Casey, Geyer, Werner, Yang, Hao Bang, Yurochkin, Mikhail

arXiv.org Artificial Intelligence, Mar-1-2023

Mitigating algorithmic bias is a critical task in the development and deployment of machine learning models. While several toolkits exist to aid machine learning practitioners in addressing fairness issues, little is known about the strategies practitioners employ to evaluate model fairness and what factors influence their assessment, particularly in the context of text classification. Two common approaches to evaluating the fairness of a model are group fairness and individual fairness. We run a study with machine learning practitioners (n=24) to understand the strategies used to evaluate models. Metrics presented to practitioners (group vs. individual fairness) impact which models they consider fair. Participants focused on risks associated with underpredicting/overpredicting and model sensitivity relative to identity token manipulations. We discover fairness assessment strategies involving personal experiences or how users form groups of identity tokens to test model fairness. We provide recommendations for interactive tools for evaluating fairness in text classification.

machine learning, natural language, text classification, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3544548.3581227

2303.00673

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Germany > Hamburg (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.82)


Supervised Text Classification for Marketing Analytics

#artificialintelligence, Feb-25-2023, 08:00:43 GMT

Marketing data are complex and have dimensions that make analysis difficult. Large unstructured datasets are often too big to yield qualitative insights by manual reading. Marketing datasets are also often relational and connected, and involve networks. This specialization tackles advanced advertising and marketing analytics through three advanced methods aimed at solving these problems: text classification, text topic modeling, and semantic network analysis. Each key area involves a deep dive into the leading computer science methods aimed at solving these problems using Python.

cu boulder, marketing analytic, supervised text classification

#artificialintelligence

Industry:

Marketing (1.00)
Information Technology > Services (0.65)
Education > Educational Technology > Educational Software > Computer Based Training (0.51)
Education > Educational Setting > Online (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.65)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.51)


CARE: Collaborative AI-Assisted Reading Environment

Zyska, Dennis, Dycke, Nils, Buchmann, Jan, Kuznetsov, Ilia, Gurevych, Iryna

arXiv.org Artificial Intelligence, Feb-24-2023

Recent years have seen impressive progress in AI-assisted writing, yet the developments in AI-assisted reading are lacking. We propose inline commentary as a natural vehicle for AI-based reading assistance, and present CARE: the first open integrated platform for the study of inline commentary and reading. CARE facilitates data collection for inline commentaries in a commonplace collaborative reading environment, and provides a framework for enhancing reading with NLP-based assistance, such as text classification, generation or question answering. The extensible behavioral logging allows unique insights into the reading and commenting behavior, and flexible configuration makes the platform easy to deploy in new scenarios. To evaluate CARE in action, we apply the platform in a user study dedicated to scholarly peer review. CARE facilitates the data collection and study of inline commentary in NLP, extrinsic evaluation of NLP assistance, and application prototyping. We invite the community to explore and build upon the open source implementation of CARE.

machine learning, natural language, text classification, (19 more...)

arXiv.org Artificial Intelligence

2302.12611

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(3 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.34)


Text Classification Using R, Keras, and Comet ML

#artificialintelligence, Feb-17-2023, 18:56:29 GMT

Text classification is an interesting application of natural language processing. It is a supervised learning method that predicts which category a piece of text belongs to. As a machine learning engineer, you start with a labeled data set containing a large amount of text that has already been categorized. The resulting classifiers can perform sentiment analysis, filter spam, and support other applications. This tutorial will teach you how to train your binary text classifiers using Keras.

comet ml, dataset, tutorial, (16 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.61)


On the Relation between Sensitivity and Accuracy in In-context Learning

Chen, Yanda, Zhao, Chen, Yu, Zhou, McKeown, Kathleen, He, He

arXiv.org Artificial Intelligence, Feb-17-2023

In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple perturbation types. First, we find that label bias obscures the true sensitivity, and therefore prior work may have significantly underestimated ICL sensitivity. Second, we observe a strong negative correlation between ICL sensitivity and accuracy: predictions sensitive to perturbations are less likely to be correct. Motivated by these findings, we propose SenSel, a few-shot selective prediction method that abstains from sensitive predictions. Experiments on ten classification datasets show that SenSel consistently outperforms two commonly used confidence-based and entropy-based baselines on abstention decisions.
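
The idea of abstaining from sensitive predictions can be sketched as follows. This is an illustration only, not the paper's method: the sensitivity measure (fraction of perturbed prompts whose prediction flips), the 0.3 threshold, the prompt templates, and the keyword-based `toy_predict` stand-in for a language model are all assumptions.

```python
def sensitivity(predict, original_prompt, perturbed_prompts):
    """Fraction of perturbed prompts whose prediction differs from the original."""
    base = predict(original_prompt)
    changed = sum(1 for p in perturbed_prompts if predict(p) != base)
    return changed / len(perturbed_prompts)


def selective_predict(predict, original_prompt, perturbed_prompts, threshold=0.3):
    """Return the prediction, or None (abstain) when it is too sensitive."""
    if sensitivity(predict, original_prompt, perturbed_prompts) > threshold:
        return None
    return predict(original_prompt)


# Hypothetical prompt templates; the first is treated as the original prompt.
TEMPLATES = [
    "Review: {text}\nSentiment:",
    "Text: {text}\nLabel:",
    "{text}\nIs this positive?",
]


def toy_predict(prompt):
    # Stand-in for an in-context learner: a keyword rule with a spurious
    # bias triggered by the word "label" in the template.
    lowered = prompt.lower()
    score = 2 * lowered.count("great") + lowered.count("okay")
    score -= lowered.count("label")
    return "positive" if score >= 1 else "negative"


def predict_or_abstain(text):
    original = TEMPLATES[0].format(text=text)
    perturbed = [t.format(text=text) for t in TEMPLATES[1:]]
    return selective_predict(toy_predict, original, perturbed)


confident = predict_or_abstain("a great film")   # stable across templates
abstained = predict_or_abstain("an okay film")   # flips under one template
print(confident, abstained)
```

Here the borderline input changes label when only the template changes, so the wrapper returns `None` instead of a prediction, while the clear-cut input is answered normally.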

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2209.07661

Country: North America > United States > New York (0.04)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
(2 more...)


Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Joshi, Rishabh, Balachandran, Vidhisha, Saldanha, Emily, Glenski, Maria, Volkova, Svitlana, Tsvetkov, Yulia

arXiv.org Artificial Intelligence, Feb-17-2023

Keyphrase extraction aims at automatically extracting a list of "important" phrases representing the key concepts in a document. Prior approaches for unsupervised keyphrase extraction resorted to heuristic notions of phrase importance via embedding clustering or graph centrality, requiring extensive domain expertise. Our work presents a simple alternative approach which defines keyphrases as document phrases that are salient for predicting the topic of the document. To this end, we propose INSPECT -- an approach that uses self-explaining models for identifying influential keyphrases in a document by measuring the predictive impact of input phrases on the downstream task of the document topic classification. We show that this novel method not only alleviates the need for ad-hoc heuristics but also achieves state-of-the-art results in unsupervised keyphrase extraction in four datasets across two domains: scientific publications and news articles.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2203.0764

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(6 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)
(2 more...)


Automated classification of Chandra X-ray point sources using machine learning methods

#artificialintelligence, Feb-15-2023, 21:00:45 GMT

A large number of unidentified sources found by astronomical surveys and other observations necessitate the use of an automated classification technique based on machine learning methods. The aim of this paper is to find a suitable automated classifier to identify the point X-ray sources in the Chandra Source Catalogue (CSC) 2.0 in the categories of active galactic nuclei (AGN), X-ray emitting stars, young stellar objects (YSOs), high-mass X-ray binaries (HMXBs), low-mass X-ray binaries (LMXBs), ultraluminous X-ray sources (ULXs), cataclysmic variables (CVs), and pulsars. The catalogue consists of 317,000 sources, out of which we select 277,069 point sources based on the quality flags available in CSC 2.0. In order to identify unknown sources of CSC 2.0, we use multi-wavelength features, such as magnitudes in optical/UV bands from Gaia-EDR3, SDSS and GALEX, and magnitudes in IR bands from 2MASS, WISE and MIPS-Spitzer, in addition to X-ray features (flux and variability) from CSC 2.0. We find the Light Gradient Boosted Machine, an advanced decision tree-based machine learning classification algorithm, suitable for our purpose and achieve 93% precision, 93% recall and a Matthews correlation coefficient of 0.91.
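
For readers unfamiliar with the three reported metrics, the sketch below computes precision, recall, and the Matthews correlation coefficient for the simpler binary case. The paper's task is multi-class, so this is only an illustration of the definitions; the function name and the label vectors are made up.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and Matthews correlation coefficient (binary case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
precision, recall, mcc = binary_metrics(y_true, y_pred)
print(precision, recall, mcc)
```

Unlike precision and recall, the Matthews coefficient uses all four cells of the confusion matrix, which makes it informative when class sizes are very unequal.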

automated classification, chandra x-ray point source, x-ray binary, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.64)


A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods

Zhang, Zhihan, Yu, Wenhao, Yu, Mengxia, Guo, Zhichun, Jiang, Meng

arXiv.org Artificial Intelligence, Feb-14-2023

By focusing on one task, the model ignores knowledge from the training signals of related tasks (Ruder, 2017). There are a great number of tasks in NLP, from syntax parsing to information extraction, from machine translation to question answering: each requires a model dedicated to learning from data. Biologically, humans learn natural languages, from basic grammar to complex semantics in a single brain (Hashimoto et al., 2017). In the field of machine learning, multi-task learning (MTL) aims to leverage useful information shared across multiple related tasks to improve the generalization performance on all tasks (Caruana, 1997). In deep neural networks, it is generally achieved by sharing part of [...]

[...] such two "how to share" categories into five categories, including feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach; Crawshaw (2020) presented more recent models in both single-domain and multi-modal architectures, as well as an overview of optimization methods in MTL. Nevertheless, it is still not clearly understood how to design and train a single model to handle a variety of NLP tasks according to task relatedness. Especially when faced with a set of tasks that are seldom simultaneously trained previously, it is of crucial importance that researchers find proper auxiliary tasks and assess the feasibility of such multi-task learning attempt.

learning, representation, survey article, (15 more...)

arXiv.org Artificial Intelligence

2204.03508

Country:

North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Overview (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.68)
(3 more...)


Identifying Semantically Difficult Samples to Improve Text Classification

Mujumdar, Shashank, Mehta, Stuti, Patel, Hima, Mitra, Suman

arXiv.org Artificial Intelligence, Feb-13-2023

In this paper, we investigate the effect of addressing difficult samples from a given text dataset on the downstream text classification task. We define difficult samples as non-obvious cases for text classification by analysing them in the semantic embedding space; specifically, (i) semantically similar samples that belong to different classes and (ii) semantically dissimilar samples that belong to the same class. We propose a penalty function to measure the overall difficulty score of every sample in the dataset. We conduct exhaustive experiments on 13 standard datasets to show a consistent improvement of up to 9% and discuss qualitative results to show the effectiveness of our approach in identifying difficult samples for a text classification model.
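
The two notions of difficulty in this abstract can be illustrated with a simple score over embeddings. The sketch below is an assumption-laden stand-in, not the paper's penalty function: it adds the mean cosine similarity to samples of other classes (case i) to the mean cosine dissimilarity to samples of the same class (case ii). The function names and the 2-D toy embeddings are invented.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def difficulty_scores(embeddings, labels):
    """Higher score = closer to other classes and farther from its own class."""
    scores = []
    for i, (vec, label) in enumerate(zip(embeddings, labels)):
        same, other = [], []
        for j, (other_vec, other_label) in enumerate(zip(embeddings, labels)):
            if i == j:
                continue
            sim = cosine(vec, other_vec)
            (same if other_label == label else other).append(sim)
        # (i) similarity to samples that carry a different label
        cross = sum(other) / len(other) if other else 0.0
        # (ii) dissimilarity to samples that carry the same label
        within = sum(1 - s for s in same) / len(same) if same else 0.0
        scores.append(cross + within)
    return scores


# Toy 2-D "embeddings": the last sample is labeled "b" but sits among class "a".
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.95, 0.05)]
labels = ["a", "a", "b", "b", "b"]
scores = difficulty_scores(embeddings, labels)
hardest = max(range(len(scores)), key=scores.__getitem__)
print(hardest, [round(s, 2) for s in scores])
```

In practice the embeddings would come from a sentence encoder, and the highest-scoring samples would be the ones to inspect, relabel, or reweight.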

difficult sample, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2302.06155

Country:

Asia > India (0.05)
South America > Brazil (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Federated Continual Learning for Text Classification via Selective Inter-client Transfer

Chaudhary, Yatin, Rai, Pranav, Schubert, Matthias, Schütze, Hinrich, Gupta, Pankaj

arXiv.org Artificial Intelligence, Feb-12-2023

In this work, we combine two paradigms, Federated Learning (FL) and Continual Learning (CL), for the text classification task in the cloud-edge continuum. The objective of Federated Continual Learning (FCL) is to improve deep learning models over their lifetime at each client by (relevant and efficient) knowledge transfer without sharing data. Here, we address challenges in minimizing inter-client interference during knowledge sharing, which arises from heterogeneous tasks across clients in the FCL setup. In doing so, we propose a novel framework, Federated Selective Inter-client Transfer (FedSeIT), which selectively combines model parameters of foreign clients. To further maximize knowledge transfer, we assess domain overlap and select informative tasks from the sequence of historical tasks at each foreign client while preserving privacy. Evaluating against the baselines, we show improved performance, a gain of (average) 12.4% in text classification over a sequence of tasks using five datasets from diverse domains. To the best of our knowledge, this is the first work that applies FCL to NLP.

machine learning, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

2210.06101

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > Virginia (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Industry:

Law (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
