AITopics | Text Classification

Text Classification

"A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests."
– John Graham-Cumming, Naive Bayesian Text Classification. Dr. Dobb's. May 1 2005.

Computer-Assisted Creation of Boolean Search Rules for Text Classification in the Legal Domain

Westermann, Hannes, Savelka, Jaromir, Walker, Vern R., Ashley, Kevin D., Benyekhlef, Karim

arXiv.org Artificial Intelligence | Dec-10-2021

In this paper, we present a method of building strong, explainable classifiers in the form of Boolean search rules. We developed an interactive environment called CASE (Computer Assisted Semantic Exploration) which exploits word co-occurrence to guide human annotators in selection of relevant search terms. The system seamlessly facilitates iterative evaluation and improvement of the classification rules. The process enables the human annotators to leverage the benefits of statistical information while incorporating their expert intuition into the creation of such rules. We evaluate classifiers created with our CASE system on 4 datasets, and compare the results to machine learning methods, including SKOPE rules, Random forest, Support Vector Machine, and fastText classifiers. The results drive the discussion on trade-offs between superior compactness, simplicity, and intuitiveness of the Boolean search rules versus the better performance of state-of-the-art machine learning models for text classification.
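To make the idea of a classifier expressed as a Boolean search rule concrete, here is a minimal sketch. It is not the CASE system: the helper names (`term`, `all_of`, `any_of`, `none_of`) and the example rule are hypothetical, chosen only for illustration.

```python
import re
from typing import Callable, Set

# A rule maps the set of tokens in a document to True (match) or False.
Rule = Callable[[Set[str]], bool]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def term(word: str) -> Rule:
    """Match documents that contain the given word."""
    return lambda tokens: word in tokens


def all_of(*rules: Rule) -> Rule:
    """Boolean AND over sub-rules."""
    return lambda tokens: all(r(tokens) for r in rules)


def any_of(*rules: Rule) -> Rule:
    """Boolean OR over sub-rules."""
    return lambda tokens: any(r(tokens) for r in rules)


def none_of(*rules: Rule) -> Rule:
    """Boolean NOT: match only if no sub-rule matches."""
    return lambda tokens: not any(r(tokens) for r in rules)


# Hypothetical rule: (court OR tribunal) AND appeal AND NOT dismissed
rule = all_of(
    any_of(term("court"), term("tribunal")),
    term("appeal"),
    none_of(term("dismissed")),
)


def classify(text: str) -> bool:
    """Return True if the document satisfies the Boolean rule."""
    return rule(tokenize(text))
```

Because the whole classifier is a single readable expression, an annotator can inspect or adjust it term by term, which is the kind of compactness and intuitiveness the abstract weighs against the accuracy of learned models.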

classification, classifier, dataset, (12 more...)

doi: 10.3233/FAIA190313

arXiv: 2112.05807

Country:

North America > United States (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.69)

Land use identification through social network interaction

Pauca-Quispe, Diana C., Butron-Revilla, Cinthya, Suarez-Lopez, Ernesto, Aranibar-Tila, Karla, Aguilar-Ruiz, Jesus S.

arXiv.org Artificial Intelligence | Dec-5-2021

The Internet generates large volumes of data at a high rate, in particular, posts on social networks. Although social network data has numerous semantic adulterations, and is not intended to be a source of geo-spatial information, in the text of posts we find pieces of important information about how people relate to their environment, which can be used to identify interesting aspects of how human beings interact with portions of land based on their activities. This research proposes a methodology for the identification of land uses using Natural Language Processing (NLP) from the contents of the popular social network Twitter. It will be approached by identifying keywords with linguistic patterns from the text, and the geographical coordinates associated with the publication. Context-specific innovations are introduced to deal with data across South America and, in particular, in the city of Arequipa, Peru. The objective is to identify the five main land uses: residential, commercial, institutional-governmental, industrial-offices and unbuilt land. Within the framework of urban planning and sustainable urban management, the methodology contributes to the optimization of the identification techniques applied for the updating of land use cadastres, since the results achieved an accuracy of about 90%, which motivates its application in the real context. In addition, it would allow the identification of land use categories at a more detailed level, in situations such as a complex/mixed distribution building based on the amount of data collected. Finally, the methodology makes land use information available in a more up-to-date fashion and, above all, avoids the high economic cost of the non-automatic production of land use maps for cities, mostly in developing countries.

land use, social network interaction, tweet, (12 more...)

doi: 10.3390/app12178580

arXiv: 2112.06704

Country:

South America > Peru > Arequipa Department > Arequipa Province > Arequipa (0.27)
South America > Peru > Callao Department > Callao (0.05)
Europe > Spain > Andalusia > Seville Province > Seville (0.04)

Genre: Research Report (1.00)

Industry:

Law > Real Estate Law (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
(2 more...)

Adapting BERT for Continual Learning of a Sequence of Aspect Sentiment Classification Tasks

Ke, Zixuan, Xu, Hu, Liu, Bing

arXiv.org Artificial Intelligence | Dec-5-2021

This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks. Although some CL techniques have been proposed for document sentiment classification, we are not aware of any CL work on ASC. A CL system that incrementally learns a sequence of ASC tasks should address the following two issues: (1) transfer knowledge learned from previous tasks to the new task to help it learn a better model, and (2) maintain the performance of the models for previous tasks so that they are not forgotten. This paper proposes a novel capsule network based model called B-CL to address these issues. B-CL markedly improves the ASC performance on both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of B-CL is demonstrated through extensive experiments.

capsule, knowledge, learning, (14 more...)

arXiv: 2112.03271

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.92)

CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks

Ke, Zixuan, Liu, Bing, Xu, Hu, Shu, Lei

arXiv.org Artificial Intelligence | Dec-5-2021

This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks in a particular CL setting called domain incremental learning (DIL). Each task is from a different domain or product. The DIL setting is particularly suited to ASC because, at test time, the system need not know the task or domain to which the test data belongs. To our knowledge, this setting has not been studied before for ASC. This paper proposes a novel model called CLASSIC. The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task ids in testing. Experimental results show the high effectiveness of CLASSIC.

classification, knowledge, learning, (15 more...)

arXiv: 2112.02714

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Education (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.86)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.86)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)

Topic Driven Adaptive Network for Cross-Domain Sentiment Classification

Zhu, Yicheng, Qiu, Yiqiao, Rao, Yanghui

arXiv.org Artificial Intelligence | Nov-28-2021

Cross-domain sentiment classification, which aims to learn a reliable classifier using labeled data from the source domain and evaluate it on the target domain, has been a hot topic in recent years. Most approaches use domain adaptation that maps data from different domains into a common feature space. To further improve model performance, several methods that mine domain-specific information have been proposed. However, most of them use only a limited part of the domain-specific information. In this study, we first develop a method of extracting domain-specific words based on topic information. Then, we propose a Topic Driven Adaptive Network (TDAN) for cross-domain sentiment classification. The network consists of two sub-networks, a semantics attention network and a domain-specific word attention network, both of which are based on transformers. These sub-networks take different forms of input, and their outputs are fused as the feature vector. Experiments validate the effectiveness of our TDAN on sentiment classification across domains.

classification, domain-specific word, sentiment classification, (15 more...)

arXiv: 2111.14094

Country: Asia > Middle East > Jordan (0.05)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Natural Language Processing in-and-for Design Research

Siddharth, L, Blessing, Lucienne T. M., Luo, Jianxi

arXiv.org Artificial Intelligence | Nov-27-2021

We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research.

application, design process, ontology, (14 more...)

arXiv: 2111.13827

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
(16 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Law (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
(11 more...)

Text Classification with Prevision.io

#artificialintelligence | Nov-25-2021, 11:10:07 GMT

In this post we will show how in just a few minutes the Prevision.io ... It is well known that textual data is usually trickier and harder to process than linear or categorical features. Linear features sometimes need to be scaled, and categorical features can be encoded directly as scalars, but transforming text into a machine-readable format requires a lot of pre-processing and feature engineering. Moreover, there are many other challenges that have to be addressed: how can different languages be covered? How can the semantic relationships between the words in the vocabulary be preserved?

dataset, experiment, prevision, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.54)

Hierarchy Decoder is All You Need To Text Classification

Im, SangHun, Kim, Gibaeg, Oh, Heung-Seon, Jo, Seongung, Kim, Donghwan

arXiv.org Artificial Intelligence | Nov-22-2021

Hierarchical text classification (HTC) to a taxonomy is essential for various real applications but challenging since HTC models often need to process a large volume of data that are severely imbalanced and have hierarchy dependencies. Existing local and global approaches use deep learning to improve HTC by reducing the time complexity and incorporating the hierarchy dependencies. However, it is difficult to satisfy both conditions in a single HTC model. This paper proposes a hierarchy decoder (HiDEC) that uses recursive hierarchy decoding based on an encoder-decoder architecture. The key idea of the HiDEC involves decoding a context matrix into a sub-hierarchy sequence using recursive hierarchy decoding, while staying aware of hierarchical dependencies and level information. The HiDEC is a unified model that incorporates the benefits of existing approaches, thereby alleviating the aforementioned difficulties without any trade-off. In addition, it can be applied to both single- and multi-label classification with a minor modification. The superiority of the proposed model was verified on two benchmark datasets (WOS-46985 and RCV1) with an explanation of the reasons for its success.

classification, computational linguistic, hierarchy, (12 more...)

arXiv: 2111.11104

Country:

Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Text Classification Using TensorFlow

#artificialintelligence | Nov-20-2021, 15:41:11 GMT

Text classification is a machine learning technique that assigns a set of predefined categories to open-ended text. Text classifiers can be used to organize, structure, and categorize almost any kind of text, from documents, medical studies, and files to content from all over the web. This article explains text classification using the TensorFlow library. The code below shows the libraries that will be used for this project. The only data used for this project is "train.txt".

classification, text classification, validation, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.86)

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Zhang, Yu, Shen, Zhihong, Dong, Yuxiao, Wang, Kuansan, Han, Jiawei

arXiv.org Artificial Intelligence | Nov-11-2021

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution -- an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage the fully-connected attentions to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over state-of-the-art deep learning baselines.
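The idea of regularizing a child label's output probability by its parents can be sketched as a simple penalty. This is only one plausible form and an assumption on our part, not necessarily the formulation used in MATCH: it charges a cost whenever a child label is predicted as more probable than its parent. The function name, label names, and probabilities are hypothetical.

```python
from typing import Dict


def hierarchy_penalty(probs: Dict[str, float], parent_of: Dict[str, str]) -> float:
    """Sum of hinge penalties max(0, p_child - p_parent) over all child-parent pairs.

    probs     -- predicted probability for each label
    parent_of -- maps each child label to its parent label
    """
    return sum(
        max(0.0, probs[child] - probs[parent])
        for child, parent in parent_of.items()
    )


# Hypothetical three-level hierarchy: science > physics > optics
probs = {"science": 0.9, "physics": 0.4, "optics": 0.7}
parent_of = {"physics": "science", "optics": "physics"}

# "physics" does not exceed "science", so it adds nothing;
# "optics" exceeds "physics" by about 0.3, so it is penalized.
penalty = hierarchy_penalty(probs, parent_of)
```

In training, a term like this would typically be added to the classification loss with a weighting coefficient, so that predictions are nudged toward consistency with the label hierarchy.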

artificial intelligence, metadata-aware text classification, natural language, (1 more...)

arXiv: 2102.07349

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.80)