AITopics | term weighting scheme

Collaborating Authors

term weighting scheme

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models

S, Sruthi, Basu, Tanmay

arXiv.org Artificial IntelligenceAug-11-2023

The Forum for Information Retrieval (FIRE) started a shared task this year for classification of comments of different code segments. This is binary text classification task where the objective is to identify whether comments given for certain code segments are relevant or not. The BioNLP-IISERB group at the Indian Institute of Science Education and Research Bhopal (IISERB) participated in this task and submitted five runs for five different models. The paper presents the overview of the models and other significant findings on the training corpus. The methods involve different feature engineering schemes and text classification techniques. The performance of the classical bag of words model and transformer-based models were explored to identify significant features from the given training corpus. We have explored different classifiers viz., random forest, support vector machine and logistic regression using the bag of words model. Furthermore, the pre-trained transformer based models like BERT, RoBERT and ALBERT were also used by fine-tuning them on the given training corpus. The performance of different such models over the training corpus were reported and the best five models were implemented on the given test corpus. The empirical results show that the bag of words model outperforms the transformer based models, however, the performance of our runs are not reasonably well in both training and test corpus. This paper also addresses the limitations of the models and scope for further improvement.

corpus, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2308.06144

Country:

Asia > India > Madhya Pradesh > Bhopal (0.25)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TF-IDFC-RF: A Novel Supervised Term Weighting Scheme

Carvalho, Flavio, Guedes, Gustavo Paiva

arXiv.org Machine LearningMar-12-2020

Sentiment Analysis is a branch of Affective Computing usually considered a binary classification task. In this line of reasoning, Sentiment Analysis can be applied in several contexts to classify the attitude expressed in text samples, for example, movie reviews, sarcasm, among others. A common approach to represent text samples is the use of the Vector Space Model to compute numerical feature vectors consisting of the weight of terms. The most popular term weighting scheme is TF-IDF (Term Frequency - Inverse Document Frequency). It is an Unsupervised Weighting Scheme (UWS) since it does not consider the class information in the weighting of terms. Apart from that, there are Supervised Weighting Schemes (SWS), which consider the class information on term weighting calculation. Several SWS have been recently proposed, demonstrating better results than TF-IDF. In this scenario, this work presents a comparative study on different term weighting schemes and proposes a novel supervised term weighting scheme, named as TF-IDFC-RF (Term Frequency - Inverse Document Frequency in Class - Relevance Frequency). The effectiveness of TF-IDFC-RF is validated with SVM (Support Vector Machine) and NB (Naive Bayes) classifiers on four commonly used Sentiment Analysis datasets. TF-IDFC-RF outperforms all other weighting schemes and achieves F1 results of more than 99.9% on all datasets with SVM classifier.

dataset, term weighting scheme, weighting scheme, (12 more...)

arXiv.org Machine Learning

2003.07193

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Media > Film (0.35)
Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)

Add feedback

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

Wang, Deqing, Zhang, Hui

arXiv.org Artificial IntelligenceJun-5-2012

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifier and SVMs. The widely used term weighting scheme in text categorization, i.e., tf.idf, is originated from information retrieval (IR) field. The intuition behind idf for text categorization seems less reasonable than IR. In this paper, we introduce inverse category frequency (icf) into term weighting scheme and propose two novel approaches, i.e., tf.icf and icf-based supervised term weighting schemes. The tf.icf adopts icf to substitute idf factor and favors terms occurring in fewer categories, rather than fewer documents. And the icf-based approach combines icf and relevance frequency (rf) to weight terms in a supervised way. Our cross-classifier and cross-corpus experiments have shown that our proposed approaches are superior or comparable to six supervised term weighting schemes and three traditional schemes in terms of macro-F1 and micro-F1.

machine learning, natural language, term weighting scheme, (19 more...)

arXiv.org Artificial Intelligence

1012.2609

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)

Add feedback