AITopics | comment classification


comment classification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Extended LSTM: Adaptive Feature Gating for Toxic Comment Classification

Mohammad, Noor Islam S.

arXiv.org Artificial Intelligence, Oct-21-2025

Toxic comment detection remains a challenging task: transformer-based models (e.g., BERT) incur high computational costs and degrade on minority toxicity classes, while classical ensembles lack semantic adaptability. We propose xLSTM, a parameter-efficient and theoretically grounded framework that unifies cosine-similarity gating, adaptive feature prioritization, and principled class rebalancing. A learnable reference vector $\mathbf{v} \in \mathbb{R}^d$ modulates contextual embeddings via cosine similarity, amplifying toxic cues and attenuating benign signals to yield stronger gradients under severe class imbalance. xLSTM integrates multi-source embeddings (GloVe, FastText, BERT CLS) through a projection layer, a character-level BiLSTM for morphological cues, embedding-space SMOTE for minority augmentation, and adaptive focal loss with dynamic class weighting. On the Jigsaw Toxic Comment benchmark, xLSTM attains 96.0% accuracy and 0.88 macro-F1, outperforming BERT by 33% on the threat and 28% on the identity_hate categories, with 15 times fewer parameters and 50 ms inference latency. In ablations, cosine gating contributes a +4.8% F1 gain. The results establish a new efficiency-adaptability frontier, demonstrating that lightweight, theoretically informed architectures can surpass large pretrained models on imbalanced, domain-specific NLP tasks.
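The gating and loss named in this abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact gate, so the form used here, $g = \sigma(\alpha \cos(\mathbf{h}, \mathbf{v}))$ applied multiplicatively to an embedding $\mathbf{h}$, is an assumption, as are the names `cosine_gate` and `focal_loss` and the hyperparameters `alpha`, `gamma`, and `alpha_t`. The loss shown is the standard binary focal loss with a fixed class weight, not the paper's adaptive variant with dynamic class weighting.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def cosine_gate(h, v, alpha=5.0):
    """Assumed gate: scale h by sigmoid(alpha * cos(h, v)).

    Embeddings aligned with the reference vector v get a gate near 1;
    embeddings pointing away from v get a gate near 0.
    """
    g = 1.0 / (1.0 + math.exp(-alpha * cosine(h, v)))
    return [g * x for x in h], g


def focal_loss(p, y, gamma=2.0, alpha_t=0.25):
    """Standard binary focal loss for one example.

    p: predicted probability of the positive (toxic) class; y: 0 or 1.
    The (1 - p_t) ** gamma factor down-weights already well-classified examples.
    """
    p_t = p if y == 1 else 1.0 - p
    weight = alpha_t if y == 1 else 1.0 - alpha_t
    return -weight * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


# Usage with a hypothetical 2-d reference vector.
v = [1.0, 0.0]
_, g_aligned = cosine_gate([2.0, 0.0], v)
_, g_opposed = cosine_gate([-2.0, 0.0], v)
```

Embeddings pointing toward the reference vector pass through almost unchanged, while opposed ones are damped toward zero. In the focal loss, confidently correct examples contribute little, which leaves more of the gradient to rare, hard examples.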

detection, machine learning, natural language

arXiv.org Artificial Intelligence

2510.17018

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Dopamin: Transformer-based Comment Classifiers through Domain Post-Training and Multi-level Layer Aggregation

Hai, Nam Le, Bui, Nghi D. Q.

arXiv.org Artificial Intelligence, Aug-6-2024

Code comments provide important information for understanding source code. They can help developers understand the overall purpose of a function or class, as well as identify bugs and technical debt. However, an overabundance of comments is meaningless and counterproductive, so it is critical to filter comments automatically for specific purposes. In this paper, we present Dopamin, a Transformer-based tool for dealing with this issue. Our model excels not only at sharing knowledge of common categories across multiple programming languages but also at achieving robust comment-classification performance through improved comment representations. As a result, it outperforms the STACC baseline by 3% in average F1-score on the NLBSE'24 Tool Competition dataset, while maintaining a comparable inference time for practical use. The source code is publicly available at https://github.com/FSoft-AI4Code/Dopamin.
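The title mentions multi-level layer aggregation, but the abstract does not describe it. One common way to aggregate encoder layers, shown here as an assumption rather than Dopamin's actual design, is a softmax-weighted sum of each layer's [CLS] vector with one learnable scalar per layer. The function names and the plain-list vectors are hypothetical; a real model would do this on tensors and learn the per-layer logits during training.

```python
import math


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_cls, layer_logits):
    """Softmax-weighted sum of per-layer [CLS] vectors.

    layer_cls: one vector of length d per encoder layer.
    layer_logits: one learnable scalar per layer (hypothetical parameters).
    """
    if len(layer_cls) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_cls[0])
    return [sum(w * vec[i] for w, vec in zip(weights, layer_cls)) for i in range(dim)]


# Three hypothetical layers with 2-d [CLS] vectors.
layers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
uniform = aggregate_layers(layers, [0.0, 0.0, 0.0])     # plain average of the layers
top_heavy = aggregate_layers(layers, [0.0, 0.0, 20.0])  # almost entirely the last layer
```

Equal logits reduce to averaging the layers, while a dominant logit recovers the usual "last layer only" representation, so the learned weights let the classifier decide how much lower-layer information to mix in.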

category, comment classification, dopamin

arXiv.org Artificial Intelligence

2408.04663

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)


A ML-LLM pairing for better code comment classification

Akl, Hanna Abi

arXiv.org Artificial Intelligence, Oct-13-2023

The Information Retrieval in Software Engineering (IRSE) shared task at FIRE 2023 introduces code comment classification, a challenging task in which a code snippet is paired with a comment that must be judged either useful or not useful for understanding the code. We address this shared task with a two-fold evaluation: from an algorithmic perspective, we compare the performance of classical machine learning systems, and from a data-driven perspective, we generate additional data by prompting a large language model (LLM) to measure the potential increase in performance. Our best model, which took second place in the shared task, is a neural network with a macro-F1 score of 88.401% on the provided seed data and a 1.5% overall increase in performance on the data generated by the LLM.
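Macro-F1, the metric reported here, averages the per-class F1 scores so that each class counts equally regardless of its size. A minimal sketch for the two-label useful/not-useful setting follows; the label strings and toy predictions are made up for illustration.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


gold = ["Useful", "Useful", "Not Useful", "Not Useful"]
pred = ["Useful", "Not Useful", "Not Useful", "Not Useful"]
score = macro_f1(gold, pred)  # (2/3 + 4/5) / 2 = 11/15, about 0.733
```

Accuracy on the same toy data would be 0.75; macro-F1 is lower because the missed "Useful" comment hurts that class's recall, which a plain accuracy figure hides.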

comment classification, dataset, seed data

arXiv.org Artificial Intelligence

2310.10275

Country:

Asia > India (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models

S, Sruthi, Basu, Tanmay

arXiv.org Artificial Intelligence, Aug-11-2023

The Forum for Information Retrieval Evaluation (FIRE) started a shared task this year for classifying comments on different code segments. This is a binary text classification task where the objective is to identify whether the comments given for certain code segments are relevant or not. The BioNLP-IISERB group at the Indian Institute of Science Education and Research Bhopal (IISERB) participated in this task and submitted five runs for five different models. The paper presents an overview of the models and other significant findings on the training corpus. The methods involve different feature engineering schemes and text classification techniques. The performance of the classical bag-of-words model and of transformer-based models was explored to identify significant features from the given training corpus. We explored different classifiers, viz., random forest, support vector machine, and logistic regression, using the bag-of-words model. Furthermore, pre-trained transformer-based models such as BERT, RoBERTa, and ALBERT were fine-tuned on the given training corpus. The performance of these models on the training corpus is reported, and the best five models were applied to the given test corpus. The empirical results show that the bag-of-words model outperforms the transformer-based models; however, none of our runs performs particularly well on either the training or the test corpus. This paper also addresses the limitations of the models and the scope for further improvement.
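A bag-of-words pipeline of the kind described above turns each comment into a vector of term counts and fits a linear classifier on it. The sketch below is a pure-Python stand-in under stated assumptions: whitespace tokenization, raw counts, and logistic regression trained by simple per-example gradient descent. It is not the authors' code, and the toy comments and labels are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simple)."""
    return text.lower().split()


def build_vocab(docs):
    """Map each distinct training token to a column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text, vocab):
    """Term-count vector; tokens unseen in training are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by per-example gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            grad = sigmoid(z) - target  # derivative of log loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Return 1 (relevant) if the linear score is positive, else 0."""
    return int(b + sum(wi * xi for wi, xi in zip(w, x)) > 0)


# Toy comments: 1 = relevant to the code, 0 = not relevant (made-up data).
train_docs = [
    "returns the sum of two integers",
    "initializes the buffer size",
    "todo fix this later lol",
    "asdf asdf",
]
train_labels = [1, 1, 0, 0]
vocab = build_vocab(train_docs)
X = [bag_of_words(d, vocab) for d in train_docs]
w, b = train_logreg(X, train_labels)
```

In practice a library implementation (with regularization and TF-IDF weighting) would replace the hand-written trainer; the point here is only how comments become sparse count features that random forests, SVMs, or logistic regression can consume.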

corpus, machine learning, natural language

arXiv.org Artificial Intelligence

2308.06144

Country:

Asia > India > Madhya Pradesh > Bhopal (0.25)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Classification of social media Toxic comments using Machine learning models

Poojitha, K., Charish, A. Sai, Reddy, M. Arun Kuamr, Ayyasamy, S.

arXiv.org Artificial Intelligence, Apr-14-2023

The paper outlines the problem of toxic comments on social media platforms, where individuals use disrespectful, abusive, and unreasonable language that can drive users away from discussions. This behavior, referred to as anti-social behavior, occurs during online debates, comment threads, and fights. Comments containing explicit language can be classified into categories such as toxic, severe toxic, obscene, threat, insult, and identity hate. This behavior leads to online harassment and cyberbullying, which forces individuals to stop expressing their opinions and ideas. To protect users from offensive language, companies have started flagging comments and blocking users. The paper proposes a classifier based on an LSTM-CNN model that can distinguish toxic from non-toxic comments with high accuracy. Such a classifier can help organizations better examine the toxicity of their comment sections.
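The six categories listed above are not mutually exclusive, so a classifier for them can be framed as multi-label: one sigmoid per category rather than a single softmax. The sketch below covers only that output-decoding step, assuming a trained network (such as the paper's LSTM-CNN, not reproduced here) emits one logit per category; the `decode` helper, the 0.5 threshold, and the example logits are hypothetical.

```python
import math

# Category names as they appear in the Jigsaw label columns.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode(logits, threshold=0.5):
    """Map six raw model outputs to (active labels, is_toxic).

    Each label gets its own sigmoid, so several categories can fire at once;
    a comment counts as toxic if any category reaches the threshold.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = [sigmoid(z) for z in logits]
    active = [name for name, p in zip(LABELS, probs) if p >= threshold]
    return active, bool(active)


# Hypothetical outputs from a trained model for one comment.
labels, is_toxic = decode([2.1, -3.0, 0.4, -4.2, 1.3, -2.5])
```

Collapsing the label set to a single yes/no verdict gives the toxic versus non-toxic decision the abstract describes, while keeping the per-category labels available for moderation rules.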

artificial intelligence, machine learning, social media

arXiv.org Artificial Intelligence

2304.06934

Country:

North America > United States > Mississippi (0.04)
Asia > Thailand (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)

Genre:

Research Report > Experimental Study (0.94)
Research Report > New Finding (0.93)

Industry:

Information Technology (0.48)
Health & Medicine (0.46)
Law Enforcement & Public Safety (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


GitHub - unitaryai/detoxify: Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

#artificialintelligence, May-9-2022, 17:46:29 GMT

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

jigsaw toxic comment challenge, model & code, predict toxic comment

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)


Global Thread-Level Inference for Comment Classification in Community Question Answering

Joty, Shafiq, Barrón-Cedeño, Alberto, Martino, Giovanni Da San, Filice, Simone, Màrquez, Lluís, Moschitti, Alessandro, Nakov, Preslav

arXiv.org Artificial Intelligence, Nov-20-2019

Community question answering, a recent evolution of question answering in the Web context, allows a user to quickly consult the opinions of a number of people on a particular topic, thus taking advantage of the wisdom of the crowd. Here we try to help the user by deciding automatically which answers are good and which are bad for a given question. In particular, we focus on exploiting the output structure at the thread level in order to make more consistent global decisions. More specifically, we exploit the relations between pairs of comments at any distance in the thread, which we incorporate into graph-cut and ILP frameworks. We evaluated our approach on the benchmark dataset of SemEval-2015 Task 3. The results improve over the state of the art, confirming the importance of using thread-level information.
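The idea of trading independent local decisions for a globally consistent labelling can be shown on a toy thread. The paper solves this kind of objective with graph-cut and ILP; the sketch below swaps in exhaustive search, which is exact but only feasible for a handful of comments. The scoring (log-probabilities from a local good/bad classifier plus pairwise same-label probabilities) and all names are assumptions for illustration.

```python
import itertools
import math


def global_labels(p_good, p_same):
    """Exact joint labelling of a small thread by exhaustive search.

    p_good[i]: local classifier probability that comment i is good (0 < p < 1).
    p_same[(i, j)]: probability that comments i and j share a label (0 < q < 1).
    Returns the 0/1 labels (1 = good) maximizing the summed log-probabilities.
    """
    best, best_score = None, -math.inf
    for labels in itertools.product((0, 1), repeat=len(p_good)):
        # Unary terms: agreement with each comment's local prediction.
        score = sum(math.log(p if y else 1.0 - p) for p, y in zip(p_good, labels))
        # Pairwise terms: reward consistent labels for strongly linked comments.
        for (i, j), q in p_same.items():
            score += math.log(q if labels[i] == labels[j] else 1.0 - q)
        if score > best_score:
            best, best_score = labels, score
    return list(best)


local_only = global_labels([0.9, 0.45, 0.2], {})
joint = global_labels([0.9, 0.45, 0.2], {(0, 1): 0.9})
```

Comment 1 is locally judged bad (0.45), but a strong same-label link to the confidently good comment 0 flips it to good in the joint solution; with no pairwise links, the result falls back to the local decisions [1, 0, 0]. Graph-cut and ILP solvers reach the same kind of optimum without enumerating all 2^n labellings.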

classifier, information, proceedings

arXiv.org Artificial Intelligence

1911.08755

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Colorado > Denver County > Denver (0.05)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Is preprocessing of text really worth your time for online comment classification?

Mohammad, Fahim

arXiv.org Artificial Intelligence, Jun-7-2018

A large proportion of online comments present on public domains are constructive; however, a significant proportion are toxic in nature. The comments contain many typos, which multiply the number of features and make the ML model difficult to train. Considering that data scientists spend approximately 80% of their time collecting, cleaning, and organizing their data [1], we explored how much effort we should invest in preprocessing (transforming) raw comments before feeding them to state-of-the-art classification models. Using four models on the Jigsaw toxic comment classification data, we demonstrated that training a model without any transformation produces a relatively decent model. Applying even basic transformations led, in some cases, to worse performance, so they should be applied with caution.
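To see why typos and casing inflate the feature space, and why cleaning is a trade-off, the sketch below counts the whitespace-token vocabulary before and after a few basic transformations (lowercasing, squeezing repeated characters, stripping punctuation). These particular transformations and the toy comments are illustrative choices, not the pipelines evaluated in the paper.

```python
import re


def basic_transform(text):
    """A few common cleaning steps (illustrative, not the paper's exact set)."""
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # squeeze 3+ repeated chars to 2
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # replace punctuation/symbols with spaces
    return " ".join(text.split())                # normalize whitespace


def vocab_size(docs, transform=None):
    """Number of distinct whitespace tokens, optionally after a transform."""
    if transform is not None:
        docs = [transform(d) for d in docs]
    return len({tok for d in docs for tok in d.split()})


comments = ["You are STUPID!!!", "you are stupiddd", "You are stupid."]
raw = vocab_size(comments)                       # 6 token types
cleaned = vocab_size(comments, basic_transform)  # 4 token types
```

The vocabulary shrinks from 6 to 4 token types, but the same cleaning also discards the "!!!" emphasis and the shouted capitals, which a toxicity model might have used as signal. This is one plausible way in which even basic transformations can hurt performance.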

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

1806.02908

Genre: Research Report (1.00)

Industry:

Education > Educational Setting > Online (0.86)
Information Technology > Services (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
