Goto

Collaborating Authors

 feature extraction technique


Celebrity Profiling on Short Urdu Text using Twitter Followers' Feed

arXiv.org Artificial Intelligence

Social media has become an essential part of the digital age, serving as a platform for communication, interaction, and information sharing. Celebrities are among the most active users and often reveal aspects of their personal and professional lives through online posts. Platforms such as Twitter provide an opportunity to analyze language and behavior for understanding demographic and social patterns. Since followers frequently share linguistic traits and interests with the celebrities they follow, textual data from followers can be used to predict celebrity demographics. However, most existing research in this field has focused on English and other high-resource languages, leaving Urdu largely unexplored. This study applies modern machine learning and deep learning techniques to the problem of celebrity profiling in Urdu. A dataset of short Urdu tweets from followers of subcontinent celebrities was collected and preprocessed. Multiple algorithms were trained and compared, including Logistic Regression, Support Vector Machines, Random Forests, Convolutional Neural Networks, and Long Short-Term Memory networks. The models were evaluated using accuracy, precision, recall, F1-score, and cumulative rank (cRank). The best performance was achieved for gender prediction with a cRank of 0.65 and an accuracy of 0.65, followed by moderate results for age, profession, and fame prediction. These results demonstrate that follower-based linguistic features can be effectively leveraged using machine learning and neural approaches for demographic prediction in Urdu, a low-resource language.


Relative-Absolute Fusion: Rethinking Feature Extraction in Image-Based Iterative Method Selection for Solving Sparse Linear Systems

arXiv.org Artificial Intelligence

Iterative method selection is crucial for solving sparse linear systems because these methods inherently lack robustness. Though image-based selection approaches have shown promise, their feature extraction techniques might encode distinct matrices into identical image representations, leading to the same selection and suboptimal method. In this paper, we introduce RAF (Relative-Absolute Fusion), an efficient feature extraction technique to enhance image-based selection approaches. By simultaneously extracting and fusing image representations as relative features with corresponding numerical values as absolute features, RAF achieves comprehensive matrix representations that prevent feature ambiguity across distinct matrices, thus improving selection accuracy and unlocking the potential of image-based selection approaches. We conducted comprehensive evaluations of RAF on SuiteSparse and our developed BMCMat (Balanced Multi-Classification Matrix dataset), demonstrating solution time reductions of 0.08s-0.29s for sparse linear systems, which is 5.86%-11.50% faster than conventional image-based selection approaches and achieves state-of-the-art (SOTA) performance. BMCMat is available at https://github.com/zkqq/BMCMat.


Social Media Sentiments Analysis on the July Revolution in Bangladesh: A Hybrid Transformer Based Machine Learning Approach

arXiv.org Artificial Intelligence

The July Revolution in Bangladesh marked a significant student-led mass uprising, uniting people across the nation to demand justice, accountability, and systemic reform. Social media platforms played a pivotal role in amplifying public sentiment and shaping discourse during this historic mass uprising. In this study, we present a hybrid transformer-based sentiment analysis framework to decode public opinion expressed in social media comments during and after the revolution. We used a brand new dataset of 4,200 Bangla comments collected from social media. The framework employs advanced transformer-based feature extraction techniques, including BanglaBERT, mBERT, XLM-RoBERTa, and the proposed hybrid XMB-BERT, to capture nuanced patterns in textual data. Principle Component Analysis (PCA) were utilized for dimensionality reduction to enhance computational efficiency. We explored eleven traditional and advanced machine learning classifiers for identifying sentiments. The proposed hybrid XMB-BERT with the voting classifier achieved an exceptional accuracy of 83.7% and outperform other model classifier combinations. This study underscores the potential of machine learning techniques to analyze social sentiment in low-resource languages like Bangla.


Analysis of child development facts and myths using text mining techniques and classification models

arXiv.org Artificial Intelligence

The rapid dissemination of misinformation on the internet complicates the decision-making process for individuals seeking reliable information, particularly parents researching child development topics. This misinformation can lead to adverse consequences, such as inappropriate treatment of children based on myths. While previous research has utilized text-mining techniques to predict child abuse cases, there has been a gap in the analysis of child development myths and facts. This study addresses this gap by applying text mining techniques and classification models to distinguish between myths and facts about child development, leveraging newly gathered data from publicly available websites. The research methodology involved several stages. First, text mining techniques were employed to pre-process the data, ensuring enhanced accuracy. Subsequently, the structured data was analysed using six robust Machine Learning (ML) classifiers and one Deep Learning (DL) model, with two feature extraction techniques applied to assess their performance across three different training-testing splits. To ensure the reliability of the results, cross-validation was performed using both k-fold and leave-one-out methods. Among the classification models tested, Logistic Regression (LR) demonstrated the highest accuracy, achieving a 90% accuracy with the Bag-of-Words (BoW) feature extraction technique. LR stands out for its exceptional speed and efficiency, maintaining low testing time per statement (0.97 microseconds). These findings suggest that LR, when combined with BoW, is effective in accurately classifying child development information, thus providing a valuable tool for combating misinformation and assisting parents in making informed decisions.


Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification

arXiv.org Artificial Intelligence

Rice disease classification is a critical task in agricultural research, and in this study, we rigorously evaluate the impact of integrating feature extraction methodologies within pre-trained convolutional neural networks (CNNs). Initial investigations into baseline models, devoid of feature extraction, revealed commendable performance with ResNet-50 and ResNet-101 achieving accuracies of 91% and 92%, respectively. Subsequent integration of Histogram of Oriented Gradients (HOG) yielded substantial improvements across architectures, notably propelling the accuracy of EfficientNet-B7 from 92\% to an impressive 97%. Conversely, the application of Local Binary Patterns (LBP) demonstrated more conservative performance enhancements. Moreover, employing Gradient-weighted Class Activation Mapping (Grad-CAM) unveiled that HOG integration resulted in heightened attention to disease-specific features, corroborating the performance enhancements observed. Visual representations further validated HOG's notable influence, showcasing a discernible surge in accuracy across epochs due to focused attention on disease-affected regions. These results underscore the pivotal role of feature extraction, particularly HOG, in refining representations and bolstering classification accuracy. The study's significant highlight was the achievement of 97% accuracy with EfficientNet-B7 employing HOG and Grad-CAM, a noteworthy advancement in optimizing pre-trained CNN-based rice disease identification systems. The findings advocate for the strategic integration of advanced feature extraction techniques with cutting-edge pre-trained CNN architectures, presenting a promising avenue for substantially augmenting the precision and effectiveness of image-based disease classification systems in agricultural contexts.


A Machine Learning Ensemble Model for the Detection of Cyberbullying

arXiv.org Artificial Intelligence

The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at any given time. However, the increased popularity of social media has also led to cyberbullying.It is imperative to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms. Motivated by this necessity, we present this paper to contribute to developing an automated system for detecting binary labels of aggressive tweets.Our study has demonstrated remarkable performance compared to previous experiments on the same dataset. We employed the stacking ensemble machine learning method, utilizing four various feature extraction techniques to optimize performance within the stacking ensemble learning framework. Combining five machine learning algorithms,Decision Trees, Random Forest, Linear Support Vector Classification, Logistic Regression, and K-Nearest Neighbors into an ensemble method, we achieved superior results compared to traditional machine learning classifier models. The stacking classifier achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing the results of prior experiments that utilized the same dataset. NTRODUCTION Today, social networking sites play a significant role in our daily lives. We use social media for various communications, encompassing entertainment, education, personal development, and the workplace. The revolutionary nature of these platforms has made it much easier to connect with people across long distances [1]. Technological advancements have transformed the way we communicate, share information, and interact with communities globally [2].


Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques

arXiv.org Artificial Intelligence

In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, the writing of deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This research paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. This study delves into the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model and max features are developed to effectively identify deceptive content, particularly focusing on fake reviews. A benchmark study is undertaken to explore the performance of two different feature extraction techniques, which are then coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, showcasing the highest accuracy not only in text classification but also in identifying fake reviews. Moreover, the research delves into data augmentation and implements various deep learning techniques to further enhance the process of detecting deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.


Evaluating raw waveforms with deep learning frameworks for speech emotion recognition

arXiv.org Artificial Intelligence

Speech emotion recognition is a challenging task in speech processing field. For this reason, feature extraction process has a crucial importance to demonstrate and process the speech signals. In this work, we represent a model, which feeds raw audio files directly into the deep neural networks without any feature extraction stage for the recognition of emotions utilizing six different data sets, EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of proposed model, the performance of traditional feature extraction techniques namely, mel-scale spectogram, mel-frequency cepstral coefficients, are blended with machine learning algorithms, ensemble learning methods, deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, random forests models are evaluated as machine learning algorithms while majority voting and stacking methods are assessed as ensemble learning techniques. Moreover, convolutional neural networks, long short-term memory networks, and hybrid CNN- LSTM model are evaluated as deep learning techniques and compared with machine learning and ensemble learning methods. To demonstrate the effectiveness of proposed model, the comparison with state-of-the-art studies are carried out. Based on the experiment results, CNN model excels existent approaches with 95.86% of accuracy for TESS+RAVDESS data set using raw audio files, thence determining the new state-of-the-art. The proposed model performs 90.34% of accuracy for EMO-DB with CNN model, 90.42% of accuracy for RAVDESS with CNN model, 99.48% of accuracy for TESS with LSTM model, 69.72% of accuracy for CREMA with CNN model, 85.76% of accuracy for SAVEE with CNN model in speaker-independent audio categorization problems.


Zero-day DDoS Attack Detection

arXiv.org Artificial Intelligence

The ability to detect zero-day (novel) attacks has become essential in the network security industry. Due to ever evolving attack signatures, existing network intrusion detection systems often fail to detect these threats. This project aims to solve the task of detecting zero-day DDoS (distributed denial-of-service) attacks by utilizing network traffic that is captured before entering a private network. Modern feature extraction techniques are used in conjunction with neural networks to determine if a network packet is either benign or malicious.


Minimal Feature Analysis for Isolated Digit Recognition for varying encoding rates in noisy environments

arXiv.org Artificial Intelligence

This research work is about recent development made in speech recognition. In this research work, analysis of isolated digit recognition in the presence of different bit rates and at different noise levels has been performed. This research work has been carried using audacity and HTK toolkit. Hidden Markov Model (HMM) is the recognition model which was used to perform this experiment. The feature extraction techniques used are Mel Frequency Cepstrum coefficient (MFCC), Linear Predictive Coding (LPC), perceptual linear predictive (PLP), mel spectrum (MELSPEC), filter bank (FBANK). There were three types of different noise levels which have been considered for testing of data. These include random noise, fan noise and random noise in real time environment. This was done to analyse the best environment which can used for real time applications. Further, five different types of commonly used bit rates at different sampling rates were considered to find out the most optimum bit rate.