 Support Vector Machines


Indoor PM2.5 forecasting and the association with outdoor air pollution: a modelling study based on sensor data in Australia

arXiv.org Artificial Intelligence

Exposure to poor indoor air quality poses significant health risks, necessitating thorough assessment to mitigate associated dangers. This study aims to predict hourly indoor fine particulate matter (PM2.5) concentrations and investigate their correlation with outdoor PM2.5 levels across 24 distinct buildings in Australia. Indoor air quality data were gathered from 91 monitoring sensors in eight Australian cities spanning 2019 to 2022. Employing an innovative three-stage deep ensemble machine learning framework (DEML), comprising three base models (Support Vector Machine, Random Forest, and eXtreme Gradient Boosting) and two meta-models (Random Forest and Generalized Linear Model), hourly indoor PM2.5 concentrations were predicted. The model's accuracy was evaluated using a rolling-window approach, comparing its performance against three benchmark algorithms (SVM, RF, and XGBoost). Additionally, a correlation analysis assessed the relationship between indoor and outdoor PM2.5 concentrations. Results indicate that the DEML model consistently outperformed benchmark models, achieving an R² ranging from 0.63 to 0.99 and RMSE from 0.01 to 0.663 mg/m³ for most sensors. Notably, outdoor PM2.5 concentrations significantly impacted indoor air quality, particularly evident during events like bushfires. This study underscores the importance of accurate indoor air quality prediction, crucial for developing location-specific early warning systems and informing effective interventions. By promoting protective behaviors, these efforts contribute to enhanced public health outcomes.
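The rolling-window evaluation with RMSE and R² mentioned in this abstract can be illustrated with a minimal sketch. The window sizes, the toy series, and the persistence forecast used here are assumptions for illustration only; the abstract does not state the actual window configuration, and the persistence forecast merely stands in for the DEML model.

```python
import math

def rolling_windows(n_samples, train_size, test_size, step):
    """Yield (train_indices, test_indices) ranges that slide forward in time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination (undefined when y_true is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly readings (made-up numbers, not data from the study).
series = [5.0, 6.0, 7.0, 6.5, 8.0, 9.0, 8.5, 10.0, 11.0, 10.5]

window_rmse = []
for train, test in rolling_windows(len(series), train_size=4, test_size=2, step=2):
    # Stand-in model: repeat the last training value (persistence forecast).
    last_seen = series[train[-1]]
    y_true = [series[i] for i in test]
    y_pred = [last_seen] * len(test)
    window_rmse.append(rmse(y_true, y_pred))
```

Each window trains only on observations that precede its test block, so no future value leaks into the fit; any regressor could replace the persistence stand-in.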


Analyzing Language Bias Between French and English in Conventional Multilingual Sentiment Analysis Models

arXiv.org Artificial Intelligence

Inspired by the 'Bias Considerations in Bilingual Natural Language Processing' report by Statistics Canada, this study delves into potential biases in multilingual sentiment analysis between English and French. Given a 50-50 dataset of French and English, we aim to determine if there exists a language bias and explore how the incorporation of more diverse datasets in the future might affect the equity of multilingual Natural Language Processing (NLP) systems. By employing Support Vector Machine (SVM) and Naive Bayes models on three balanced datasets, we reveal potential biases in multilingual sentiment classification. Utilizing Fairlearn, a tool for assessing bias in machine learning models, our findings indicate nuanced outcomes: French data outperform English across accuracy, recall, and F1 score in both models, hinting at a language bias favoring French. However, Fairlearn's metrics suggest that the SVM approaches equitable levels, with demographic parity ratios of 0.963, 0.989, and 0.985 for the three separate datasets, indicating near-equitable treatment across languages. In contrast, Naive Bayes demonstrates greater disparities, evidenced by demographic parity ratios of 0.813, 0.908, and 0.961. These findings reveal the importance of developing equitable multilingual NLP systems, particularly as we anticipate the inclusion of more datasets in various languages in the future.
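For readers unfamiliar with the demographic parity ratio reported above, the following sketch computes it by hand: the smallest group-level selection rate (share of positive predictions) divided by the largest, so that 1.0 means identical rates across groups. This is a plain-Python illustration of the metric's usual definition, not the Fairlearn implementation, and the predictions and language labels are made up.

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Fraction of positive (1) predictions within each group."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        counts[group] += 1
        if pred == 1:
            positives[group] += 1
    return {group: positives[group] / counts[group] for group in counts}

def demographic_parity_ratio(y_pred, groups):
    """Smallest selection rate divided by the largest.

    Undefined (division by zero) if no group receives a positive prediction.
    """
    rates = selection_rates(y_pred, groups)
    return min(rates.values()) / max(rates.values())

# Hypothetical sentiment predictions (1 = positive) tagged by language.
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
groups = ["en", "en", "en", "en", "fr", "fr", "fr", "fr"]

rates = selection_rates(y_pred, groups)            # en: 0.75, fr: 0.5
ratio = demographic_parity_ratio(y_pred, groups)   # 0.5 / 0.75
```

Note that the ratio only compares how often each group receives a positive label; it says nothing about per-language accuracy, which is why the abstract reports both kinds of metric.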


Detecting 5G Narrowband Jammers with CNN, k-nearest Neighbors, and Support Vector Machines

arXiv.org Artificial Intelligence

5G cellular networks are particularly vulnerable to narrowband jammers that target specific control sub-channels in the radio signal. One mitigation approach is to detect such jamming attacks with an online observation system based on machine learning. We propose to detect jamming at the physical layer with a pre-trained machine learning model that performs binary classification. Based on data from an experimental 5G network, we study the performance of different classification models. A convolutional neural network is compared to support vector machines and k-nearest neighbors, where the last two methods are combined with principal component analysis. The obtained results show substantial differences in terms of classification accuracy and computation time.


Stressor Type Matters! -- Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection

arXiv.org Artificial Intelligence

Automatic stress detection using heart rate variability (HRV) features has gained significant traction as it utilizes unobtrusive wearable sensors measuring signals like electrocardiogram (ECG) or blood volume pulse (BVP). However, detecting stress through such physiological signals presents a considerable challenge owing to the variations in recorded signals influenced by factors such as perceived stress intensity and measurement devices. Consequently, stress detection models developed on one dataset may perform poorly on unseen data collected under different conditions. To address this challenge, this study explores the generalizability of machine learning models trained on HRV features for binary stress detection. Our goal extends beyond evaluating generalization performance; we aim to identify the characteristics of datasets that have the most significant influence on generalizability. We leverage four publicly available stress datasets (WESAD, SWELL-KW, ForDigitStress, VerBIO) that differ in at least one characteristic, such as stress elicitation technique, stress intensity, or sensor device. Employing a cross-dataset evaluation approach, we explore which of these characteristics strongly influence model generalizability. Our findings reveal a crucial factor affecting model generalizability: stressor type. Models achieved good performance across datasets when the type of stressor (e.g., social stress in our case) remained consistent. Factors like stress intensity or brand of the measurement device had minimal impact on cross-dataset performance. Based on our findings, we recommend matching the stressor type when deploying HRV-based stress models in new environments. To the best of our knowledge, this is the first study to systematically investigate factors influencing the cross-dataset applicability of HRV-based stress models.
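The cross-dataset evaluation described above (train on one dataset, test on every other) can be sketched as a simple loop. The nearest-centroid classifier, the dataset names "A" and "B", and the one-dimensional feature values are all hypothetical stand-ins; the abstract does not specify the models used, and no real HRV data appear here.

```python
def fit_centroids(X, y):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )

def cross_dataset_accuracy(datasets):
    """Train on each dataset in turn and score on every other dataset."""
    results = {}
    for train_name, (X_train, y_train) in datasets.items():
        model = fit_centroids(X_train, y_train)
        for test_name, (X_test, y_test) in datasets.items():
            if test_name == train_name:
                continue  # only unseen datasets count
            correct = sum(predict(model, r) == l for r, l in zip(X_test, y_test))
            results[(train_name, test_name)] = correct / len(y_test)
    return results

# Made-up single-feature datasets; label 1 = stress, 0 = no stress.
datasets = {
    "A": ([[0.0], [1.0], [9.0], [10.0]], [0, 0, 1, 1]),
    "B": ([[1.0], [2.0], [8.0], [9.0]], [0, 0, 1, 1]),
}
results = cross_dataset_accuracy(datasets)
```

The resulting train/test matrix is what allows dataset characteristics such as stressor type to be compared against transfer performance.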


Analyzing Emotional Trends from X platform using SenticNet: A Comparative Analysis with Cryptocurrency Price

arXiv.org Artificial Intelligence

This study delves into the relationship between emotional trends from X platform data and the market dynamics of well-known cryptocurrencies Cardano, Binance, Fantom, Matic, and Ripple over the period from October 2022 to March 2023. Leveraging SenticNet, we identified emotions like Fear and Anxiety, Rage and Anger, Grief and Sadness, Delight and Pleasantness, Enthusiasm and Eagerness, and Delight and Joy. Following data extraction, we segmented each month into bi-weekly intervals, replicating this process for price data obtained from Yahoo Finance. We then conducted a comparative analysis, establishing connections between emotional trends observed across bi-weekly intervals and cryptocurrency prices and uncovering significant correlations between emotional sentiments and coin valuations.
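A minimal sketch of the bi-weekly comparison follows. It assumes each month is split into days 1 to 15 and days 16 to month end, and that the comparison uses the Pearson correlation coefficient; the abstract specifies neither choice, so both are assumptions, as are the helper names.

```python
import math
from collections import defaultdict
from datetime import date

def biweekly_key(d):
    """Map a date to (year, month, half); half 1 = days 1-15, half 2 = the rest."""
    return (d.year, d.month, 1 if d.day <= 15 else 2)

def biweekly_means(records):
    """Average the values of (date, value) records within each bi-weekly bucket."""
    buckets = defaultdict(list)
    for d, value in records:
        buckets[biweekly_key(d)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

With an emotion series and a price series both reduced by `biweekly_means`, the values for the shared bucket keys can be passed to `pearson` in the same key order.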


Screening of BindingDB database ligands against EGFR, HER2, Estrogen, Progesterone and NF-kB receptors based on machine learning and molecular docking

arXiv.org Artificial Intelligence

Breast cancer, the second most prevalent cancer among women worldwide, necessitates the exploration of novel therapeutic approaches. To target the four subgroups of breast cancer (hormone receptor-positive and HER2-negative, hormone receptor-positive and HER2-positive, hormone receptor-negative and HER2-positive, and hormone receptor-negative and HER2-negative), it is crucial to inhibit specific targets such as EGFR, HER2, ER, NF-kB, and PR. In this study, we evaluated various methods for binary and multiclass classification. Among them, the GA-SVM-SVM:GA-SVM-SVM model was selected with an accuracy of 0.74, an F1-score of 0.73, and an AUC of 0.94 for virtual screening of ligands from the BindingDB database. This model successfully identified 4454, 803, 438, and 378 ligands with over 90% precision in both active/inactive and target prediction for the classes of EGFR+HER2, ER, NF-kB, and PR, respectively. Based on the selected ligands, we created a dendrogram that categorizes ligands according to their targets. This dendrogram aims to facilitate the exploration of chemical space for various therapeutic targets. Ligands that surpassed a 90% threshold in the product of activity probability and correct target selection probability were chosen for further investigation using molecular docking. The binding energy range for these ligands against their respective targets was calculated to be between -15 and -5 kcal/mol. Finally, based on general and common rules in medicinal chemistry, we selected 2, 3, 3, and 8 new ligands with high priority for further studies in the EGFR+HER2, ER, NF-kB, and PR classes, respectively.
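The selection rule for docking candidates (product of activity probability and target probability above 90%) can be written out as a short filter. The ligand identifiers and probabilities below are invented for illustration, and the function name is hypothetical; only the product-and-threshold rule comes from the abstract.

```python
def select_for_docking(predictions, threshold=0.9):
    """Keep ligands whose p(active) * p(correct target) exceeds the threshold.

    `predictions` holds (ligand_id, p_active, p_target) tuples.
    """
    selected = []
    for ligand_id, p_active, p_target in predictions:
        score = p_active * p_target
        if score > threshold:
            selected.append((ligand_id, score))
    return selected

# Made-up classifier outputs, not values from the study.
predictions = [
    ("lig-A", 0.98, 0.97),  # product ~0.951, kept
    ("lig-B", 0.95, 0.92),  # product ~0.874, rejected although both exceed 0.9
    ("lig-C", 0.99, 0.50),  # product ~0.495, rejected
]
selected = select_for_docking(predictions)
selected_ids = [ligand_id for ligand_id, _ in selected]
```

Thresholding the product is stricter than thresholding each probability separately, as the second ligand shows.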


Sup3r: A Semi-Supervised Algorithm for increasing Sparsity, Stability, and Separability in Hierarchy Of Time-Surfaces architectures

arXiv.org Artificial Intelligence

Hierarchy Of Time-Surfaces (HOTS) is a neuromorphic algorithm used to extract features from patterns of events [1]. This is possible thanks to a type of representation called time-surface or time vector, where events are interpolated by exponential decay kernels and collected to represent relative time differences between the activation of units in the network. Time surfaces are one of the most common representations in the neuromorphic field since they allow event data to be interfaced with traditional machine learning and computer vision algorithms [2, 3]. In HOTS, time surfaces are clustered together using algorithms like k-means to extract common activity patterns, and layers of units are built by considering each centroid as a neuron that can emit a new event when an input time surface is assigned to it. For this reason, HOTS has much in common with bag-of-words or bag-of-features algorithms [4]. For instance, HOTS requires an external classifier on histograms of features to classify information. Similarly to bag-of-words algorithms, HOTS classifiers are histograms that accumulate features over a given temporal window to produce an input vector to traditional machine learning algorithms like Support Vector Machines and Multi-Layer Perceptrons [5, 6, 1, 7]. This approach limits compatibility with neuromorphic hardware and can nullify the latency and energy efficiency advantages found in neuromorphic systems. Compared to Spiking Neural Networks (SNNs) trained with backpropagation through time, HOTS lags in accuracy [8, 9, 10, 1].
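The time-surface representation described above can be sketched for a small one-dimensional array of units: each unit's value is an exponential decay of the time elapsed since it last emitted an event. The decay constant `tau`, the `(timestamp, unit_index)` event format, and the convention that silent units contribute 0 are illustrative assumptions, not details taken from the paper.

```python
import math

def time_surface(last_times, t_now, tau):
    """Exponentially decayed age of each unit's latest event.

    Units that have not fired yet (None) are given the value 0.0.
    """
    return [
        0.0 if t_last is None else math.exp(-(t_now - t_last) / tau)
        for t_last in last_times
    ]

def process_events(events, n_units, tau):
    """Build one time surface per incoming (timestamp, unit_index) event."""
    last_times = [None] * n_units
    surfaces = []
    for t, unit in events:
        last_times[unit] = t  # the firing unit's own value becomes exp(0) = 1
        surfaces.append(time_surface(last_times, t, tau))
    return surfaces

# Three made-up events on a three-unit array.
events = [(0.0, 0), (1.0, 1), (2.0, 0)]
surfaces = process_events(events, n_units=3, tau=1.0)
```

Each surface is a fixed-length vector, which is what makes it usable as input to clustering methods such as k-means.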


Diagnosis of Parkinson's Disease Using EEG Signals and Machine Learning Techniques: A Comprehensive Study

arXiv.org Artificial Intelligence

Parkinson's disease is a widespread neurodegenerative condition necessitating early diagnosis for effective intervention. This paper introduces an innovative method for diagnosing Parkinson's disease through the analysis of human EEG signals, employing a Support Vector Machine (SVM) classification model. This research presents novel contributions to enhance diagnostic accuracy and reliability. Our approach incorporates a comprehensive review of EEG signal analysis techniques and machine learning methods. Drawing from recent studies, we have engineered an advanced SVM-based model optimized for Parkinson's disease diagnosis. Utilizing cutting-edge feature engineering, extensive hyperparameter tuning, and kernel selection, our method not only achieves heightened diagnostic accuracy but also emphasizes model interpretability, catering to both clinicians and researchers. Moreover, ethical concerns in healthcare machine learning, such as data privacy and biases, are conscientiously addressed. We assess our method's performance through experiments on a diverse dataset comprising EEG recordings from Parkinson's disease patients and healthy controls, demonstrating significantly improved diagnostic accuracy compared to conventional techniques. In conclusion, this paper introduces an innovative SVM-based approach for diagnosing Parkinson's disease from human EEG signals. Building upon the IEEE framework and previous research, its novelty lies in the capacity to enhance diagnostic accuracy while upholding interpretability and ethical considerations for practical healthcare applications. These advances promise to revolutionize early Parkinson's disease detection and management, ultimately contributing to enhanced patient outcomes and quality of life.


Balancing Spectral, Temporal and Spatial Information for EEG-based Alzheimer's Disease Classification

arXiv.org Artificial Intelligence

The prospect of future treatment warrants the development of cost-effective screening for Alzheimer's disease (AD). A promising candidate in this regard is electroencephalography (EEG), as it is one of the most economical imaging modalities. Recent efforts in EEG analysis have shifted towards leveraging spatial information, employing novel frameworks such as graph signal processing or graph neural networks. Here, we investigate the importance of spatial information relative to spectral or temporal information by varying the proportion of each dimension for AD classification. To do so, we systematically test various dimension resolution configurations on two routine EEG datasets. Our findings show that spatial information is more important than temporal information and as valuable as spectral information. On the larger second dataset, substituting spectral with spatial information even led to an increase of 1.1% in accuracy, which emphasises the importance of spatial information for EEG-based AD classification. We argue that our resolution-based feature extraction has the potential to improve AD classification specifically, and multivariate signal classification generally.


A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation

arXiv.org Artificial Intelligence

In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias present in real-world datasets. This study presents and explores a generative augmentation framework for social network advertising data. Our framework explores three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs) - to enrich data availability and diversity for social network advertising analytics. By performing synthetic extensions of the feature space, we find that data augmentation quantitatively improves the performance of various classifiers. Furthermore, we compare the relative performance gains brought by each data augmentation technique, providing insights for practitioners to select appropriate techniques to enhance model performance. This paper contributes to the literature by showing that synthetic data augmentation alleviates the limitations imposed by small or imbalanced datasets in the field of social network advertising. It also provides a comparative perspective on the practicality of different data augmentation methods.