Support Vector Machines
emojiSpace: Spatial Representation of Emojis
Mostafavi, Moeen, Varnosfaderani, Mahsa Pahlavikhah, Nikseresht, Fateme, Mansouri, Seyed Ahmad
In the absence of nonverbal cues during messaging communication, users express part of their emotions using emojis. Thus, having emojis in the vocabulary of text messaging language models can significantly improve many natural language processing (NLP) applications such as online communication analysis. On the other hand, word embedding models are usually trained on a very large corpus of text, such as Wikipedia or Google News datasets, that includes very few samples with emojis. In this study, we create emojiSpace, a combined word-emoji embedding built with the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it on an extrinsic task: sentiment analysis on a Twitter dataset containing more than 67 million tweets. For this task, we compared the performance of two classifiers, random forest (RF) and linear support vector machine (SVM). For evaluation, we compared emojiSpace with two other pre-trained embeddings and demonstrated that emojiSpace outperforms both.
Examining Uniqueness and Permanence of the WAY EEG GAL dataset toward User Authentication
This study evaluates the discriminating capacity (uniqueness) of the EEG data from the WAY EEG GAL public dataset to authenticate individuals against one another, as well as its permanence. In addition to the EEG data, Luciw et al. provide EMG (electromyography) and kinematics data for engineers and researchers to utilize WAY EEG GAL for further studies. However, evaluating the EMG and kinematics data is outside the scope of this study. The goal of the existing state-of-the-art work is to determine whether EEG data can be utilized to control prosthetic devices. In contrast, this study aims to evaluate the separability of individuals through EEG data to perform user authentication. A feature importance algorithm is utilized to select the best features for each user to authenticate them against all others. The authentication platform implemented for this study is based on machine learning classifiers. As an initial test, two pilot studies are performed using Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) to observe the learning trends of the models by multi-labeling the EEG dataset. Using kNN first as the classifier for user authentication, an accuracy of around 75% is observed. Thereafter, to improve performance, both linear and non-linear SVMs are used to perform classification. Overall average accuracies of 85.18% and 86.92% are achieved using linear and non-linear SVMs respectively. In addition to accuracy, F1 scores are also calculated. Overall average F1 scores of 87.51% and 88.94% are achieved for linear and non-linear SVMs respectively. Beyond the overall performance, high-performing individuals with 95.3% accuracy (95.3% F1 score) using linear SVM and 97.4% accuracy (97.3% F1 score) using non-linear SVM are also observed.
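The accuracy and F1 figures quoted for the EEG authentication study can be made concrete with a small sketch. It assumes a one-vs-rest setup in which samples from the user being authenticated are labeled 1 and samples from all other users 0; the subject IDs, predictions, and function names below are hypothetical and are not taken from the study.

```python
def one_vs_rest_labels(subject_ids, target):
    """Label samples of the target user 1 and samples of every other user 0."""
    return [1 if s == target else 0 for s in subject_ids]


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = genuine user)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical data: which subject produced each sample, and a classifier's guesses.
subject_ids = [1, 1, 2, 3, 1, 2, 3, 3]
y_true = one_vs_rest_labels(subject_ids, target=1)
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Because one-vs-rest labels are usually imbalanced (few genuine samples, many impostor samples), accuracy alone can look good for a classifier that rarely accepts anyone, which is one reason to report F1 alongside it.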
Machine Learning
Machine learning is a technique that enables a machine to learn from data, improve its performance with experience, and make predictions without being explicitly programmed. It is a subfield of artificial intelligence, with applications that include self-driving cars, image recognition, and speech recognition. It differs from traditional programming: in machine learning, we pass the data and the desired outputs as input, and the training procedure produces a model that maps one to the other. Supervised learning is a machine learning technique that learns from labeled data. During training, the label attached to each example tells the model what output is expected for that example.
EEG-based Emotion Recognition Using Multiple Kernel Learning - Machine Intelligence Research
Emotion recognition based on electroencephalography (EEG) has a wide range of applications and great potential value, so it has received increasing attention from academia and industry in recent years. Meanwhile, multiple kernel learning (MKL) has also been favored by researchers for its data-driven convenience and high accuracy. However, there is little research on MKL in EEG-based emotion recognition. Therefore, this paper explores and promotes the application of MKL methods in EEG emotion recognition. To this end, we propose a support vector machine (SVM) classifier based on the MKL algorithm EasyMKL to investigate the feasibility of MKL algorithms in EEG-based emotion recognition problems.
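The core idea behind multiple kernel learning, combining several base kernels into one, can be sketched as follows. This is only an illustration under stated assumptions: the weights are fixed by hand, whereas an MKL algorithm such as EasyMKL would learn them from data, and the feature vectors, base kernels, and function names are hypothetical rather than taken from the paper.

```python
import math


def linear_kernel(x, z):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel with a hand-picked width."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def combined_kernel(x, z, kernels, weights):
    """Non-negative weighted sum of base kernels, which is again a valid kernel."""
    return sum(w * k(x, z) for k, w in zip(kernels, weights))


def gram_matrix(X, kernels, weights):
    """Kernel matrix that a kernel SVM would be trained on."""
    return [[combined_kernel(xi, xj, kernels, weights) for xj in X] for xi in X]


# Hypothetical feature vectors (e.g., two features per EEG trial).
X = [(1.0, 2.0), (0.0, 1.0), (2.0, 0.5)]
kernels = [linear_kernel, rbf_kernel]
weights = [0.3, 0.7]  # assumed fixed here; MKL would estimate these
K = gram_matrix(X, kernels, weights)
for row in K:
    print([round(v, 3) for v in row])
```

The resulting matrix can be passed to any SVM implementation that accepts a precomputed kernel.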
A Machine Learning Analysis of Impact of the Covid-19 Pandemic on Alcohol Consumption Habit Changes Among Healthcare Workers in the U.S
In this paper, we discuss the impact of the Covid-19 pandemic on alcohol consumption habit changes among healthcare workers in the United States. We apply multiple supervised and unsupervised machine learning methods and models, such as Decision Trees, Logistic Regression, Naive Bayes classifier, k-Nearest Neighbors, Support Vector Machines, Multilayer Perceptron, XGBoost, CatBoost, LightGBM, the Chi-Squared Test, and the mutual information method, to mental health survey data obtained from the University of Michigan Inter-University Consortium for Political and Social Research in order to identify relationships between COVID-19-related negative effects and alcohol consumption habit changes among healthcare workers. Our findings suggest that COVID-19-related school closures, COVID-19-related work schedule changes, and COVID-19-related news exposure may lead to an increase in alcohol use among healthcare workers in the United States.
Traffic Congestion Prediction Using Machine Learning Techniques
Yasir, Rafed Muhammad, Nower, Dr. Naushin, Shoyaib, Dr. Mohammad
The prediction of traffic congestion can serve a crucial role in making future decisions. Although many studies have been conducted regarding congestion, most of these could not cover all the important factors (e.g., weather conditions). We propose a prediction model for traffic congestion that predicts congestion based on day, time, and several weather variables (e.g., temperature, humidity). We evaluated the model against traffic data from New Delhi. With this model, congestion of a road can be predicted one week ahead with an average RMSE of 1.12. Therefore, this model can be used to take preventive measures beforehand.
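The root mean squared error (RMSE) used to report the congestion model's accuracy is the square root of the mean squared difference between predicted and observed values. A minimal sketch, with made-up congestion levels that are not from the study:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted congestion levels for four time slots.
actual = [3.0, 5.0, 2.0, 4.0]
predicted = [2.0, 5.0, 4.0, 4.0]
score = rmse(actual, predicted)
print(f"RMSE = {score:.3f}")
```

RMSE is expressed in the same units as the predicted quantity, so a value such as 1.12 is only meaningful relative to the scale on which congestion is measured.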
Use and Misuse of Machine Learning in Anthropology
Calder, Jeff, Coil, Reed, Melton, Annie, Olver, Peter J., Tostevin, Gilbert, Yezzi-Woodley, Katrina
Machine learning (ML), being now widely accessible to the research community at large, has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we will focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms and lack of expertise on their proper use among the anthropological research community has led to foundational misapplications that have appeared throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but produce potentially faulty understandings about our human evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; as well as an absence of transparency due to the lack of data/code sharing, and the subsequent limitations imposed on independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.
Artificial Intelligence-Based Analytics for Impacts of COVID-19 and Online Learning on College Students' Mental Health
Rezapour, Mostafa, Elmshaeuser, Scott K.
COVID-19, the disease caused by the novel coronavirus (SARS-CoV-2), first emerged in Wuhan, China late in December 2019. Not long after, the virus spread worldwide and was declared a pandemic by the World Health Organization in March 2020. This caused many changes around the world and in the United States, including an educational shift towards online learning. In this paper, we seek to understand how the COVID-19 pandemic and increase in online learning impact college students' emotional wellbeing. We use several machine learning and statistical models to analyze data collected by the Faculty of Public Administration at the University of Ljubljana, Slovenia in conjunction with an international consortium of universities, other higher education institutions, and students' associations. Our results indicate that features related to students' academic life have the largest impact on their emotional wellbeing. Other important factors include students' satisfaction with their university's and government's handling of the pandemic as well as students' financial security.
PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine
Van Craen, Alexander, Breyer, Marcel, Pflüger, Dirk
Machine learning algorithms must be able to efficiently cope with massive data sets. Therefore, they have to scale well on any modern system and be able to exploit the computing power of accelerators independent of their vendor. In the field of supervised learning, Support Vector Machines (SVMs) are widely used. However, even modern and optimized implementations such as LIBSVM or ThunderSVM do not scale well for large non-trivial dense data sets on cutting-edge hardware: most SVM implementations are based on Sequential Minimal Optimization, an optimized though inherently sequential algorithm. Hence, they are not well-suited for highly parallel GPUs. Furthermore, we are not aware of a performance-portable implementation that supports CPUs and GPUs from different vendors. We have developed the PLSSVM library to solve both issues. First, we resort to the formulation of the SVM as a least squares problem. Training an SVM then boils down to solving a system of linear equations for which highly parallel algorithms are known. Second, we provide a hardware-independent yet efficient implementation: PLSSVM uses different interchangeable backends--OpenMP, CUDA, OpenCL, SYCL--supporting modern hardware from various vendors like NVIDIA, AMD, or Intel on multiple GPUs. PLSSVM can be used as a drop-in replacement for LIBSVM. We observe a speedup on CPUs of up to 10 compared to LIBSVM and on GPUs of up to 14 compared to ThunderSVM. Our implementation scales on many-core CPUs with a parallel speedup of 74.7 on up to 256 CPU threads and on multiple GPUs with a parallel speedup of 3.71 on four GPUs. The code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.
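The statement that training a least squares SVM reduces to solving a system of linear equations can be illustrated with a small pure-Python sketch. It assumes the textbook LS-SVM formulation with a linear kernel and a plain Gaussian elimination solver; it is not PLSSVM's implementation, and the function names, the regularization value `C`, and the toy data are all hypothetical.

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def train_lssvm(X, y, C=10.0, kernel=linear_kernel):
    """Build and solve [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [kernel(X[i], X[j]) + (1.0 / C if i == j else 0.0)
                          for j in range(n)])
    solution = solve_linear_system(A, [0.0] + [float(v) for v in y])
    return solution[0], solution[1:]  # bias, alpha


def predict(X_train, alpha, bias, x, kernel=linear_kernel):
    score = sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train)) + bias
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data with labels -1 and +1.
X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
y = [-1, -1, -1, 1, 1, 1]
bias, alpha = train_lssvm(X, y)
train_pred = [predict(X, alpha, bias, x) for x in X]
print(train_pred)
```

Unlike a standard SVM, every training point generally receives a non-zero coefficient here, so the model is dense; the benefit is that the work is a single linear solve, which parallelizes far better than Sequential Minimal Optimization.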
Fault Detection for Non-Condensing Boilers using Simulated Building Automation System Sensor Data
Shohet, Rony, Kandil, Mohamed, Wang, Y., McArthur, J. J.
Building performance has been shown to degrade significantly after commissioning, resulting in increased energy consumption and associated greenhouse gas emissions. Continuous Commissioning using existing sensor networks and IoT devices has the potential to minimize this waste by continually identifying system degradation and re-tuning control strategies to adapt to real building performance. Due to its significant contribution to greenhouse gas emissions, the performance of gas boiler systems for building heating is critical. A review of boiler performance studies has been used to develop a set of common faults and degraded performance conditions, which have been integrated into a MATLAB/Simulink emulator. This resulted in a labeled dataset with approximately 10,000 simulations of steady-state performance for each of 14 non-condensing boilers. The collected data is used for training and testing fault classification using k-nearest neighbour, decision tree, random forest, and support vector machine classifiers. The results show that the support vector machine method gave the best prediction accuracy, consistently exceeding 90%, but that generalization across multiple boilers is not possible due to low classification accuracy.