Support Vector Machines
Analyzing Astronomical Data with Machine Learning Techniques - SpaceRef
Classification is a popular task in Machine Learning (ML) and Artificial Intelligence (AI) in which the outputs are categorical variables. A wide variety of models attempt to draw conclusions from observed values: classification algorithms predict categorical class labels and use them to classify new data. Popular classification models, including logistic regression, decision trees, random forests, Support Vector Machines (SVMs), multilayer perceptrons, Naive Bayes, and neural networks, have proven efficient and accurate when applied to many industrial and scientific problems. In particular, ML applied to astronomy has proven very useful for classification, clustering, and data cleaning.
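The following is a minimal pure-Python sketch of the SVM entry in that list. It trains a linear SVM by stochastic subgradient descent on the regularized hinge loss, using a Pegasos-style step size. It is not taken from the SpaceRef article. The function names, the regularization strength `lam`, the epoch count and the toy data in the checks are assumptions for illustration. The bias is folded into the weight vector as a constant feature, so it is regularized along with the other weights.

```python
import random


def _augment(x):
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)), labels in {-1, +1}."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos-style decaying step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step on the regularizer shrinks w toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Margin violation: add the hinge-loss subgradient term.
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    # Sign of the decision function w . [x, 1].
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1
```

In practice a library implementation with kernel support would normally be used. The sketch only exposes the margin-and-hinge mechanics behind an SVM.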
Efficient fair PCA for fair representation learning
Kleindessner, Matthäus, Donini, Michele, Russell, Chris, Zafar, Muhammad Bilal
We revisit the problem of fair principal component analysis (PCA), where the goal is to learn the best low-rank linear approximation of the data that obfuscates demographic information. We propose a conceptually simple approach that allows for an analytic solution similar to standard PCA and can be kernelized. Our methods have the same complexity as standard PCA, or kernel PCA, and run much faster than existing methods for fair PCA based on semidefinite programming or manifold optimization, while achieving similar results.
Automatic Classification of Symmetry of Hemithoraces in Canine and Feline Radiographs
Tahghighi, Peyman, Norena, Nicole, Ukwatta, Eran, Appleby, Ryan B, Komeili, Amin
Purpose: Thoracic radiographs are commonly used to evaluate patients with confirmed or suspected thoracic pathology. Proper patient positioning is more challenging in canine and feline radiography than in human radiography because of reduced patient cooperation and variation in body shape. Improper patient positioning during radiograph acquisition can lead to misdiagnosis. Asymmetrical hemithoraces are one indication of obliquity, and we propose an automatic method to classify them. Approach: We propose a hemithorax segmentation method based on Convolutional Neural Networks (CNNs) and active contours. We used the U-Net model to segment the ribs and spine and then used active contours to find the left and right hemithoraces. We then extracted features from the left and right hemithoraces to train an ensemble classifier comprising a Support Vector Machine, Gradient Boosting, and a Multi-Layer Perceptron. Five-fold cross-validation was used. Thorax segmentation was evaluated by Intersection over Union (IoU), and symmetry classification was evaluated using precision, recall, area under the curve, and F1 score. Results: Symmetry classification on 900 radiographs achieved an F1 score of 82.8%. To test the robustness of the proposed thorax segmentation method to underexposure and overexposure, we synthetically corrupted properly exposed radiographs and evaluated the results using IoU. The model's IoU dropped by 2.1% for underexposure and by 1.2% for overexposure. Conclusions: Our results indicate that the proposed thorax segmentation method is robust to poorly exposed radiographs and can be applied to human radiography with minimal changes.
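The evaluation metrics named in this abstract are straightforward to compute. The sketch below implements three things:

- IoU for binary segmentation masks.
- Precision, recall and F1 for the symmetric/asymmetric labels.
- A hard majority vote for combining the three classifiers' predictions.

The abstract does not say how the ensemble combines its members, so the majority vote is an assumption. All function names are likewise illustrative.

```python
from collections import Counter


def iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks given as nested lists of 0/1."""
    pairs = [(p, t) for pr, tr in zip(pred_mask, true_mask) for p, t in zip(pr, tr)]
    inter = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    return inter / union if union else 1.0  # two empty masks count as a perfect match


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def majority_vote(*model_predictions):
    """Per-sample most common label across models (assumed combination rule)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]
```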
Top 5 Python Machine Learning Codes
Python is one of the most popular programming languages for machine learning tasks. It is easy to learn, has a vast number of libraries and frameworks, and is versatile enough to be used in a wide range of applications. In this article, we will discuss some examples of Python code for machine learning tasks, with the aim of providing readers with a better understanding of how Python can be used in these tasks. Before we dive into the examples, it is essential to understand what machine learning is. In simple terms, machine learning is a type of artificial intelligence that enables machines to learn from data and improve over time.
Explorative analysis of human disease-symptoms relations using the Convolutional Neural Network
Dashdorj, Zolzaya, Grigorev, Stanislav, Dovdondash, Munguntsatsral
In health care and biomedical research, understanding the relationships between disease symptoms is crucial for early diagnosis and for uncovering hidden relationships between diseases. The study aimed to understand the extent to which symptom types contribute to disease prediction tasks. In this research, we analyze a pre-generated, symptom-based human disease dataset and demonstrate the degree of predictability of each disease using a Convolutional Neural Network and a Support Vector Machine. Disease ambiguity is studied using K-Means and Principal Component Analysis. Our results indicate that, when the characteristics of symptoms are taken into account, machine learning can potentially diagnose diseases at an early stage with 98-100% accuracy. Our results also highlight that unusual symptom types are a good proxy for accurate early identification of disease and that unusual symptoms increase the accuracy of the disease prediction task.
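K-Means, used here to study disease ambiguity, alternates between two steps: it assigns each point to its nearest centroid, then moves each centroid to the mean of its assigned points. A minimal pure-Python sketch of this (Lloyd's algorithm) follows. Initializing from the first k points is a simplifying assumption; k-means++ is the usual choice. Treating each point as a symptom vector is an illustration, not the paper's pipeline.

```python
def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centroids):
    # Index of the centroid closest to point (squared Euclidean distance).
    return min(range(len(centroids)), key=lambda c: _sq_dist(point, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # assumed deterministic initialization
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        new_centroids = [
            [sum(coords) / len(cluster) for coords in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels
```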
Reentry Risk and Safety Assessment of Spacecraft Debris Based on Machine Learning
Gao, Hu, Li, Zhihui, Dang, Depeng, Yang, Jingfan, Wang, Ning
Uncontrolled spacecraft will disintegrate and generate a large amount of debris during reentry, and ablating debris may pose risks to human life and property on the ground. Predicting the landing points of spacecraft debris and forecasting the degree of risk it poses to human life and property is therefore very important. The reentry process and the reentry point are difficult to predict in advance, and debris from the disintegration of an uncontrolled space vehicle at the end of its service life may cause damage on the ground. In this paper, we therefore adopt an object-oriented approach that models the spacecraft and its disintegrated components as simple basic geometric shapes. For the first time, we introduce three machine learning models to predict the velocity, longitude and latitude of debris landing points: support vector regression (SVR), decision tree regression (DTR) and multilayer perceptron (MLP). We then compare the prediction accuracy of the three models. Furthermore, we define the reentry risk and the degree of danger, calculate the risk level of each piece of debris, and issue warnings accordingly. The experimental results show that the proposed method can obtain high-accuracy predictions in at least 15 seconds and bring safety-level warnings closer to real time.
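SVR differs from ordinary least-squares regression mainly in its loss: errors smaller than a tolerance epsilon cost nothing. The sketch below shows that epsilon-insensitive loss and one plausible way to compare the three regressors, by root-mean-square error on held-out landing-point coordinates. The abstract does not state which accuracy metric was used. RMSE, the default epsilon and the function names are therefore assumptions.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean SVR loss: residuals inside the epsilon tube cost nothing."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error of predictions against targets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rank_models(y_true, predictions_by_model):
    """Model names sorted from lowest to highest RMSE (assumed comparison metric)."""
    return sorted(predictions_by_model, key=lambda name: rmse(y_true, predictions_by_model[name]))
```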
Detecting Lexical Borrowings from Dominant Languages in Multilingual Wordlists
Miller, John E., List, Johann-Mattis
Language contact is a pervasive phenomenon reflected in the borrowing of words from donor to recipient languages. Most computational approaches to borrowing detection treat all languages under study as equally important, even though dominant languages have a stronger impact on heritage languages than vice versa. We test new methods for lexical borrowing detection in contact situations where dominant languages play an important role, applying two classical sequence comparison methods and one machine learning method to a sample of seven Latin American languages which have all borrowed extensively from Spanish. All methods perform well, with the supervised machine learning system outperforming the classical systems. A review of detection errors shows that borrowing detection could be substantially improved by taking into account donor words with divergent meanings from recipient words.
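Classical sequence comparison for borrowing detection scores how similar a recipient-language word is to a candidate donor word. The abstract does not name its two classical methods. The sketch below therefore uses plain Levenshtein distance, normalized by the length of the longer word, with an assumed flagging threshold of 0.5. Real systems typically compare phonetic segments rather than spelling, and the words used in the checks are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer word's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def flag_borrowing(recipient_word, donor_word, threshold=0.5):
    """Flag a likely borrowing when the normalized distance is at most the (assumed) threshold."""
    return normalized_distance(recipient_word, donor_word) <= threshold
```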
Classification with Trust: A Supervised Approach based on Sequential Ellipsoidal Partitioning
Niranjan, Ranjani, Rao, Sachit
Standard classifier performance metrics, such as accuracy and sensitivity, do not reveal the trust or confidence in the predicted labels. Other quantities, such as the computed probability of a label or the signed distance from a hyperplane, can act as trust measures, but they are subject to heuristic thresholds. This paper presents a convex optimization-based supervised classifier that sequentially partitions a dataset into several ellipsoids, where each ellipsoid contains nearly all points of the same label. Classification rules are stated based on this partitioning. Bayes' formula is then applied to calculate a trust score for the label these rules assign to a test datapoint. The proposed Sequential Ellipsoidal Partitioning Classifier (SEP-C) exposes dataset irregularities, such as the degree of overlap, without requiring a separate exploratory data analysis. The classification rules, which are free of hyperparameters, are also unaffected by class imbalance, the underlying data distribution, or the number of features. SEP-C does not require non-linear kernels when the dataset is not linearly separable. The performance of SEP-C, and its comparison with other methods, is demonstrated on the XOR problem, the circle dataset, and other open-source datasets.
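SEP-C fits its ellipsoids by convex optimization, which is beyond a short example. This sketch assumes an ellipsoid has already been fitted, given by a center c and a positive-definite shape matrix A. A point x lies inside when (x - c)^T A (x - c) <= 1. The sketch shows how Bayes' formula can turn training-set counts into a trust score for a label. The count-based probability estimates and the function names are assumptions, not the paper's exact rules.

```python
def in_ellipsoid(x, center, shape):
    """True if (x - c)^T A (x - c) <= 1 for center c and positive-definite matrix A."""
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    q = sum(d[i] * shape[i][j] * d[j] for i in range(n) for j in range(n))
    return q <= 1.0


def bayes_trust(label, X_train, y_train, center, shape):
    """P(label | inside) = P(inside | label) * P(label) / P(inside), estimated by counts."""
    n = len(y_train)
    inside = [in_ellipsoid(x, center, shape) for x in X_train]
    n_inside = sum(inside)
    n_label = sum(1 for y in y_train if y == label)
    if n_inside == 0 or n_label == 0:
        return 0.0
    n_label_inside = sum(1 for y, ins in zip(y_train, inside) if ins and y == label)
    p_inside_given_label = n_label_inside / n_label
    p_label = n_label / n
    p_inside = n_inside / n
    return p_inside_given_label * p_label / p_inside
```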
A kernel-based quantum random forest for improved classification
Srikumar, Maiyuren, Hill, Charles D., Hollenberg, Lloyd C. L.
The emergence of Quantum Machine Learning (QML) as a way to enhance traditional classical learning methods has faced various limitations to its realisation. There is therefore an imperative to develop quantum models with unique model hypotheses to attain expressional and computational advantage. In this work we extend the linear quantum support vector machine (QSVM), with its kernel function computed through quantum kernel estimation (QKE), to form a decision tree classifier constructed from a decision directed acyclic graph of QSVM nodes. We term the ensemble of such trees the quantum random forest (QRF). To limit overfitting, we further extend the model to employ a low-rank Nyström approximation of the kernel matrix. We provide generalisation error bounds on the model and theoretical guarantees that limit errors due to finite sampling in the Nyström-QKE strategy. In doing so, we show that we can achieve lower sampling complexity than QKE. We numerically illustrate the effect of varying model hyperparameters and finally demonstrate that the QRF obtains superior performance over QSVMs while also requiring fewer kernel estimations.
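The Nyström approximation replaces the full n x n kernel matrix K with C W^{-1} C^T. Here C holds kernel values between all n points and m landmark points, and W holds kernel values among the landmarks, so only about n x m kernel evaluations are needed. The QRF applies this to quantum-estimated kernels. The pure-Python sketch below substitutes a classical RBF kernel (an assumption; nothing quantum is simulated), and the helper names and `gamma` value are illustrative.

```python
import math


def rbf(a, b, gamma=1.0):
    # Classical stand-in for a quantum-estimated kernel.
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def nystrom(X, landmarks, kernel=rbf):
    """Low-rank approximation K ~= C W^{-1} C^T."""
    C = [[kernel(x, l) for l in landmarks] for x in X]            # n x m
    W_inv = invert([[kernel(a, b) for b in landmarks] for a in landmarks])
    m = len(landmarks)
    CW = [[sum(row[k] * W_inv[k][j] for k in range(m)) for j in range(m)] for row in C]
    return [[sum(CW[i][k] * C[j][k] for k in range(m)) for j in range(len(X))]
            for i in range(len(X))]
```

When every point is used as a landmark the approximation reproduces K exactly. With fewer landmarks it never overestimates the diagonal entries, since K minus the approximation is positive semidefinite.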
Efficiently Forgetting What You Have Learned in Graph Representation Learning via Projection
Cong, Weilin, Mahdavi, Mehrdad
As privacy protection receives increasing attention, unlearning the effect of a specific node from a pre-trained graph learning model has become equally important. However, because of node dependency in graph-structured data, representation unlearning in Graph Neural Networks (GNNs) is challenging and less well explored. In this paper, we fill this gap by first studying the unlearning problem in linear GNNs and then extending it to non-linear structures. Given a set of nodes to unlearn, we propose PROJECTOR, which unlearns by projecting the weight parameters of the pre-trained model onto a subspace that is irrelevant to the features of the nodes to be forgotten. PROJECTOR can overcome the challenges caused by node dependency and achieves perfect data removal: by algorithmic construction, the unlearned model parameters are guaranteed to contain no information about the unlearned node features. Empirical results on real-world datasets illustrate the effectiveness and efficiency of PROJECTOR.
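For a linear model, the projection idea can be sketched in a few lines. This sketch assumes the "irrelevant" subspace is the span of the remaining nodes' feature vectors. Projecting the weights onto that span discards any component that only the forgotten nodes' features could have contributed. This is a simplified reading of PROJECTOR that ignores the graph aggregation the paper handles, and the function names are hypothetical.

```python
def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis for the span of vectors (near-dependent vectors are dropped)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project_weights(w, remaining_features):
    """Assumption: the target subspace is span(remaining_features)."""
    out = [0.0] * len(w)
    for b in gram_schmidt(remaining_features):
        c = sum(wi * bi for wi, bi in zip(w, b))  # coordinate of w along basis vector b
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out
```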