Collaborating Authors

Support Vector Machines

[D] Non-automated machine learning?


You can calculate every machine learning algorithm by hand at a very small scale. For example, an ANN with a single hidden layer that has to learn the XOR function. Another example is the support vector machine, for which you can create nice visualizations of the initial idea. At a large scale everything becomes quite infeasible to do by hand, but we don't have to, since we have computers. If you understand small examples like the ones mentioned above, you got the idea; no need to do it by hand at a large scale.
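As a sketch of the XOR example above (not from the thread): a 2-2-1 network with step activations can be worked out entirely by hand. The weights below are hand-picked rather than learned; the hidden units compute OR and NAND, and the output unit ANDs them, which is exactly XOR.

```python
def step(x):
    # Heaviside step activation: fires when the weighted sum is non-negative
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    # Hidden unit 1 acts as OR, hidden unit 2 as NAND (hand-picked weights)
    h1 = step(1 * x1 + 1 * x2 - 0.5)   # OR
    h2 = step(-1 * x1 - 1 * x2 + 1.5)  # NAND
    # Output unit ANDs the hidden units: OR(x) AND NAND(x) == XOR(x)
    return step(1 * h1 + 1 * h2 - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Every weighted sum here is small enough to check with pencil and paper, which is the whole point of the exercise.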

[R] Please point me in the right direction: decision trees or possibly something better


The linear baseline model solves a convex problem and thus will converge to roughly the same optimum every time. This basically gives you an idea of how informative the naive correlations between features and target are. Logistic regression and support vector machines are a common choice here in my experience. Regarding nonlinear models, the random forest classifier is neat. XGBoost (gradient-boosted decision trees) from the package of the same name is also really good.
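A minimal sketch of that baseline-then-nonlinear workflow with scikit-learn, assuming synthetic data (swap in your own features and labels; XGBoost would slot in the same way as the forest):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for real data; replace with your own features/labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Convex linear baseline: shows how far naive feature correlations get you.
linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Nonlinear comparison: random forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("linear baseline:", linear.score(X_te, y_te))
print("random forest: ", forest.score(X_te, y_te))
```

If the forest barely beats the linear baseline, the signal in your features is mostly linear and the extra model complexity is buying little.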

Predictive Maintenance -- Bridging Artificial Intelligence and IoT Artificial Intelligence

This paper highlights the trends in the field of predictive maintenance with the use of machine learning. With the continuous development of the Fourth Industrial Revolution, through IoT, the technologies that use artificial intelligence are evolving. As a result, industries have been using these technologies to optimize their production. Through the scientific research conducted for this paper, conclusions were drawn about the trends in Predictive Maintenance applications that use machine learning, bridging Artificial Intelligence and IoT. These trends concern the types of industries in which Predictive Maintenance was applied, the artificial intelligence models that were implemented (mainly machine learning models), and the types of sensors applied to these applications through the IoT. Six sectors were presented, and the production sector was dominant, accounting for 54.54% of total publications. In terms of artificial intelligence models, the most prevalent among the ten were Artificial Neural Networks, Support Vector Machines and Random Forests, with 27.84%, 17.72% and 13.92% respectively. Finally, twelve categories of sensors emerged, of which the most widely used were temperature and vibration sensors, with percentages of 60.71% and 46.42% respectively.

Modeling Weather-induced Home Insurance Risks with Support Vector Machine Regression Machine Learning

The insurance industry is one of the sectors most vulnerable to climate change. Assessment of the future number of claims and incurred losses is critical for disaster preparedness and risk management. In this project, we study the effect of precipitation on the joint dynamics of weather-induced home insurance claims and losses. We discuss the utility and limitations of machine learning procedures such as Support Vector Machines and Artificial Neural Networks in forecasting future claim dynamics and evaluating the associated uncertainties. We illustrate our approach by applying it to attribution analysis and forecasting of weather-induced home insurance claims in a middle-sized city in the Canadian Prairies.

Prediction Models for AKI in ICU: A Comparative Study


Purpose: To assess the performance of models for early prediction of acute kidney injury (AKI) in the Intensive Care Unit (ICU) setting. Patients and Methods: Data were collected from the Medical Information Mart for Intensive Care (MIMIC)-III database for all patients aged 18 years who had their serum creatinine (SCr) level measured for 72 h following ICU admission. Those with existing conditions of kidney disease upon ICU admission were excluded from our analyses. Seventeen predictor variables comprising patient demographics and physiological indicators were selected on the basis of the Kidney Disease Improving Global Outcomes (KDIGO) and medical literature. Six models from three types of methods were tested: Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Decision Machine (LightGBM), and Convolutional Neural Network (CNN).

Wasserstein Robust Support Vector Machines with Fairness Constraints Machine Learning

We propose a distributionally robust support vector machine with a fairness constraint that encourages the classifier to be fair in view of the equality of opportunity criterion. We use a type-$\infty$ Wasserstein ambiguity set centered at the empirical distribution to model distributional uncertainty and derive an exact reformulation for worst-case unfairness measure. We establish that the model is equivalent to a mixed-binary optimization problem, which can be solved by standard off-the-shelf solvers. We further prove that the expectation of the hinge loss objective function constitutes an upper bound on the misclassification probability. Finally, we numerically demonstrate that our proposed approach improves fairness with negligible loss of predictive accuracy.
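For context, the hinge-loss bound mentioned in this abstract rests on a standard pointwise inequality: for a label $y \in \{-1, +1\}$ and decision function $f$, $\mathbf{1}\{y f(x) \le 0\} \le \max(0,\, 1 - y f(x))$, since the right-hand side is at least $1$ whenever $y f(x) \le 0$. Taking expectations on both sides shows that the expected hinge loss upper-bounds the misclassification probability $\mathbb{P}(y f(x) \le 0)$.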

FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data Artificial Intelligence

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10%-70% of training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches.

Potential neutralizing antibodies discovered for novel corona virus using machine learning


Fast and untraceable virus mutations take the lives of thousands of people before the immune system can produce the inhibitory antibody. The recent outbreak of COVID-19 infected and killed thousands of people around the world. Rapid methods for finding peptides or antibody sequences that can inhibit the viral epitopes of SARS-CoV-2 will save thousands of lives. To predict neutralizing antibodies for SARS-CoV-2 in a high-throughput manner, in this paper we use different machine learning (ML) models to predict possible inhibitory synthetic antibodies for SARS-CoV-2. We collected 1933 virus-antibody sequences and their clinical patient neutralization responses and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, such as XGBoost, Random Forest, Multilayered Perceptron, Support Vector Machine and Logistic Regression, we screened thousands of hypothetical antibody sequences and found nine stable antibodies that potentially inhibit SARS-CoV-2. We combined bioinformatics, structural biology, and Molecular Dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit SARS-CoV-2.

Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images Machine Learning

The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. Early detection of the disease is extremely important, and the use of medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) has proved to be an excellent solution. However, this is a manual and time-consuming task for clinicians, which is not ideal when trying to speed up the diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machines (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches and the features contained in each of them are extracted by applying kernel PCA. The use of base classifiers within an ensemble allows our system to identify pneumonia patterns regardless of their size or location. The decisions of the individual patches are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The high performance obtained and the simplicity of the system (the use of deep learning on CCT images would result in a huge computational cost) evidence the applicability of our proposal in a real-world environment.
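A hedged sketch of the pipeline this abstract describes, using scikit-learn on synthetic stand-in data (the paper works on features extracted from cubic CCT patches; the sample counts, kernel choice and component count below are assumptions, not the authors' settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

# Synthetic stand-in for per-patch features extracted from CCT cubes.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Kernel PCA feature extraction, as in the abstract (parameters are guesses).
Z = KernelPCA(n_components=5, kernel="rbf").fit_transform(X)

# Probabilistic SVM base classifier (Platt-scaled class probabilities).
clf = SVC(probability=True, random_state=0).fit(Z, y)

# Combine per-patch decisions into a global one, weighting low-uncertainty
# patches more heavily, in the spirit of the paper's aggregation rule.
patch_probs = clf.predict_proba(Z[:10])      # 10 "patches" of one scan
certainty = patch_probs.max(axis=1)          # lower uncertainty -> higher weight
global_prob = (patch_probs * certainty[:, None]).sum(axis=0) / certainty.sum()
print("global class probabilities:", global_prob)
```

The certainty-weighted average keeps confidently classified patches dominant while hedging on ambiguous ones, which is the stated intent of the ensemble.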

Root cause prediction based on bug reports Artificial Intelligence

This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process - either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs' root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes of the given bug reports. Linear Support Vector Machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.
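A toy sketch of this setup, assuming a TF-IDF text representation feeding a linear SVM (the abstract names the classifier but not the exact featurization; the example reports below are invented, not from the paper's data set):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Invented toy bug reports standing in for the mined GitHub issues.
reports = [
    "null pointer dereference when parsing config",  # semantic
    "heap buffer overflow in image decoder",         # memory
    "deadlock between worker threads on shutdown",   # concurrency
    "wrong result returned for empty input list",    # semantic
    "use after free in cache eviction",              # memory
    "race condition updating shared counter",        # concurrency
]
labels = ["semantic", "memory", "concurrency",
          "semantic", "memory", "concurrency"]

# Text features + linear SVM, the best-performing classifier in the paper.
model = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(reports, labels)
print(model.predict(["segfault caused by dangling pointer after free"]))
```

With the paper's 369 labeled reports one would of course cross-validate rather than evaluate on six hand-written sentences; this only shows the shape of the pipeline.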