Collaborating Authors

Support Vector Machines

A fusion-based machine learning approach for the prediction of the onset of diabetes - Strathprints


A growing body of research reports on the use of machine learning-based architectures and models in healthcare. Developing data-driven applications and services for the diagnosis and classification of key illness conditions is challenging owing to the low volume and low quality of contextual data available for training and validating algorithms, which in turn compromises the accuracy of the resultant models. Here, a fusion machine learning approach is presented that improves the accuracy of identifying diabetes and predicting the onset of critical events for patients with diabetes (PwD). Globally, the cost of treating diabetes, a prevalent chronic illness characterized by high blood sugar levels sustained over long periods, places severe demands on health providers, and the proposed solution has the potential to improve PwD survival rates by informing the optimum treatment on an individual patient basis. At the core of the proposed architecture is a fusion of machine learning classifiers: a Support Vector Machine and an Artificial Neural Network.
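The abstract does not detail how the two classifiers are fused; one common fusion rule is soft voting over predicted class probabilities. A minimal sketch with scikit-learn, where the synthetic dataset and the voting rule are illustrative assumptions rather than the paper's actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a diabetes dataset (the real study uses patient data)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fuse an SVM and an ANN by averaging their predicted class probabilities
fusion = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("ann", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
    ],
    voting="soft",
)
fusion.fit(X_tr, y_tr)
print(f"fused accuracy: {fusion.score(X_te, y_te):.2f}")
```

Soft voting lets a confident classifier outweigh an uncertain one, which is often where a fusion gains over either model alone.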

Support Vector Machine(SVM): A Complete guide for beginners


SVM is a powerful supervised algorithm that works best on smaller datasets, even complex ones. The Support Vector Machine, abbreviated SVM, can be used for both regression and classification tasks, but it generally performs best on classification problems. SVMs were very popular around the time of their creation in the 1990s and remain a go-to method for high performance with a little tuning. By now, I hope you have mastered Decision Trees, Random Forests, Naïve Bayes, K-nearest neighbors, and ensemble modelling techniques; if not, I suggest you take a few minutes to read about them as well. In this article, I will explain what SVM is, how it works, and the mathematical intuition behind this crucial ML algorithm. SVM is a supervised machine learning method that finds the hyperplane that best separates two classes. Note: don't confuse SVM with logistic regression.
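To make the "best separating hyperplane" idea concrete, here is a minimal linear SVM on toy two-class data (the points are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# The learned hyperplane is w·x + b = 0; the support vectors are the
# training points closest to it, which alone define the margin
print("weights w:", clf.coef_[0])
print("support vectors:", clf.support_vectors_)
print("prediction for [3, 2]:", clf.predict([[3, 2]]))
```

Only the support vectors matter to the final decision boundary; deleting any other training point leaves the model unchanged.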

Prediction of Concrete Compressive Strength According to Components with Machine Learning


Concrete is the most commonly used material in civil engineering, which is why a great deal of research and experimentation is devoted to it. This experiment attempts to understand how compressive strength depends on the materials that make up the concrete. Concrete has many properties, such as shear strength and tensile strength; compressive strength is one of the most important.
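Predicting compressive strength from mix components is a standard regression task. A hedged sketch with scikit-learn on synthetic data; the features, units, and target formula below are illustrative assumptions, not the study's measured data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
cement = rng.uniform(100, 500, n)   # kg/m^3 (illustrative range)
water = rng.uniform(120, 250, n)    # kg/m^3
age = rng.uniform(1, 365, n)        # curing age in days

# Toy target: strength rises with cement content and age,
# and falls with the water/cement ratio (plus measurement noise)
strength = 0.08 * cement - 40 * (water / cement) + 5 * np.log(age) + rng.normal(0, 2, n)

X = np.column_stack([cement, water, age])
X_tr, X_te, y_tr, y_te = train_test_split(X, strength, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out mixes: {model.score(X_te, y_te):.2f}")
```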

AI and Optogenetics Disrupt the Neuroscience of Dopamine


Innovative technologies such as artificial intelligence (AI), machine learning, and optogenetics are accelerating discoveries in the life sciences, especially in neuroscience. A new breakthrough study published in Current Biology by pioneering brain researchers at Vanderbilt University used optogenetics and AI machine learning to reveal that dopamine is not just a "pleasure molecule" -- a revolutionary finding that may impact how addiction and psychiatric diseases are treated in the future. "Dopamine deficits are seen in patients suffering from substance use disorder," said Erin Calipari, an assistant professor of pharmacology at Vanderbilt University and a faculty member of both the Vanderbilt Brain Institute and the Vanderbilt Center for Addiction Research. "These individuals have reduced dopamine as well as deficits in decision-making that would be explained by our data and new model. These deficits in decision-making are highly correlated with the severity of addiction as well as predicting treatment outcomes. These data are really key to understanding the relationship between dopamine and this disease and figuring out how to treat it."

Machine Learning on R 2021


There are people who are eager to move into analytics careers but do not have the requisite skill sets. As we move into our 12th year in the analytics industry, OrangeTree Global has designed specific courses for freshers and working professionals looking to move into Data Science, Machine Learning, and Big Data careers. Since 2009, OrangeTree Global has pursued an ambitious vision of providing affordable and effective analytics training and education across the country. OrangeTree Global has over a decade's experience in upskilling professionals and helping them move into analytics jobs and careers within and outside India. If you are reading this, we hope to be a part of your journey too. The program builds a solid foundation by covering the most popular and widely used machine learning technologies and their applications, including Naive Bayes theory and application, K-Nearest Neighbors (KNN) theory and application, Random Forest theory and application, Gradient Boosting theory and application, and Support Vector Machine theory and application, laying the building blocks for truly expanded analytical abilities.

Integrating Unsupervised Clustering and Label-specific Oversampling to Tackle Imbalanced Multi-label Data Artificial Intelligence

There is often a mixture of very frequent labels and very infrequent labels in multi-label datasets. This variation in label frequency, a type of class imbalance, creates a significant challenge for building efficient multi-label classification algorithms. In this paper, we tackle this problem by proposing a minority class oversampling scheme, UCLSO, which integrates Unsupervised Clustering and Label-Specific data Oversampling. Clustering is performed to find the key distinct and locally connected regions of a multi-label dataset (irrespective of the label information). Next, for each label, we explore the distributions of minority points in the cluster sets. Only the minority points within a cluster are used to generate the synthetic minority points used for oversampling. Even though the cluster set is the same across all labels, the distributions of the synthetic minority points will vary across the labels. The training dataset is augmented with the set of label-specific synthetic minority points, and classifiers are trained to predict the relevance of each label independently. Experiments using 12 multi-label datasets and several multi-label algorithms show that the proposed method performed very well compared to the other competing algorithms.
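The paper's exact UCLSO procedure is not reproduced here, but its core idea, clustering first and then interpolating synthetic minority points only between samples in the same cluster, can be sketched roughly as follows. The cluster count, interpolation rule, and toy data are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_oversample(X, y, minority_label=1, n_clusters=3, n_new=20, seed=0):
    """Generate synthetic minority points by interpolating pairs of
    minority samples that fall inside the same unsupervised cluster."""
    rng = np.random.default_rng(seed)
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    minority = np.where(y == minority_label)[0]
    synthetic = []
    for _ in range(n_new):
        # Pick a cluster containing at least two minority points
        c = rng.choice([c for c in range(n_clusters)
                        if (clusters[minority] == c).sum() >= 2])
        pts = X[minority[clusters[minority] == c]]
        a, b = pts[rng.choice(len(pts), 2, replace=False)]
        synthetic.append(a + rng.uniform() * (b - a))  # point on the segment a-b
    X_new = np.asarray(synthetic)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority_label)])

# Imbalanced toy data: 90 majority points, 10 minority points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(4, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_aug, y_aug = cluster_oversample(X, y)
print("minority count before/after:", (y == 1).sum(), (y_aug == 1).sum())
```

Restricting interpolation to within-cluster pairs avoids generating synthetic points in the empty space between disconnected minority regions, which is the failure mode of naive global oversampling.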

Improved genetic algorithm and XGBoost classifier for power transformer fault diagnosis


The power transformer is an essential component for the stable and reliable operation of the electrical power grid. Traditional diagnostic methods based on dissolved gas analysis (DGA) have been used to identify power transformer faults; however, their application is limited by the low accuracy of fault identification. In this paper, a transformer fault diagnosis system is developed based on the combination of an improved genetic algorithm (IGA) and the XGBoost classifier. In this system, the improved genetic algorithm is employed to pre-select the input features from the DGA data and to optimize the XGBoost classifier. Performance measures such as average unfitness value, likelihood of evolution leap, and likelihood of optimality are used to validate the efficacy of the proposed improved genetic algorithm. Simulation experiments show that the improved genetic algorithm finds the optimal solution stably and reliably, and the proposed method improves the average accuracy of transformer fault diagnosis to 99.2%. Compared to the IEC ratio method, the Duval triangle, the support vector machine (SVM), and the common vector approach (CVA), the diagnostic accuracy of the proposed method is improved by 30.2%, 47.2%, 11.2%, and 3.6%, respectively. The proposed method is thus a potential solution for identifying transformer fault types.
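The improved genetic algorithm itself is the paper's contribution and is not reproduced here, but the overall pattern, evolving a binary mask that selects input features for a boosted-tree classifier, can be sketched with a plain genetic algorithm and scikit-learn's gradient boosting as a stand-in for XGBoost. All data, population sizes, and rates below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for a DGA feature table
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

def fitness(mask):
    # Cross-validated accuracy of the classifier on the selected features
    if not mask.any():
        return 0.0
    clf = GradientBoostingClassifier(n_estimators=30, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Plain GA: keep the best half, then uniform crossover and bit-flip mutation
pop = rng.integers(0, 2, (12, X.shape[1])).astype(bool)
for gen in range(5):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]
    children = []
    for _ in range(6):
        a, b = parents[rng.choice(6, 2, replace=False)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.05                # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best), 3))
```

The paper's IGA additionally tunes the classifier's hyperparameters; extending the chromosome to encode them alongside the feature mask follows the same pattern.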

Types of Multi Classification


This blog introduces different types of multiclass classification systems. Multiclass classifiers can distinguish between more than two classes, unlike binary classifiers. Some algorithms, such as stochastic gradient descent (SGD) classifiers, Random Forest classifiers, and naive Bayes classifiers, can handle multiple classes natively. Others, such as Logistic Regression and Support Vector Machine classifiers, are strictly binary. There are, however, various strategies you can use to perform multiclass classification with multiple binary classifiers.
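Two standard strategies are one-versus-the-rest (OvR), which trains one binary classifier per class, and one-versus-one (OvO), which trains one per pair of classes. A minimal sketch wrapping a binary SVM with scikit-learn's meta-estimators; the digits dataset is just a convenient 10-class example:

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 10 classes (digits 0-9)

# OvR: one binary SVM per class (class k vs. everything else) -> 10 models
ovr = OneVsRestClassifier(SVC()).fit(X, y)
print("OvR binary classifiers:", len(ovr.estimators_))

# OvO: one binary SVM per pair of classes -> 10 * 9 / 2 = 45 models
ovo = OneVsOneClassifier(SVC()).fit(X, y)
print("OvO binary classifiers:", len(ovo.estimators_))
```

OvO trains many more models, but each sees only two classes' worth of data, which suits algorithms like SVMs that scale poorly with training-set size.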

Sharp Analysis of Random Fourier Features in Classification Machine Learning

We study the theoretical properties of random Fourier features classification with Lipschitz continuous loss functions such as support vector machine and logistic regression. Utilizing the regularity condition, we show for the first time that random Fourier features classification can achieve $O(1/\sqrt{n})$ learning rate with only $\Omega(\sqrt{n} \log n)$ features, as opposed to $\Omega(n)$ features suggested by previous results. Our study covers the standard feature sampling method for which we reduce the number of features required, as well as a problem-dependent sampling method which further reduces the number of features while still keeping the optimal generalization property. Moreover, we prove that the random Fourier features classification can obtain a fast $O(1/n)$ learning rate for both sampling schemes under Massart's low noise assumption. Our results demonstrate the potential effectiveness of random Fourier features approximation in reducing the computational complexity (roughly from $O(n^3)$ in time and $O(n^2)$ in space to $O(n^2)$ and $O(n\sqrt{n})$ respectively) without having to trade-off the statistical prediction accuracy. In addition, the achieved trade-off in our analysis is at least the same as the optimal results in the literature under the worst case scenario and significantly improves the optimal results under benign regularity conditions.
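To illustrate the object under study, here is a minimal pure-NumPy construction of random Fourier features approximating the RBF kernel (the standard Rahimi-Recht feature map; the dimensions and bandwidth below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 200, 5, 2000          # samples, input dim, number of random features
gamma = 0.5                     # RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)

X = rng.normal(size=(n, d))

# Sample frequencies from the kernel's spectral density (Gaussian) and random phases
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)   # feature map: z(x)^T z(x') ≈ k(x, x')

# Compare the approximate kernel matrix to the exact RBF kernel matrix
K_exact = np.exp(-gamma * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
K_approx = Z @ Z.T
print("max abs error:", np.abs(K_exact - K_approx).max())
```

Training a linear classifier on Z instead of solving the exact kernel problem is what yields the complexity reductions discussed in the abstract; the paper's contribution is showing how few features D suffice without losing statistical accuracy.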