

9 Free Harvard Courses to Learn Data Science in 2022 - KDnuggets

#artificialintelligence

Last month, I wrote an article on building a data science learning roadmap with free courses offered by MIT. However, the focus of most courses I listed was highly theoretical, with a lot of emphasis on learning the math and statistics behind machine learning algorithms. While the MIT roadmap will help you understand the principles behind predictive modelling, what's lacking is the ability to actually implement the concepts learnt and execute a real-world data science project. After spending some time scouring the Internet, I found a couple of freely available courses by Harvard that covered the entire data science workflow -- from programming to data analysis, statistics, and machine learning. Once you complete all the courses in this learning path, you are also given a capstone project that allows you to put everything you learnt into practice.


Weak Convergence of Approximate reflection coupling and its Application to Non-convex Optimization

#artificialintelligence

In this paper, we propose a weak approximation of the reflection coupling (RC) for stochastic differential equations (SDEs), and prove that it converges weakly to the desired coupling. In contrast to the RC, the proposed approximate reflection coupling (ARC) need not take the hitting time of processes to the diagonal set into consideration and can be defined as the solution of some SDEs on the whole time interval. Therefore, ARC can work effectively against SDEs with different drift terms. As an application of ARC, an evaluation of the effectiveness of stochastic gradient descent in a non-convex setting is also described. For the sample size n, the step size η, and the batch size B, we derive evaluations that are uniform in time, with orders n^{-1}, η^{1/2}, and (n - B)/(B(n - 1)), respectively.


Scaling assistive healthcare technology with 5G

#artificialintelligence

With recent advances in communication networks and machine learning (ML), healthcare is one of the key application domains which stands to benefit from many opportunities, including remote global healthcare, hospital services on cloud, remote diagnosis or surgeries, among others. One of those advances is network slicing, making it possible to provide high-bandwidth, low-latency and personalized healthcare services for individual users. This is important for patients using healthcare monitoring devices that capture various biological signals (biosignals) such as from the heart (ECG), muscles (EMG), brain (EEG), or activities from other parts of the body. In this blog, we discuss the challenges to building a scalable delivery platform for such connected healthcare services, and how technological advances can help to transform this landscape significantly for the benefit of both users and healthcare service providers. Our specific focus is on assistive technology devices which are increasingly being used by many individuals.


15 Most Common Data Science Interview Questions

#artificialintelligence

Some interviewers ask hard questions, while others ask relatively easy ones. As an interviewee, it is your choice to go in prepared. And when it comes to a domain like machine learning, preparation can still fall short -- you have to be prepared for everything. While preparing, you might reach a point where you wonder what more you should read. Well, based on the roughly 15-17 data science interviews that I have attended, here I have put together 15 commonly asked and important Data Science and Machine Learning questions that came up in almost all of them, and I recommend you study these thoroughly.


Adding Explainability to Clustering - Analytics Vidhya

#artificialintelligence

Clustering is an unsupervised algorithm that is used for determining the intrinsic groups present in unlabelled data. For instance, a B2C business might be interested in finding segments in its customer base. Clustering is hence commonly used for different use-cases like customer segmentation, market segmentation, pattern recognition, search result clustering, etc. Some standard clustering techniques are K-means, DBSCAN, and hierarchical clustering, amongst other methods. Clusters created using techniques like K-means are often not easy to decipher because it is difficult to determine why a particular row of data was assigned to a particular bucket.
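One common way to add explainability on top of K-means (not the only approach, and not necessarily the one the linked article uses) is to fit a shallow surrogate decision tree on the cluster labels, so its rules describe why a row lands in a given bucket. A minimal sketch with scikit-learn on synthetic data, where the feature names are placeholders:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for unlabelled customer data: 300 rows, 2 features
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Step 1: find the intrinsic groups with K-means
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Step 2: fit a shallow surrogate tree that predicts the cluster labels;
# its split rules explain why a row is assigned to a given cluster
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(X, labels)

# Human-readable rules, one threshold per split
print(export_text(surrogate, feature_names=["feature_0", "feature_1"]))
```

If the surrogate's accuracy on the cluster labels is high, its rules are a faithful, readable description of the clustering; if not, the clusters genuinely lack simple axis-aligned explanations.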


How to Explore a Dataset of Images with Graph Theory

#artificialintelligence

When you start working on a dataset that consists of pictures, you'll probably be asked questions like: can you check whether the pictures are good? A quick-and-dirty solution would be to manually look at the data one by one and try to sort them out, but that might be tedious work depending on how many pictures you get. For example, in manufacturing, you could get a sample with thousands of pictures from a production line consisting of batteries of different types and sizes. You'd have to manually go through all the pictures and arrange them by type, size, or even color. The other, more efficient option would be to go the computer vision route and find an algorithm that can automatically arrange and sort your images -- this is the goal of this article. But how can we automate what a person does, i.e. compare pictures two by two with one another and sort them based on similarities?
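The graph-theory idea can be sketched simply, without presuming the article's exact pipeline: treat each image as a node, connect two nodes when their pairwise distance is below a threshold, and read off the connected components as groups of similar pictures. A self-contained toy version using random arrays as stand-in images (the threshold value is an arbitrary assumption for this synthetic data):

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Stand-in "images": 8x8 grayscale arrays -- two underlying types plus tiny noise
base_a = rng.random((8, 8))
base_b = rng.random((8, 8))
images = ([base_a + 0.01 * rng.random((8, 8)) for _ in range(3)]
          + [base_b + 0.01 * rng.random((8, 8)) for _ in range(3)])

# Build a similarity graph: nodes are images, an edge connects two images
# whose pixel-wise Euclidean distance falls under a threshold
n = len(images)
threshold = 0.5
adjacency = {i: [] for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if np.linalg.norm(images[i] - images[j]) < threshold:
            adjacency[i].append(j)
            adjacency[j].append(i)

def components(adj):
    """Connected components via breadth-first search."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            if node in group:
                continue
            group.add(node)
            queue.extend(adj[node])
        seen |= group
        groups.append(group)
    return groups

# Each component is one "pile" of visually similar pictures
print(components(adjacency))
```

On real pictures you would replace the raw pixel distance with a more robust feature (a perceptual hash or embeddings), but the graph construction and component extraction stay the same.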


Day 15–60 days of Data Science and Machine Learning

#artificialintelligence

Hope you all had a great Halloween weekend [ I dressed up as "Mother of Dragons" along with my cool "Game of Thrones" techie friends];) #winteriscoming. Let's get back and learn some more data science and machine learning. I hope you all have already grasped the Python essentials, Statistics and Maths from day 1 -- day 8 (links shared below), Pandas part 1 and part 2 on Day 9 and Day 10, Numpy on Day 11, Data Preprocessing part 1 on Day 12, Data Preprocessing part 2 on Day 13, and Hands-on Regression part 1 on Day 14. In this post we will cover how we can implement Regression -- part 2 as Day 15. The Linear Regression method is basically a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables): it simply minimizes the least-squares error, modeling the target for a single object as y ≈ x^T w, where w is the vector of model weights.
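The least-squares fit described above can be sketched in a few lines of NumPy: stack the features with an intercept column and solve for the weight vector w that minimizes the squared error. The data here is synthetic, with known slope 3 and intercept 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x + 2 plus a little noise
X = rng.random((100, 1))
y = 3 * X[:, 0] + 2 + 0.01 * rng.standard_normal(100)

# Add an intercept column so that w = [slope, intercept]
A = np.column_stack([X, np.ones(len(X))])

# Least-squares solution minimizing ||A w - y||^2
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # close to [3.0, 2.0]
```

In practice you would reach for `sklearn.linear_model.LinearRegression`, which solves the same problem with a friendlier interface; the point here is just that "minimizing the least-squares error" is an explicit, closed-form computation.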


Deep Learning with Label Differential Privacy - Channel969

#artificialintelligence

Over the past several years, there has been an increased focus on developing differentially private (DP) machine learning (ML) algorithms. DP has been the basis of several practical deployments in industry -- and has even been employed by the U.S. Census -- because it enables an understanding of system and algorithm privacy guarantees. The underlying assumption of DP is that changing a single user's contribution to an algorithm should not significantly change its output distribution. In the standard supervised learning setting, a model is trained to predict the label for each input, given a training set of example pairs {(input_1, label_1), …, (input_n, label_n)}. In the case of deep learning, previous work introduced a DP training framework, DP-SGD, that was integrated into TensorFlow and PyTorch.
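The core step DP-SGD adds to ordinary SGD can be sketched in plain NumPy: clip each example's gradient to a fixed norm, aggregate, and add Gaussian noise calibrated to that clip norm. This is an illustrative sketch of the mechanism only, not the TensorFlow/PyTorch implementations mentioned above; the clip norm and noise multiplier values are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD aggregation step: clip every per-example gradient to
    at most clip_norm, sum, add Gaussian noise scaled to the clip norm,
    then average over the batch."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clip threshold
        clipped.append(g * min(1.0, clip_norm / norm))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Toy batch of 32 per-example gradients for a 5-parameter model
grads = [rng.standard_normal(5) for _ in range(32)]
update = dp_sgd_step(grads)
print(update.shape)  # (5,)
```

Clipping bounds any single example's influence on the update, and the added noise makes that bounded influence statistically deniable -- which is exactly the "changing one user's contribution should not significantly change the output" assumption stated above.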


Development and internal validation of a machine-learning-developed model for predicting 1-year mortality after fragility hip fracture - BMC Geriatrics

#artificialintelligence

Fragility hip fracture increases morbidity and mortality in older adult patients, especially within the first year. Identification of patients at high risk of death facilitates modification of associated perioperative factors that can reduce mortality. Various machine learning algorithms have been developed and are widely used in healthcare research, particularly for mortality prediction. This study aimed to develop and internally validate 7 machine learning models to predict 1-year mortality after fragility hip fracture. This retrospective study included patients with fragility hip fractures from a single center (Siriraj Hospital, Bangkok, Thailand) from July 2016 to October 2018. A total of 492 patients were enrolled. They were randomly categorized into a training group (344 cases, 70%) or a testing group (148 cases, 30%). Various machine learning techniques were used: the Gradient Boosting Classifier (GB), Random Forests Classifier (RF), Artificial Neural Network Classifier (ANN), Logistic Regression Classifier (LR), Naive Bayes Classifier (NB), Support Vector Machine Classifier (SVM), and K-Nearest Neighbors Classifier (KNN). All models were internally validated by evaluating their performance and the area under a receiver operating characteristic curve (AUC). For the testing dataset, the accuracies were GB model = 0.93, RF model = 0.95, ANN model = 0.94, LR model = 0.91, NB model = 0.89, SVM model = 0.90, and KNN model = 0.90. All models achieved high AUCs that ranged between 0.81 and 0.99. The RF model also provided a negative predictive value of 0.96, a positive predictive value of 0.93, a specificity of 0.99, and a sensitivity of 0.68. Our machine learning approach facilitated the successful development of an accurate model to predict 1-year mortality after fragility hip fracture. 
Several machine learning algorithms (e.g., Gradient Boosting and Random Forest) had the potential to provide high predictive performance based on the clinical parameters of each patient. The web application is available at www.hipprediction.com. External validation in a larger group of patients or in different hospital settings is warranted to evaluate the clinical utility of this tool. Thai Clinical Trials Registry (22 February 2021; reg. no. TCTR20210222003).
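The study's evaluation protocol -- a 70/30 train/test split, several classifiers, and AUC on the held-out set -- can be sketched with scikit-learn. This uses synthetic stand-in data, not the study's clinical dataset or code, so the numbers it prints are not the paper's results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical data: 492 rows, as in the study
X, y = make_classification(n_samples=492, n_features=10, random_state=0)

# 70/30 train/test split, mirroring the paper's protocol
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two of the seven model families the paper compares
for name, model in [("GB", GradientBoostingClassifier(random_state=0)),
                    ("RF", RandomForestClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    # AUC needs predicted probabilities for the positive class
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name} AUC = {auc:.2f}")
```

Note the paper's RF sensitivity of 0.68 alongside specificity of 0.99: with imbalanced outcomes like 1-year mortality, accuracy and AUC alone can mask how many true positives a model misses, which is why the authors report predictive values and sensitivity as well.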


Know About Ensemble Methods in Machine Learning - Analytics Vidhya

#artificialintelligence

This article was published as a part of the Data Science Blogathon. The bias is the difference between the model's average prediction and the ground-truth value, whereas the variance is the outcome of sensitivity to tiny perturbations in the training set. Excessive bias might cause an algorithm to miss relevant relationships between the features and the intended outputs (underfitting). An algorithm with high variance models the random noise in the training data (overfitting). The bias-variance tradeoff is the observation that lowering the bias of the estimated parameters tends to increase the variance of the parameter estimates across samples, and vice versa.
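Ensemble methods attack exactly this tradeoff: bagging, for instance, averages many high-variance learners trained on bootstrap samples, reducing variance while leaving bias largely unchanged. A small scikit-learn sketch comparing a single deep decision tree against a bagged ensemble of the same trees (synthetic data, arbitrary hyperparameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic classification task
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)

# A single fully grown tree: low bias, high variance
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

# Bagging averages 50 trees fit on bootstrap resamples of the training
# data, which reduces variance with little change in bias
bag = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                        n_estimators=50, random_state=0)
bag_scores = cross_val_score(bag, X, y, cv=5)

print(f"single tree: {tree_scores.mean():.2f} (std {tree_scores.std():.2f})")
print(f"bagged:      {bag_scores.mean():.2f} (std {bag_scores.std():.2f})")
```

Typically the bagged ensemble scores higher and more consistently across folds than the single tree, which is the variance-reduction effect ensembles are built around; boosting, by contrast, mainly targets bias.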