Interview resources : ML/Data Science/AI Research Engineer


Interviewing is a grueling process, especially during COVID. I recently interviewed with Microsoft (Data Scientist II), Amazon (Applied AI Scientist) and Apple (Software Development : Machine…

The Application of Machine Learning Techniques for Predicting Match Results in Team Sport: A Review

Journal of Artificial Intelligence Research

Predicting the results of matches in sport is a challenging and interesting task. In this paper, we review a selection of studies from 1996 to 2019 that used machine learning for predicting match results in team sport. Considering both invasion sports and striking/fielding sports, we discuss commonly applied machine learning algorithms, as well as common approaches related to data and evaluation. Our study considers accuracies that have been achieved across different sports, and explores whether evidence exists to support the notion that outcomes of some sports may be inherently more difficult to predict. We also uncover common themes of future research directions and propose recommendations for future researchers. Although there remains a lack of benchmark datasets (apart from in soccer), and the differences between sports, datasets and features make between-study comparisons difficult, as we discuss, it is possible to evaluate accuracy performance in other ways. Artificial Neural Networks were commonly applied in early studies; however, our findings suggest that a range of models should instead be compared. Selecting and engineering an appropriate feature set appears to be more important than having a large number of instances. For feature selection, we see potential for greater inter-disciplinary collaboration between sport performance analysis, a sub-discipline of sport science, and machine learning.

A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification


COVID-19 is one of the deadliest viruses, which has killed millions of people around the world to date. People's deaths are linked not only to infection but also to people's mental states and sentiments triggered by the fear of the virus. People's sentiments, which are predominantly available in the form of posts/tweets on social media, can be interpreted using two kinds of information: syntactical and semantic. Herein, we propose to analyze people's sentiment using both kinds of information (syntactical and semantic) on the COVID-19-related Twitter dataset available in the Nepali language. To do so, we first use two widely used text representation methods, TF-IDF and FastText, and then combine them to obtain hybrid features that capture the highly discriminating features. Second, we implement nine widely used machine learning classifiers (Logistic Regression, Support Vector Machine, Naive Bayes, K-Nearest Neighbor, Decision Trees, Random Forest, Extreme Tree classifier, AdaBoost, and Multilayer Perceptron), based on the three feature representation methods: TF-IDF, FastText, and Hybrid. To evaluate our methods, we use a publicly available Nepali COVID-19 tweets dataset, NepCOV19Tweets, which consists of Nepali tweets categorized into three classes (Positive, Negative, and Neutral). The evaluation results on NepCOV19Tweets show that the hybrid feature extraction method not only outperforms the other two individual feature extraction methods across the nine machine learning algorithms but also provides excellent performance when compared with the state-of-the-art methods. Natural language processing (NLP) techniques have been developed to assess people's sentiments on various topics.
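The hybrid representation described above can be illustrated with a minimal sketch. It assumes the two feature sets are combined by simple concatenation, which the abstract does not confirm. The TF-IDF weighting is a plain smoothed variant written from scratch, and `word_vectors` is a hypothetical stand-in for pretrained FastText vectors; the English tokens are placeholders for tokenized Nepali tweets.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, rows): one smoothed TF-IDF row per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        rows.append([tf[tok] / length * idf[tok] for tok in vocab])
    return vocab, rows


def mean_embedding(doc, word_vectors, dim):
    """Average the vectors of known tokens (a FastText-style sentence vector)."""
    total = [0.0] * dim
    hits = 0
    for tok in doc:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # token has no vector in this toy lookup table
        hits += 1
        for i, value in enumerate(vec):
            total[i] += value
    return [t / hits for t in total] if hits else total


def hybrid_features(docs, word_vectors, dim):
    """Concatenate the sparse-style TF-IDF row with the dense mean embedding."""
    _, tfidf_rows = tfidf_vectors(docs)
    return [row + mean_embedding(doc, word_vectors, dim)
            for row, doc in zip(tfidf_rows, docs)]


# Toy data: placeholder tokens and hypothetical 2-dimensional word vectors.
docs = [["vaccine", "arrived", "today"], ["fear", "of", "virus"], ["vaccine", "fear"]]
dim = 2
word_vectors = {"vaccine": [0.2, 0.1], "fear": [-0.3, 0.4], "virus": [-0.1, 0.5]}

vocab, _ = tfidf_vectors(docs)
features = hybrid_features(docs, word_vectors, dim)
print(len(vocab), len(features[0]))
```

Each resulting row has one TF-IDF entry per vocabulary term followed by `dim` embedding entries, so any of the nine classifiers listed above could consume it as an ordinary feature vector.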

Application of Machine Learning Algorithms to Predict AKI


Qiuchong Chen,1,* Yixue Zhang,1,* Mengjun Zhang,1 Ziying Li,1 Jindong Liu1,2

1Department of Anesthesiology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, People's Republic of China; 2Jiangsu Province Key Laboratory of Anesthesiology, Xuzhou Medical University, Xuzhou, Jiangsu, People's Republic of China

*These authors contributed equally to this work

Correspondence: Jindong Liu, Department of Anesthesiology, The Affiliated Hospital of Xuzhou Medical University, 99 Huaihai Road West, Quanshan District, Xuzhou, Jiangsu, 221000, People's Republic of China

Objective: There has been a worldwide increase in acute kidney injury (AKI) incidence among elderly patients undergoing orthopedic surgery. AKI prediction models make early detection of patients at risk of AKI possible; however, most AKI prediction models are derived from cardiothoracic surgery. The purpose of this study is to predict the risk of AKI in elderly patients after orthopedic surgery using machine learning models.

Methods: We conducted a retrospective study comprising 1000 patients with postoperative AKI undergoing orthopedic surgery from September 2016 to June 2021. They were divided into training (80%; n = 799) and test (20%; n = 201) sets. We used nine machine learning (ML) algorithms with intraoperative information and preoperative clinical features to build models to predict AKI. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The optimal model was selected and a nomogram was established to visualize the prediction model. The concordance statistic (C-statistic) and calibration curve were used to assess the discrimination and calibration of the nomogram, respectively.

Results: In predicting AKI, the nine ML algorithms achieved AUCs of 0.656–1.000 in the training cohort, with random forest standing out, and AUCs of 0.674–0.821 in the test cohort, with the logistic regression model standing out.
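The four evaluation metrics named in the Methods can be computed as in the sketch below. It is an illustration on made-up labels and scores, not the study's data or code. The AUC is computed from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and the 0.5 decision threshold is an assumption.

```python
def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity and accuracy after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


# Made-up example: 1 = AKI, 0 = no AKI.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc_value = auc(y_true, y_score)
sens, spec, acc = threshold_metrics(y_true, y_score)
print(auc_value, sens, spec, acc)  # 0.75 0.5 1.0 0.75
```

A training AUC of 1.000 alongside a clearly lower test AUC, as reported for the random forest, is the pattern one would expect from overfitting, which is why the test-cohort figures are the ones to compare.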

Accuracy versus interpretability? With generalized additive models (GAMs), you can have both


In this post, I will provide an overview of generalized additive models (GAMs) and their desirable features. Predictive accuracy has long been an important goal of machine learning. But model interpretability has received more attention in recent years. Stakeholders, such as executives, regulators, and domain experts, often want to understand how and why a model makes its predictions before they trust it enough to use it in practice. However, when you train a machine learning model, you typically face a tradeoff between accuracy and interpretability.

Machine Learning Classification Bootcamp in Python


Apply advanced machine learning models to perform sentiment analysis and classify customer reviews, such as Amazon Alexa product reviews
Understand the theory and intuition behind several machine learning algorithms such as K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression
Implement classification algorithms in Scikit-Learn for K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression
Build an e-mail spam classifier using the Naive Bayes classification technique
Apply machine learning models to healthcare applications such as cancer and kyphosis disease classification
Develop models to predict customer behavior towards targeted Facebook Ads
Classify data using K-Nearest Neighbors, Support Vector Machines (SVM), Decision Trees, Random Forest, Naive Bayes, and Logistic Regression
Build an in-store feature to predict customer's size using their features
Develop a fraud detection classifier using machine learning techniques
Master the Python Seaborn library for statistical plots
Understand the difference between Machine Learning, Deep Learning and Artificial Intelligence
Perform feature engineering and clean your training and testing data to remove outliers
Master Python and Scikit-Learn for Data Science and Machine Learning
Learn to use the Python Matplotlib library for data plotting

Are you ready to master machine learning techniques and kick off your career as a data scientist? You came to the right place! Machine learning is one of the top skills to acquire in 2019, with an average salary of over $114,000 in the United States according to PayScale! The total number of ML jobs has grown around 600 percent over the past two years and is expected to grow even more by 2020. In this course, we are going to provide students with knowledge of key aspects of state-of-the-art classification techniques.

The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression


Methods to correct class imbalance, i.e. imbalance between the frequency of outcome events and non-events, are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of standard and penalized (ridge) logistic regression models in terms of discrimination, calibration, and classification. We examined random undersampling, random oversampling and SMOTE using Monte Carlo simulations and a case study on ovarian cancer diagnosis. The results indicated that all imbalance correction methods led to poor calibration (strong overestimation of the probability to belong to the minority class), but not to better discrimination in terms of the area under the receiver operating characteristic curve. Imbalance correction improved classification in terms of sensitivity and specificity, but similar results were obtained by shifting the probability threshold instead. Our study shows that outcome imbalance is not a problem in itself, and that imbalance correction may even worsen model performance.
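The two findings above (inflated minority-class probabilities, and classification results that threshold shifting can match) can be seen in an idealized sketch. It assumes, purely for illustration, that balancing the classes acts on a logistic model only by adding log(k) to the intercept, where k is the factor by which the minority-to-majority odds are multiplied. Real resampling also adds noise and can change the slopes, so this is not a reproduction of the paper's simulation; the probabilities are made up.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shift_intercept(probs, k):
    """Idealized imbalance correction: add log(k) to every prediction's log-odds."""
    return [sigmoid(logit(p) + math.log(k)) for p in probs]


def classify(probs, threshold):
    return [1 if p > threshold else 0 for p in probs]


# Made-up predicted risks from a model fitted on the original (imbalanced) data.
probs = [0.02, 0.05, 0.08, 0.15, 0.30, 0.60]
k = 9.0  # e.g. 10% prevalence balanced to 50%: odds 1/9 become 1/1

inflated = shift_intercept(probs, k)

# Classifying the inflated risks at 0.5 ...
labels_after_correction = classify(inflated, 0.5)
# ... matches classifying the original risks at the shifted threshold 1 / (1 + k).
labels_threshold_shift = classify(probs, 1.0 / (1.0 + k))

print(labels_after_correction == labels_threshold_shift)
print(sum(inflated) / len(inflated) > sum(probs) / len(probs))
```

Under this assumption the predicted classes are identical either way, while the "corrected" probabilities are systematically higher than the original ones, which is the overestimation pattern the abstract describes.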

Inference and FDR Control for Simulated Ising Models in High-dimension

Machine Learning

The (probabilistic) graphical model consists of a collection of probability distributions that factorize according to the structure of an underlying graph [52]. The graphical model captures the complex dependencies among random variables and supports building large-scale multivariate statistical models; it has been used in many research areas such as hierarchical Bayesian models [27], contingency table analysis [20, 53] in categorical data analysis [1, 23, 37], constraint satisfaction [16, 15], language and speech processing [11, 31], image processing [17, 24, 28] and spatial statistics more generally [8]. In our work, we focus on undirected graphical models, where the probability distribution factorizes according to functions defined on the cliques of the graph. Undirected graphical models have a variety of applications, including statistical physics [32], natural language processing [38], image analysis [54] and spatial statistics [43]. Specifically, we pay attention to the undirected graphical models that can be described as exponential families, a broad class of probability distributions studied extensively in the statistical literature [4, 21, 13]. The properties of exponential families provide some connections between inference methods and convex analysis [12, 29]. There are many well-known examples of undirected graphical models viewed as exponential families, such as the Ising model [32, 5], the Gaussian MRF [46] and latent Dirichlet allocation [11].
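As an illustration of an undirected graphical model written as an exponential family, the Ising model on a graph G = (V, E) is commonly expressed as below. The notation is an assumption chosen for this sketch and may differ from the paper's: x_s is the binary variable at node s (here taken in {-1, +1}), theta collects the node and edge parameters, and A(theta) is the log-partition function that normalizes the distribution.

```latex
p_\theta(x) = \exp\Big\{ \sum_{s \in V} \theta_s x_s
            + \sum_{(s,t) \in E} \theta_{st}\, x_s x_t - A(\theta) \Big\},
\qquad
A(\theta) = \log \sum_{x \in \{-1,+1\}^{|V|}}
            \exp\Big\{ \sum_{s \in V} \theta_s x_s
            + \sum_{(s,t) \in E} \theta_{st}\, x_s x_t \Big\}
```

The sufficient statistics are the node values x_s and the edge products x_s x_t, so the distribution factorizes over the nodes and edges of the graph. A(theta) is convex in theta, which is one source of the link to convex analysis mentioned above.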

Inference of Multiscale Gaussian Graphical Model

Machine Learning

Gaussian Graphical Models (GGMs) are widely used for exploratory data analysis in various fields such as genomics, ecology, and psychometrics. In a high-dimensional setting, when the number of variables exceeds the number of observations by several orders of magnitude, the estimation of a GGM is a difficult and unstable optimization problem. Clustering of variables or variable selection is often performed prior to GGM estimation. We propose a new method that simultaneously infers a hierarchical clustering structure and the graphs describing the structure of independence at each level of the hierarchy. This method is based on solving a convex optimization problem combining a graphical lasso penalty with a fused-type lasso penalty. Results on real and synthetic data are presented.

Uncalibrated Models Can Improve Human-AI Collaboration

Artificial Intelligence

In safety-critical settings like medicine, AI is often integrated in the form of interactive feedback with a human, allowing the human to decide when and to what extent the AI's "advice" is utilized Vodrahalli et al. [2020]. For example, in medical diagnosis tasks, this essentially places the AI algorithm in a similar category as lab tests or other exams a doctor may order to aid in diagnosis. This form of implementation is important to partially mitigate the often black-box nature of AI algorithms that limit a user's trust and usage of AI Feldman et al. [2019], Ribeiro et al. [2016], Xie et al. [2020], Miller [2019]. Typically, AI algorithms are designed and optimized independently of the human users: the AI is designed to be as accurate as possible for its given task using the standard training objectives. This makes sense if the model is used to make isolated decisions and predictions by itself. In this paper, we question this premise and ask whether a joint optimization of the entire human-AI system is possible. In particular, we revisit the conventional wisdom that models with calibrated confidence are desired for collaborative systems, as they better allow accurate transfer of prediction uncertainty between models and/or humans Guo et al. [2017]. We investigate whether explicitly making the AI advice uncalibrated (i.e.