

Identifying tumor cells at the single-cell level using machine learning - Genome Biology


Cancer is a disease that stems from the disruption of cellular state. Through genetic perturbations, tumor cells attain cellular states that give them a proliferative advantage over the surrounding normal tissue [1]. The inherent variability of this process has hampered efforts to find highly effective common therapies, ushering in the need for precision medicine [2]. The scale of single-cell experiments is poised to revolutionize personalized medicine by effectively characterizing the complete heterogeneity within a tumor for each individual patient [3, 4]. The recent expansion of single-cell sequencing technologies has exponentially increased the scale of knowledge attainable through a single biological experiment [5].

Cyber Criminals vs Robots


What happens when cyber criminals face robots? What happens when they use robots? How will the offensive and defensive strategies of cybersecurity evolve as artificial intelligence continues to grow? Both artificial intelligence and cybersecurity have consistently ranked among the fastest-growing industries year after year. The two fields overlap in many areas and will undoubtedly continue to do so for years to come. For this article, I have narrowed my scope to a specific use case: intrusion detection. An Intrusion Detection System (IDS) is software that monitors a company's network for malicious activity. I dive into AI's role in Intrusion Detection Systems, code my own IDS using machine learning, and demonstrate how it can be used to assist threat hunters.
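The article builds its IDS with machine learning. As a minimal sketch of the anomaly-detection idea behind such systems (not the author's actual code), the snippet below learns a baseline from traffic rates and flags observations that deviate sharply from it; the data and threshold are invented for illustration.

```python
from statistics import mean, stdev

def flag_anomalies(rates, threshold=2.5):
    """Flag traffic rates whose z-score against the baseline exceeds
    the threshold -- the core idea behind anomaly-based IDS."""
    mu, sigma = mean(rates), stdev(rates)
    return [i for i, r in enumerate(rates) if abs(r - mu) / sigma > threshold]

# Mostly steady traffic with one burst that could indicate a port scan
requests_per_minute = [42, 39, 45, 41, 38, 44, 40, 900, 43, 41]
print(flag_anomalies(requests_per_minute))  # flags the burst at index 7
```

A production IDS would of course learn a richer model over many features (ports, protocols, packet sizes), but the flag-what-deviates-from-baseline logic is the same.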

What Algorithms Curate Machine Learning


To address a specific problem, practitioners must select a suitable learning algorithm. A rough rule of thumb is that classification problems call for algorithms with high accuracy, whereas regression problems can favor algorithms with lower accuracy but higher robustness, since a small absolute error is usually tolerable. Here are a few examples. Linear Regression: Linear regression uses the linearity assumption to predict continuous values from a set of input variables. It achieves this by minimizing the sum of squared errors. Because the solution can be computed in closed form, the method is fast and scales to large datasets without iterative optimization; however, it is sensitive to outliers.
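To make the closed-form fit concrete, here is a minimal ordinary-least-squares sketch for a single input variable (the data points are made up so that the true line is y = 2x + 1):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, computed in closed form --
    no iterative optimization is needed."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # exactly y = 2x + 1
```

The sensitivity to outliers follows directly from the squared-error objective: a single far-off point contributes quadratically to the loss and can drag the whole line toward it.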

Artificial intelligence in cardiovascular medicine


Artificial intelligence (AI) is a rapidly evolving transdisciplinary field employing machine learning (ML) techniques, which aim to simulate human intuition to offer cost-effective and scalable solutions to better manage cardiovascular disease (CVD). ML algorithms are increasingly being developed and applied in various facets of cardiovascular medicine, including but not limited to heart failure, electrophysiology, valvular heart disease, and coronary artery disease. Within heart failure, AI algorithms can augment diagnostic capabilities and clinical decision-making through automated cardiac measurements. Occult cardiac disease is increasingly being identified using ML on diagnostic data. Improved diagnostic and prognostic capabilities using ML algorithms are enhancing the clinical care of patients with valvular heart disease and coronary artery disease. The growth of AI techniques is not without inherent challenges, the most important of which is the need for greater external validation through multicenter, prospective clinical trials.

K-Nearest Neighbors, Naive Bayes, and Decision Tree in 10 Minutes


Unlike linear models and SVMs (see Part 1), some machine learning models are difficult to understand from their mathematical formulation alone. Fortunately, they can be understood by following, step by step, the process they execute on a small dummy dataset. This way, you can look at machine learning models under the hood without the "math bottleneck". In this story, following Part 1, you will learn three more models: K-Nearest Neighbors (KNN), Naive Bayes, and Decision Tree. KNN is a non-generalizing machine learning model, since it simply "remembers" all of its training data.
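KNN's "remember everything" behavior can be shown in a few lines. The sketch below (with an invented two-class dummy dataset) stores all training points verbatim and classifies a query by majority vote among its k nearest neighbors:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict by majority vote among the k nearest training points --
    the 'model' is literally the stored training data."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two well-separated clusters
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["blue", "blue", "blue", "red", "red", "red"]
print(knn_predict(X, y, (7, 7)))  # nearest neighbors are all "red"
```

Note there is no training step at all: every prediction re-scans the stored data, which is exactly why KNN is called non-generalizing (and, as discussed later in this digest, a "lazy learner").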

How to Build an Online Machine Learning App With Python


Machine learning is rapidly becoming as ubiquitous as data itself. Quite literally, wherever there is an abundance of data, machine learning is somehow intertwined. After all, what utility would data have if we could not use it to predict something about the future? Luckily, there is a plethora of toolkits and frameworks that make it rather simple to deploy ML in Python. In particular, scikit-learn (sklearn) has done a terrific job of making ML accessible to developers.
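As a hedged illustration of how little code sklearn requires (the dataset and classifier here are chosen arbitrarily, not taken from the article), a complete load-split-fit-evaluate workflow fits in a handful of lines:

```python
# End-to-end scikit-learn workflow: load data, split, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # held-out accuracy
```

Every sklearn estimator follows the same fit/predict/score interface, which is a large part of why swapping models in and out is so easy.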

Impact of Imputation Strategies on Fairness in Machine Learning

Journal of Artificial Intelligence Research

Research on Fairness and Bias Mitigation in Machine Learning often uses a set of reference datasets for the design and evaluation of novel approaches or definitions. While these datasets are well structured and useful for the comparison of various approaches, they do not reflect that datasets commonly used in real-world applications can have missing values. When such missing values are encountered, the use of imputation strategies is commonplace. However, as imputation strategies potentially alter the distribution of data they can also affect the performance, and potentially the fairness, of the resulting predictions, a topic not yet well understood in the fairness literature. In this article, we investigate the impact of different imputation strategies on classical performance and fairness in classification settings. We find that the selected imputation strategy, along with other factors including the type of classification algorithm, can significantly affect performance and fairness outcomes. The results of our experiments indicate that the choice of imputation strategy is an important factor when considering fairness in Machine Learning. We also provide some insights and guidance for researchers to help navigate imputation approaches for fairness.
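To see why the choice of imputation strategy matters, consider a toy skewed feature with one missing value (the numbers below are invented, not from the paper): mean and median imputation produce very different fill values, so downstream model behavior, and potentially fairness, can differ.

```python
def impute(values, strategy="mean"):
    """Fill missing entries (None) with the mean or median of the
    observed values. On skewed data the two strategies disagree sharply."""
    observed = sorted(v for v in values if v is not None)
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    else:  # "median"
        mid = len(observed) // 2
        fill = (observed[mid] if len(observed) % 2
                else (observed[mid - 1] + observed[mid]) / 2)
    return [fill if v is None else v for v in values]

incomes = [30, 32, 35, None, 200]   # skewed feature, one missing value
print(impute(incomes, "mean")[3])   # 74.25 -- pulled up by the outlier
print(impute(incomes, "median")[3]) # 33.5  -- robust to the outlier
```

If the skew (or the missingness itself) is correlated with a protected attribute, the imputed values, and hence the classifier's decisions, can systematically differ across groups, which is the effect the article investigates.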

How to Evaluate Survival Analysis Models


Survival analysis encompasses a collection of statistical methods for describing time-to-event data. It originates from clinical studies, where physicians are mostly interested in assessing the effect of a new therapy on survival against a control group, or how certain features represent a risk of an adverse event over time. This post introduces the challenges related to survival analysis (censoring) and explains popular metrics to evaluate survival models, sharing practical Python examples along the way. Let us imagine we are clinical researchers. Since we want to assess whether the new treatment has a significant effect in preventing an adverse event (such as death), we monitor the patients of both the treatment and control groups for a certain period of time. Some patients, however, may not experience the event before the observation window closes, so their exact time to event remains unknown. This condition goes under the name of right censoring, and it is a common trait of survival analysis studies.
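One of the most popular evaluation metrics that handles right censoring is Harrell's concordance index (C-index). The sketch below is a minimal pure-Python version on invented toy data, not the post's own code: among comparable patient pairs, it measures how often the patient who failed earlier was assigned the higher predicted risk.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's C-index: a pair is comparable only if the earlier time
    corresponds to an observed event (not a censored follow-up); the index
    is the fraction of comparable pairs ranked correctly by risk score."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:      # order so subject i fails/censors first
            i, j = j, i
        if times[i] == times[j] or not events[i]:
            continue                 # pair not comparable under censoring
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5        # ties get half credit
    return concordant / comparable

times = [5, 7, 10, 12]           # follow-up time per patient
events = [1, 1, 0, 1]            # 1 = event observed, 0 = right-censored
scores = [0.9, 0.7, 0.3, 0.2]    # predicted risk (higher = worse)
print(concordance_index(times, events, scores))  # 1.0: perfect ranking
```

A C-index of 0.5 corresponds to random ranking and 1.0 to a perfect one; libraries such as lifelines and scikit-survival provide production implementations of this and related metrics.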

Giuliano Liguori on LinkedIn: #BigData #Analytics #DataScience


The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable. K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs its computation at classification time. The Naive Bayes classification algorithm is a probabilistic classifier.
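To unpack "probabilistic classifier": Naive Bayes picks the class with the highest posterior probability, computed from class priors and per-feature likelihoods treated as independent given the class. Below is a small Gaussian Naive Bayes sketch on invented height/weight data (a didactic toy, not a reference implementation):

```python
import math
from collections import defaultdict

def train_gnb(X, y):
    """Fit per-class mean/variance for each feature plus class priors --
    'naive' because features are assumed independent given the class."""
    stats, by_class = {}, defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        stats[label] = (means, variances, len(rows) / len(X))
    return stats

def predict_gnb(stats, row):
    """Pick the class with the highest log-posterior (log prior plus
    summed Gaussian log-likelihoods)."""
    def log_post(label):
        means, variances, prior = stats[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(stats, key=log_post)

X = [(180, 80), (175, 75), (160, 55), (155, 50)]   # (height cm, weight kg)
y = ["tall", "tall", "short", "short"]
model = train_gnb(X, y)
print(predict_gnb(model, (178, 78)))  # closest to the "tall" class stats
```

Working in log space avoids numerical underflow when many feature likelihoods are multiplied together.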

Python for Machine Learning: A Tutorial


Python has become the most popular programming language for data science and machine learning. But to obtain reliable results, it is important to have a basic understanding of how Python is used in machine learning. In this introductory tutorial, you'll learn the basics of Python for machine learning, including the different model types and the steps to take to ensure you obtain quality data, using a sample machine learning problem. In addition, you'll get to know some of the most popular libraries and tools for machine learning.

Machine learning (ML) is a form of artificial intelligence (AI) that teaches computers to make predictions and recommendations and to solve problems based on data. Its problem-solving capabilities make it a useful tool in industries such as financial services, healthcare, marketing and sales, and education, among others. There are three main types of machine learning: supervised, unsupervised, and reinforcement. In supervised learning, the computer is given a set of training data that includes both the input data (the features) and the output data (the labels we want to predict).
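To make the input/output pairing of supervised learning concrete, here is a tiny perceptron on invented data (not from the tutorial): every training example couples input features with a known output label, and the weights are nudged whenever the current prediction disagrees with that label.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Supervised learning in miniature: adjust the weights and bias
    whenever the prediction disagrees with the known output label."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred            # zero when prediction is correct
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Labeled training data for logical OR: each input paired with its output
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(data)
```

Unsupervised learning would receive only the inputs (no labels) and look for structure, while reinforcement learning would learn from rewards rather than labeled examples.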