What is Machine Learning? A Primer for the Epidemiologist

#artificialintelligence

Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on "Big Data," it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.

Machine learning is a branch of computer science that broadly aims to enable computers to "learn" without being directly programmed (1). It has origins in the artificial intelligence movement of the 1950s and emphasizes practical objectives and applications, particularly prediction and optimization. Computers "learn" in machine learning by improving their performance at tasks through "experience" (2, p. xv). In practice, "experience" usually means fitting to data; hence, there is not a clear boundary between machine learning and statistical approaches. Indeed, whether a given methodology is considered "machine learning" or "statistical" often reflects its history as much as genuine differences, and many algorithms (e.g., least absolute shrinkage and selection operator (LASSO), stepwise regression) may or may not be considered machine learning depending on who you ask. Still, despite methodological similarities, machine learning is philosophically and practically distinguishable. At the risk of (considerable) oversimplification, machine learning generally emphasizes predictive accuracy over hypothesis-driven inference, usually focusing on large, high-dimensional (i.e., having many covariates) data sets (3, 4).

Regardless of the precise distinction between approaches, in practice, machine learning offers epidemiologists important tools. In particular, a growing focus on "Big Data" emphasizes problems and data sets for which machine learning algorithms excel while more commonly used statistical approaches struggle. This primer provides a basic introduction to machine learning with the aim of providing readers a foundation for critically reading studies based on these methods and a jumping-off point for those interested in using machine learning techniques in epidemiologic research.
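Since the primer singles out the LASSO as an algorithm sitting on the boundary between machine learning and statistics, a minimal sketch of fitting one with scikit-learn may help orient readers; the synthetic high-dimensional data and the penalty strength alpha=0.1 below are illustrative assumptions, not choices from the article.

```python
# Minimal LASSO sketch with scikit-learn (synthetic data for illustration only).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))          # 200 subjects, 50 covariates (high-dimensional)
beta = np.zeros(50)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]  # only 5 covariates truly matter
y = X @ beta + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1)                # alpha controls the strength of the shrinkage penalty
model.fit(X, y)
print("nonzero coefficients:", np.sum(model.coef_ != 0))
```

The shrinkage penalty drives most irrelevant coefficients exactly to zero, which is why the LASSO doubles as a variable selection tool in both fields.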


Mathematics in Machine Learning

#artificialintelligence

Machine Learning is a branch of AI that focuses on building applications that learn from available data and process it accurately. The primary aim of machine learning is to enable computers to perform these computations without human intervention. The questions that arise here are: how do we feed the data to the machine, and how can the machine then perform operations on this dataset and produce precise results? This is where mathematics comes into play.
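To make "where mathematics comes into play" concrete, here is a small worked example: fitting ordinary least squares by solving the normal equations w = (X^T X)^{-1} X^T y directly with NumPy. The toy data below are made up for illustration.

```python
# Ordinary least squares via the normal equations: w = (X^T X)^{-1} X^T y.
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # column of 1s = intercept term
y = np.array([2.1, 3.9, 6.2, 8.1])

w = np.linalg.solve(X.T @ X, X.T @ y)   # solve the linear system rather than invert, for stability
print("intercept, slope:", w)
```

Every learning algorithm ultimately reduces to calculations like this one: linear algebra to express the model, and optimization (here in closed form) to fit it to the data.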


Naive Bayes Classifier Tutorial in Python and Scikit-Learn

#artificialintelligence

Naive Bayes Classifier is a simple model that's usually used in classification problems. Despite its simplicity, it has shown very good results, often outperforming more complicated models by a wide margin. This is the second article in a series of two about the Naive Bayes Classifier, and it deals with the implementation of the model in Scikit-Learn with Python. For a detailed overview of the math and the principles behind the model, please check the other article: Naive Bayes Classifier Explained. In the previous article linked above, I introduced a table of some data that we can train our classifier on.
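The article's training table is not reproduced in this excerpt, so the sketch below substitutes scikit-learn's built-in iris dataset; the fit/predict pattern with GaussianNB is the standard scikit-learn workflow such a tutorial walks through.

```python
# Naive Bayes classification with scikit-learn (stand-in data, not the article's table).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)               # learn per-class feature means and variances
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```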


20 Data Science Interview Questions for a Beginner

#artificialintelligence

Success is a process, not an event. Data Science is growing rapidly in all sectors. With the availability of so many technologies within the Data Science domain, it becomes tricky to crack any Data Science interview. In this article, we have tried to cover the most common Data Science interview questions asked by recruiters. Answer: This question can also be phrased as: why is linear regression not a very effective algorithm?


Difference Between Algorithm and Artificial Intelligence

#artificialintelligence

By 2035, AI could boost average profitability rates by 38 percent and lead to an economic increase of $14 trillion. The terms Artificial Intelligence (AI) and algorithm are often misused and misunderstood. They are used interchangeably when they shouldn't be, which leads to unnecessary confusion. In this article, let's understand what AI and algorithms are, and what the difference between them is.


The World of Reality, Causality and Real Artificial Intelligence: Exposing the Great Unknown Unknowns

#artificialintelligence

"All men by nature desire to know." - Aristotle "He who does not know what the world is does not know where he is." - Marcus Aurelius "If I have seen further, it is by standing on the shoulders of giants." "The universe is a giant causal machine. The world is "at the bottom" governed by causal algorithms. Our bodies are causal machines. Our brains and minds are causal AI computers". The 3 biggest unknown unknowns are described and analyzed in terms of human intelligence and machine intelligence. A deep understanding of reality and its causality is to revolutionize the world, its science and technology, AI machines including. The content is the intro of Real AI Project Confidential Report: How to Engineer Man-Machine Superintelligence 2025: AI for Everything and Everyone (AI4EE). It is all a power set of {known, unknown; known unknown}, known knowns, known unknowns, unknown knowns, and unknown unknowns, like as the material universe's material parts: about 4.6% of baryonic matter, about 26.8% of dark matter, and about 68.3% of dark energy. There are a big number of sciences, all sorts and kinds, hard sciences and soft sciences. But what we are still missing is the science of all sciences, the Science of the World as a Whole, thus making it the biggest unknown unknowns. It is what man/AI does not know what it does not know, neither understand, nor aware of its scope and scale, sense and extent. "the universe consists of objects having various qualities and standing in various relationships" (Whitehead, Russell), "the world is the totality of states of affairs" (D. "World of physical objects and events, including, in particular, biological beings; World of mental objects and events; World of objective contents of thought" (K. How the world is still an unknown unknown one could see from the most popular lexical ontology, WordNet,see supplement. The construct of the world is typically missing its essential meaning, "the world as a whole", the world of reality, the ultimate totality of all worlds, universes, and realities, beings, things, and entities, the unified totalities. The world or reality or being or existence is "all that is, has been and will be". Of which the physical universe and cosmos is a key part, as "the totality of space and times and matter and energy, with all causative fundamental interactions".


PCA, LDA, and SVD: Model Tuning Through Feature Reduction for Transportation POI Classification

#artificialintelligence

PCA is a dimensionality reduction method that takes datasets with a large number of features and reduces them to a few underlying features. The sklearn PCA package performs this process for us. In the snippet of code below, we reduce the initial dataset's 75 features to 8. This snippet serves to show the optimal number of features for the feature reduction algorithm to fit to. The snippets below show how to use the Gaussian Naive Bayes, Decision Tree, and K-Nearest Neighbors classifiers with the reduced features.
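The snippets referenced above are not included in this excerpt, so the sketch below stands in random data for the 75-feature POI matrix; the PCA reduction to 8 components and the three classifiers mirror the steps the summary describes, with cross-validation added as an illustrative way to compare them.

```python
# PCA from 75 features down to 8, then three classifiers on the reduced data
# (random stand-in data; the article's POI dataset is not reproduced here).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 75))          # 500 samples, 75 original features
y = rng.integers(0, 3, size=500)        # 3 hypothetical POI classes

X_reduced = PCA(n_components=8).fit_transform(X)   # 75 -> 8 underlying features

for clf in (GaussianNB(), DecisionTreeClassifier(), KNeighborsClassifier()):
    score = cross_val_score(clf, X_reduced, y, cv=5).mean()
    print(type(clf).__name__, round(score, 3))
```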


Analyzing Hindu Verses with NLP

#artificialintelligence

'Text Classification' is a Machine Learning technique used to analyze text and then organize or categorize it based on patterns or structure. Categorization of text has many applications in the world of artificial intelligence, such as news article analysis, hate speech identification, and gender classification. In this article I use 'Text Classification' with Natural Language Processing (NLP) in Python to analyze Hindu religious verses and categorize them. Before we delve deeper into the technical side of Python, let's quickly see what data we will be working with. The 'Sahasranama' -- literally 1000 names (where 'sahasra' means 1000 and 'nama' means names) -- is a hymn of praise offered to God in Hinduism.
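The verse dataset itself is not shown in this excerpt, so the sketch below uses made-up placeholder strings; a bag-of-words vectorizer feeding a multinomial Naive Bayes model is one standard scikit-learn setup for the kind of text classification the article describes.

```python
# A minimal text-classification pipeline (bag-of-words + multinomial Naive Bayes).
# The verse texts and category labels here are made-up placeholders, not the article's data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["verse praising strength", "verse praising wisdom",
         "hymn of strength and valor", "hymn of knowledge and wisdom"]
labels = ["strength", "wisdom", "strength", "wisdom"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)                   # learn word-frequency patterns per category
print(clf.predict(["a song of wisdom"]))
```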


The Machine Learning Tribes

#artificialintelligence

In his book 'The Master Algorithm', Pedro Domingos dissects the schools of thought in ML into 5 tribes/groups. Symbolists believe in inductive logic, using data and rules to model intelligent systems. This can be computationally intensive at times and subject to over-fitting and the bias-variance trade-off. However, experimentation is key in most of the methods employed here. Connectionists are interested in learning how the brain works and in mimicking it using neural networks. Deep learning and mapping the brain (neuroscience) are native to this tribe.