Mathematical & Statistical Methods
A Simple Introduction to Complex Stochastic Processes
Stochastic processes have many applications, including in finance and physics, and they are useful models for many phenomena. Unfortunately, the theory behind them is very difficult, which makes them accessible to only a few 'elite' data scientists and unpopular in business contexts. One of the simplest examples is the random walk, which is easy to understand with no mathematical background. However, time-continuous stochastic processes are always defined and studied using advanced and abstract mathematical tools such as measure theory, martingales, and filtrations.
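To make the random walk concrete, here is a minimal sketch in Python. It assumes the simplest symmetric case (steps of +1 or -1 with equal probability, starting at 0); the function name `simple_random_walk` and its parameters are illustrative and do not come from the article.

```python
import random


def simple_random_walk(n_steps, seed=None):
    """Return the positions of a symmetric random walk that starts at 0.

    Assumption for illustration: each step is +1 or -1 with equal probability.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    positions = [0]
    for _ in range(n_steps):
        step = rng.choice((-1, 1))
        positions.append(positions[-1] + step)
    return positions


walk = simple_random_walk(1000, seed=42)
print("first positions:", walk[:10])
print("final position:", walk[-1])
```

Each position depends only on the previous position and one new random step, which is the whole definition of the process in this discrete setting.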
Cancer Genomics Neural Networks vs k-NN Classifiers
Cancer Genomics Neural Networks vs k-NN Classifiers: Machine Learning for Python Hackers is a crash course in data science and cancer genomics for anyone interested in cancer research. The course starts by loading a cancer dataset and splitting it into training and test sets. Unusually for a data science course, it uses the mglearn library for clearer visualization and aims to give enough detail that students can follow along without ambiguity.
Exponential convergence of testing error for stochastic gradient methods
Pillaud-Vivien, Loucas, Rudi, Alessandro, Bach, Francis
Stochastic gradient methods are now ubiquitous in machine learning, both from the practical side, as simple algorithms that can learn from a single or a few passes over the data [1], and from the theoretical side, as they lead to optimal rates for estimation problems in a variety of situations [2, 3]. They follow a simple principle [4]: to find a minimizer of a function F defined on a vector space from noisy gradients, simply follow the negative stochastic gradient, and the algorithm will converge to a stationary point, local minimum, or global minimum of F (depending on the properties of the function F), with a rate of convergence that decays with the number of gradient steps n typically as O(1/√n) or O(1/n), depending on the assumptions made on the problem (see, e.g., [3, 5, 6, 7, 8, 9, 10, 11]).
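The principle of following the negative stochastic gradient can be sketched in a few lines of Python. This is a toy illustration, not the setting studied in the paper: it assumes a one-dimensional quadratic F(x) = 0.5 (x - TARGET)^2, Gaussian gradient noise, and a step size of 1/n, all of which are choices made here for the example. The names `sgd`, `noisy_grad`, and `TARGET` are hypothetical.

```python
import random


def sgd(noisy_grad, x0, n_steps, step_size):
    """Plain stochastic gradient descent: x <- x - step_size(n) * noisy_grad(x)."""
    x = x0
    for n in range(1, n_steps + 1):
        x -= step_size(n) * noisy_grad(x)
    return x


rng = random.Random(0)
TARGET = 3.0  # assumed minimizer of the toy objective F(x) = 0.5 * (x - TARGET)**2


def noisy_grad(x):
    # Exact gradient (x - TARGET) plus zero-mean Gaussian noise (an assumption).
    return (x - TARGET) + rng.gauss(0.0, 1.0)


estimate = sgd(noisy_grad, x0=0.0, n_steps=20000, step_size=lambda n: 1.0 / n)
print("estimate:", estimate, "error:", abs(estimate - TARGET))
```

With this particular step size the iterate equals TARGET minus the running average of the noise, so the error shrinks as the noise averages out over more gradient steps.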
In Defense of the Indefensible: A Very Naive Approach to High-Dimensional Inference
Zhao, Sen, Shojaie, Ali, Witten, Daniela
In recent years, a great deal of interest has focused on conducting inference on the parameters in a linear model in the high-dimensional setting. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables; and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and $p$-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is deterministic. Consequently, the naïve two-step approach can yield confidence intervals that have asymptotically correct coverage, as well as $p$-values with proper Type-I error control. Furthermore, this two-step approach unifies two existing camps of work on high-dimensional inference: one camp has focused on inference based on a sub-model selected by the lasso, and the other has focused on inference using a debiased version of the lasso estimator.
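The two-step procedure described above can be sketched with NumPy as follows. This is a rough illustration under assumptions made here, not the authors' code: the lasso is solved by a basic cyclic coordinate descent (`lasso_cd`, a hypothetical helper), there is no intercept, the penalty `lam = 0.2` and the simulated data are arbitrary, and the intervals use the standard least squares formula with the normal quantile 1.96.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta, since beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)  # keep the residual in sync
            beta[j] = new_bj
    return beta


# Simulated data (assumption): 3 active variables out of 50.
rng = np.random.default_rng(0)
n, p = 200, 50
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -3.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Step (i): the lasso selects a subset of variables.
selected = np.flatnonzero(lasso_cd(X, y, lam=0.2))

# Step (ii): ordinary least squares on the selected columns only.
X_sel = X[:, selected]
coef, *_ = np.linalg.lstsq(X_sel, y, rcond=None)
resid = y - X_sel @ coef
sigma2 = resid @ resid / (n - len(selected))
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X_sel.T @ X_sel)))
lower, upper = coef - 1.96 * se, coef + 1.96 * se

for j, c, lo, hi in zip(selected, coef, lower, upper):
    print(f"variable {j}: estimate {c:.3f}, naive 95% interval [{lo:.3f}, {hi:.3f}]")
```

Whether such intervals are valid is exactly the question the paper addresses; the sketch only shows the mechanics of the two steps.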
Prerequisites of linear algebra for machine learning
Just about everyone has watched animated movies such as Frozen or Big Hero 6, or has at least heard of 3D computer games. Enjoying the movies and games seems more fun than reading a linear algebra book, but it is because of linear algebra that we can watch a character move on the screen. Linear algebra underpins our new digital world. In this article, we will learn matrix arithmetic and how to use NumPy to carry out these operations in Python.
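As a small taste of the matrix arithmetic mentioned above, the following sketch shows a few basic operations in NumPy. The matrices `A` and `B` are arbitrary examples chosen here, not taken from the article.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

total = A + B               # element-wise sum
hadamard = A * B            # element-wise product (not the matrix product)
product = A @ B             # matrix product
transposed = A.T            # transpose
A_inv = np.linalg.inv(A)    # inverse (A is invertible: its determinant is -2)

print(total)
print(hadamard)
print(product)
print(transposed)
print(A_inv)
```

Note the difference between `*` and `@`: the first multiplies matching entries, while the second follows the row-by-column rule of linear algebra.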
Why You Should Forget 'for-loop' for Data Science Code and Embrace Vectorization
We have all used for-loops for the majority of tasks that need to iterate over a long list of elements. I am sure almost everybody reading this article wrote their first code for matrix or vector multiplication using a for-loop back in high school or college. The for-loop has served the programming community long and steadily. However, it comes with some baggage and is often slow when processing large data sets (many millions of records, as in this age of Big Data). This is particularly true for an interpreted language like Python, where, if the body of your loop is simple, the interpreter overhead of the loop itself can account for a substantial share of the total run time.
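The contrast can be tried out with a short sketch that computes the same dot product twice, once with an explicit Python loop and once with a single vectorized call. NumPy is assumed here as the vectorization library, and the array size and the helper name `dot_loop` are illustrative. Timings depend on the machine, so none are quoted.

```python
import time

import numpy as np


def dot_loop(a, b):
    """Dot product with an explicit Python for-loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


rng = np.random.default_rng(0)
a = rng.random(200_000)
b = rng.random(200_000)

start = time.perf_counter()
loop_result = dot_loop(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = np.dot(a, b)  # the loop runs inside NumPy's compiled code
vec_seconds = time.perf_counter() - start

print(f"for-loop:   {loop_result:.6f} in {loop_seconds:.4f} s")
print(f"vectorized: {vec_result:.6f} in {vec_seconds:.4f} s")
```

Both versions perform the same arithmetic; the vectorized call avoids paying the interpreter overhead once per element.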