Regression
Getting Started with Regression in R
Regressions are widely used to estimate relationships between variables or to predict future values for a given dataset. If you want to know how much variable "x" influences variable "y", you might want to run a regression on your data. If you have a series of data points over time and want to know what your data will look like in the future, you might also want to use regression. I will try to describe the steps that helped me successfully build linear and non-linear regressions in R, using polynomials and splines. I am not going to go into too much detail on each method.
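The original post works in R, where this is typically done with `lm()` together with `poly()` or `splines::bs()`. As a language-neutral sketch of the same idea, the NumPy code below builds a polynomial design matrix and a degree-1 regression-spline design matrix (a truncated power basis with hand-picked knots), then fits both by ordinary least squares. The function names and the choice of a piecewise-linear spline basis are our own assumptions, not the author's code.

```python
import numpy as np


def poly_design(x, degree):
    """Design matrix with columns 1, x, x**2, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)


def linear_spline_design(x, knots):
    """Degree-1 truncated power basis: 1, x, and max(x - k, 0) for each knot k."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


def fit_least_squares(X, y):
    """Ordinary least-squares coefficients for y ~ X @ coef."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 21)
    y_quad = 1.0 + 2.0 * x - 0.5 * x**2     # a curved (non-linear) trend
    print(fit_least_squares(poly_design(x, 2), y_quad))

    y_kink = np.abs(x)                       # a trend that bends at x = 0
    print(fit_least_squares(linear_spline_design(x, knots=[0.0]), y_kink))
```

Moving the knots changes where the fitted curve is allowed to bend. Cubic B-splines, which `splines::bs()` produces by default, give a smoother fit than this piecewise-linear basis.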
16 Uses of AI and Machine Learning in Business
Ocado is using a combination of Google's open-source TensorFlow machine learning tools and cloud APIs to support internal AI projects. One such initiative focuses on automating the management of the deluge of customer-service emails the company receives. Ocado is also building a computer vision system in an effort to replace barcode scanning in its warehouses. The machine learning project is still in development, but Daniel Nelson, head of data at Ocado's technology division, told Computerworld UK that the company hopes the technology will help in both its warehouse and delivery processes. Ocado is also using TensorFlow for everything from routing algorithms that move its robots around warehouses, to improving existing features such as demand forecasting, which is currently based on decade-old linear regression models, to predictively suggesting items to add to your basket based on past shopping habits.
Logistic Regression using Python
This article was posted by Arpan Gupta (Indian Institute of Technology). It is a concise demo of fitting a logistic regression on the Titanic data set for machine learning. Description: On April 15, 1912, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This tragedy led to better safety regulations for ships.
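The demo itself is not reproduced here. As a minimal sketch of what "fitting a logistic regression" involves, the NumPy code below estimates the coefficients by batch gradient descent on the mean negative log-likelihood. The toy one-feature data and the parameter names `lr` and `n_iter` are assumptions for illustration. For the Titanic data you would first have to encode columns such as passenger class or sex as numbers (our assumption about preprocessing, not the article's recipe), and in practice a library routine such as scikit-learn's `LogisticRegression` is the usual choice.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    """Prepend a column of ones so the first coefficient acts as the intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the mean negative log-likelihood."""
    Xb = add_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)                  # predicted P(y = 1 | x)
        grad = Xb.T @ (p - y) / len(y)       # gradient of the mean NLL
        w -= lr * grad
    return w


def predict_proba(w, X):
    return sigmoid(add_intercept(X) @ w)


if __name__ == "__main__":
    # Hypothetical toy data: one numeric feature, binary outcome.
    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([0, 0, 1, 1])
    w = fit_logistic(X, y)
    print(w, predict_proba(w, X).round(3))
```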
Predicting the Projected Score of a match at any point using Machine Learning
For the past couple of months, I have been reading about and working on a lot of things related to machine learning and sports, but until now I had not built anything with good accuracy. A friend suggested that I predict projected scores, since the projected scores shown on screen during a cricket match are not very accurate. This post shows that a simple linear regression model can outperform the traditional methods and gives very good accuracy. As things stand, the projected scores shown on screen during a match are not very interesting to people watching cricket.
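This summary does not give the post's features or code, so the following is only a sketch under stated assumptions. The baseline extrapolates the current run rate over the full innings (we assume 50 overs by default; pass `total_overs=20` for T20). The model is a multiple linear regression fitted by least squares on hypothetical match-state features: overs completed, runs so far and wickets lost. The feature choice and the synthetic data in the usage lines are illustrative, not the author's dataset or results.

```python
import numpy as np


def run_rate_projection(runs, overs_done, total_overs=50):
    """Baseline: current run rate extrapolated to the full innings."""
    return runs / overs_done * total_overs


def add_intercept(X):
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y):
    """Least-squares coefficients [intercept, b_overs, b_runs, b_wickets]."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    return add_intercept(X) @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical snapshots: overs completed, runs so far, wickets lost.
    overs = rng.integers(10, 45, size=200)
    runs = overs * rng.uniform(4.0, 7.0, size=200)
    wickets = rng.integers(0, 9, size=200)
    X = np.column_stack([overs, runs, wickets])
    # Synthetic final scores, made up purely so the example runs.
    final = 40 + 2.0 * (50 - overs) + runs - 6.0 * wickets + rng.normal(0, 10, 200)
    coef = fit_linear(X, final)
    print("regression:", predict(coef, np.array([[30, 170, 3]]))[0])
    print("run-rate baseline:", run_rate_projection(170, 30))
```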
An experimental study of graph-based semi-supervised classification with additional node information
Lebichot, Bertrand, Saerens, Marco
The volume of data generated by the internet and social networks is increasing every day, and there is a clear need for efficient ways of extracting useful information from it. As these data can take different forms, it is important to use all the available data representations for prediction. In this paper, we focus our attention on supervised classification using both regular plain (tabular) data and structural information coming from a network structure. Fourteen techniques are investigated and compared in this study; they can be divided into three classes: the first uses only the plain data to build a classification model, the second uses only the graph structure, and the last uses both information sources. The relative performance in these three cases is investigated. Furthermore, the effect of using a graph embedding and well-known indicators from spatial statistics is also studied. Possible applications include automatic classification of web pages or other linked documents, of people in a social network, or of proteins in a biological complex system, to name a few. Based on our comparison, we draw some general conclusions and advice for tackling this particular classification task: some datasets are better explained by their graph structure (graph-driven), others by their feature set (features-driven). The most efficient methods are discussed in both cases.
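The abstract does not name the spatial-statistics indicators it uses. One well-known example is Moran's I, which measures whether connected nodes tend to carry similar values; one might read a clearly positive value on the node labels as a hint that a dataset is graph-driven. The sketch below is our illustration under that assumption, not the paper's code. `W` is assumed to be a dense, symmetric adjacency (or weight) matrix and `x` a numeric node attribute such as a 0/1 class indicator.

```python
import numpy as np


def morans_i(W, x):
    """Moran's I = (N / S0) * sum_ij w_ij z_i z_j / sum_i z_i**2, with z = x - mean(x)."""
    W = np.asarray(W, dtype=float)
    z = np.asarray(x, dtype=float) - np.mean(x)
    n = len(z)
    s0 = W.sum()                        # total edge weight
    return (n / s0) * (z @ W @ z) / (z @ z)


if __name__ == "__main__":
    # Hypothetical toy network: the path graph 0 - 1 - 2 - 3.
    W = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
    print(morans_i(W, [1, 1, 0, 0]))    # ≈ 0.333: neighbours mostly share labels
    print(morans_i(W, [1, 0, 1, 0]))    # ≈ -1.0: neighbours always differ
```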
Christopher Fonnesbeck - Introduction to Statistical Modeling with Python - PyCon 2017
Speaker: Christopher Fonnesbeck. This intermediate-level tutorial will provide students with hands-on experience applying practical statistical modeling methods to real data. Unlike many introductory statistics courses, we will not be applying "cookbook" methods that are easy to teach but often inapplicable; instead, we will learn some foundational statistical methods that can be applied generally to a wide variety of problems: maximum likelihood, bootstrapping, linear regression, and other modern techniques. The tutorial will start with a short introduction to data manipulation and cleaning using [pandas](http://pandas.pydata.org/). Slightly more advanced topics include bootstrapping (for estimating uncertainty around estimates) and flexible linear regression using Bayesian methods. By using and modifying hand-coded implementations of these techniques, students will gain an understanding of how each method works.
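To make "bootstrapping (for estimating uncertainty around estimates)" concrete, here is a minimal NumPy sketch that is not taken from the tutorial. It resamples (x, y) pairs with replacement, refits a straight line each time with `np.polyfit`, and reports a percentile interval for the slope. The function name, the 2,000 resamples and the 95% level are illustrative defaults.

```python
import numpy as np


def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope of a least-squares line."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample cases with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]  # [0] is the slope term
    lo, hi = np.percentile(slopes, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=x.size)  # synthetic noisy line
    print(bootstrap_slope_ci(x, y))
```

Resampling whole cases, rather than residuals, keeps the sketch free of assumptions about the error distribution; the width of the interval is the uncertainty estimate.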
Time Series Structure Discovery via Probabilistic Program Synthesis
Schaechtle, Ulrich, Saad, Feras, Radul, Alexey, Mansinghka, Vikash
There is a widespread need for techniques that can discover structure from time series data. Recently introduced techniques such as Automatic Bayesian Covariance Discovery (ABCD) provide a way to find structure within a single time series by searching through a space of covariance kernels that is generated using a simple grammar. While ABCD can identify a broad class of temporal patterns, it is difficult to extend and can be brittle in practice. This paper shows how to extend ABCD by formulating it in terms of probabilistic program synthesis. The key technical ideas are to (i) represent models using abstract syntax trees for a domain-specific probabilistic language, and (ii) represent the time series model prior, likelihood, and search strategy using probabilistic programs in a sufficiently expressive language. The final probabilistic program is written in under 70 lines of probabilistic code in Venture. The paper demonstrates an application to time series clustering that involves a non-parametric extension to ABCD, experiments for interpolation and extrapolation on real-world econometric data, and improvements in accuracy over both non-parametric and standard regression baselines.
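The paper's implementation is written in Venture and is not reproduced here. The NumPy sketch below only illustrates the underlying ABCD idea the abstract describes. Covariance kernels are expressions built from base kernels (here squared exponential `SE`, linear `LIN` and periodic `PER`, with fixed hyperparameters we chose arbitrarily) combined by `+` and `*`. One search step expands an expression and scores each candidate by Gaussian-process log marginal likelihood. A real search would also optimise hyperparameters and penalise complexity; all names and the scoring choice here are assumptions.

```python
import numpy as np


# Base kernels with fixed, arbitrary hyperparameters (a real search would fit these).
def se(x, y, ls=1.0):
    """Squared-exponential kernel: smooth variation."""
    return np.exp(-0.5 * ((x[:, None] - y[None, :]) / ls) ** 2)


def lin(x, y, c=0.0):
    """Linear kernel: linear trends."""
    return (x[:, None] - c) * (y[None, :] - c)


def per(x, y, period=1.0, ls=1.0):
    """Periodic kernel: patterns repeating with the given period."""
    d = np.abs(x[:, None] - y[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls**2)


BASE = {"SE": se, "LIN": lin, "PER": per}


def evaluate(expr, x, y):
    """Covariance matrix for an expression: a base name or a tuple (op, left, right)."""
    if isinstance(expr, str):
        return BASE[expr](x, y)
    op, left, right = expr
    k_left, k_right = evaluate(left, x, y), evaluate(right, x, y)
    return k_left + k_right if op == "+" else k_left * k_right


def expand(expr):
    """One grammar step: combine expr with every base kernel via + or *."""
    return [(op, expr, b) for op in ("+", "*") for b in BASE]


def to_str(expr):
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_str(left)} {op} {to_str(right)})"


def log_marginal_likelihood(expr, x, y, noise=0.1):
    """GP log evidence log p(y | x, kernel) with Gaussian observation noise."""
    K = evaluate(expr, x, x) + noise**2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)


def greedy_step(expr, x, y):
    """Return the highest-scoring one-step expansion of expr."""
    candidates = expand(expr)
    scores = [log_marginal_likelihood(c, x, y) for c in candidates]
    return candidates[int(np.argmax(scores))]


if __name__ == "__main__":
    x = np.linspace(0.0, 5.0, 60)
    y = 0.5 * x + np.sin(2 * np.pi * x)  # synthetic trend plus seasonality
    best = greedy_step("LIN", x, y)
    print(to_str(best), log_marginal_likelihood(best, x, y))
```

Sums and products of valid kernels are again valid kernels, which is why the grammar can combine them freely.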
How To Interpret R-squared and Goodness-of-Fit in Regression Analysis
This article was written by Jim Frost from Minitab. He came to Minitab with a background in a wide variety of academic research. He was the "data/stat guy" on research projects ranging from osteoporosis prevention to quantitative studies of online user behavior. Essentially, his job was to design the appropriate research conditions, accurately generate a vast sea of measurements, and then pull patterns and meaning out of it. After you have fit a linear model using regression analysis, ANOVA, or design of experiments (DOE), you need to determine how well the model fits the data. To help you, Minitab statistical software presents a variety of goodness-of-fit statistics.
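The most familiar of these statistics is R-squared, the fraction of the variance in the response accounted for by the fitted values: R² = 1 - SS_res / SS_tot. Adjusted R², which penalises extra predictors, is 1 - (1 - R²)(n - 1) / (n - p - 1) for n observations and p predictors. Here is a small NumPy sketch of both formulas (our illustration, not Minitab's implementation; the fitted values are hypothetical):

```python
import numpy as np


def r_squared(y, y_hat):
    """1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y, y_hat, n_predictors):
    """R-squared penalised for the number of predictors (intercept not counted)."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    y_hat = [1.1, 1.9, 3.2, 3.8]             # hypothetical fitted values
    print(r_squared(y, y_hat))               # ≈ 0.98
    print(adjusted_r_squared(y, y_hat, 1))   # ≈ 0.97
```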