Statistical Learning

MIT lets AI "synthesize" computer programs to aid data scientists


How to make artificial intelligence more approachable for ordinary mortals -- that is, people who are neither programmers nor IT admins nor machine learning scientists -- is a topic very much in vogue these days. One approach is to abstract away all the complexity by stuffing it into cloud computing operations, as proposed by Petuum, an AI startup recently described by ZDNet that aims to "industrialize" AI. Another approach, presented this week by MIT, is to make machine learning do more of the work itself by inventing its own programs to crunch data in specific applications such as time series analysis. Having machines build the models that in turn induce answers from data is a hot area of AI in its own right. The researchers describe a way to automate the creation of programs that infer patterns in data, which means a data scientist doesn't need to figure out the "model" that fits the data being studied.
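The excerpt does not describe MIT's system in enough detail to reproduce it. As a much simpler stand-in for the idea of letting the machine choose the model, the Python sketch below (assuming NumPy) enumerates every combination of a few hypothetical building blocks (a constant, a linear trend, and a sinusoid with an assumed known period), fits each combination by least squares, and keeps the one with the lowest BIC. The component names and the period are illustrative assumptions, not part of the MIT work.

```python
import itertools
import numpy as np

# Hypothetical building blocks: each maps time points t to regressor column(s).
def constant(t):
    return np.ones((len(t), 1))

def linear(t):
    return t.reshape(-1, 1)

def periodic(t, period=12.0):
    # The period is assumed known here; a real system would search over it too.
    w = 2 * np.pi * t / period
    return np.column_stack([np.sin(w), np.cos(w)])

COMPONENTS = {"constant": constant, "linear": linear, "periodic": periodic}

def bic(y, y_hat, n_params):
    # Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)
    n = len(y)
    rss = max(float(np.sum((y - y_hat) ** 2)), 1e-12)  # guard against log(0)
    return n * np.log(rss / n) + n_params * np.log(n)

def search_models(t, y):
    """Fit every non-empty combination of components; return (best_names, best_bic)."""
    best_names, best_score = None, np.inf
    names = list(COMPONENTS)
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            X = np.hstack([COMPONENTS[name](t) for name in combo])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            score = bic(y, X @ coef, X.shape[1])
            if score < best_score:
                best_names, best_score = combo, score
    return best_names, best_score
```

On a series built as an offset plus a trend plus a 12-step seasonal cycle plus small noise, this search should pick all three components; the BIC penalty is what keeps it from preferring needlessly large combinations when a smaller one explains the data equally well.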

A comprehensive Machine Learning workflow with multiple modelling using caret and caretEnsemble in…


I'll use a very interesting dataset presented in the book Machine Learning with R from Packt Publishing, written by Brett Lantz. My intention is to expand the analysis of this dataset by executing a full supervised machine learning workflow, one I've been laying out for some time now to help me attack any similar problem with a systematic, methodical approach. If you are thinking this is nothing new, you're absolutely right! I'm not coming up with anything new here, just making sure I have all the tools necessary to follow a full process without leaving out any big detail. Hopefully some of you will find it useful too, and I'm sure you will find some errors of judgment on my part and/or things you would do differently. Feel free to leave me a comment and help me improve! Let's jump ahead and begin to understand what information we are going to work with:

"In the field of engineering, it is crucial to have accurate estimates of the performance of building materials. These estimates are required in order to develop safety guidelines governing the materials used in the construction of buildings, bridges, and roadways. Estimating the strength of concrete is a challenge of particular interest. Although it is used in nearly every construction project, concrete performance varies greatly due to a wide variety of ingredients that interact in complex ways. As a result, it is difficult to accurately predict the strength of the final product. A model that could reliably predict concrete strength given a listing of the composition of the input materials could result in safer construction practices. For this analysis, we will utilize data on the compressive strength of concrete donated to the UCI Machine Learning Data Repository by I-Cheng Yeh. According to the website, the concrete dataset contains 1,030 examples of concrete with eight features describing the components used in the mixture. These features are thought to be related to the final compressive strength and they include the amount (in kilograms per cubic meter) of cement, slag, ash, water, superplasticizer, coarse aggregate, and fine aggregate used in the product in addition to the aging time (measured in days)."
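caret and caretEnsemble are R packages, and the concrete data is not reproduced here, so the following Python sketch (assuming NumPy) only illustrates the core of such a workflow: min-max scale the eight inputs using statistics learned on the training fold only, then compare a mean-only baseline against ordinary least squares by k-fold cross-validated RMSE. The feature names, the two models, and k = 5 are illustrative assumptions, not the author's actual setup.

```python
import numpy as np

# Hypothetical column names mirroring the eight features described above.
FEATURES = ["cement", "slag", "ash", "water", "superplasticizer",
            "coarse_aggregate", "fine_aggregate", "age"]

def min_max_fit(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return lo, span

def fit_mean(X, y):
    m = y.mean()  # baseline: always predict the training mean
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def cv_rmse(X, y, fit, k=5, seed=42):
    """Mean RMSE over k folds; the scaler is refit inside each fold to avoid leakage."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        lo, span = min_max_fit(X[train])
        model = fit((X[train] - lo) / span, y[train])
        pred = model((X[test] - lo) / span)
        errors.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(errors))

MODELS = {"baseline_mean": fit_mean, "linear_regression": fit_ols}

def compare_models(X, y, k=5):
    return {name: cv_rmse(X, y, fit, k) for name, fit in MODELS.items()}
```

Calling compare_models(X, y) on a 1,030 x 8 feature matrix returns a dict mapping each model name to its mean cross-validated RMSE (lower is better). Adding further candidates to MODELS is roughly the role caretEnsemble's model lists play in R.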

Prediction of remaining service life of pavement using an optimized support vector machine (case study of Semnan–Firuzkuh road)


Estimating the prerequisites for the maintenance, repair, rehabilitation and reconstruction of pavement is one of the requirements for the design and maintenance of the pavement structure. Pavement design methods are based on properly predicting the condition of the pavement structure so that it can be kept in a permissible condition. The term 'remaining service life' (RSL) refers to the time it takes for the pavement to reach an unacceptable condition and need to be rehabilitated or reconstructed (Elkins, Thompson, Groerger, Visintine, & Rada, 2013). Predicting the RSL is a basic concept of pavement maintenance planning, and knowing the future condition of the pavement is key to making decisions when planning its maintenance.
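The RSL definition can be made concrete with a small, purely hypothetical calculation. The Python sketch below assumes a condition index that some deterioration model projects forward one year at a time, plus a threshold below which the pavement counts as unacceptable; the RSL is then the number of years until the projection first falls below that threshold. The linear deterioration helper and its rate are illustrative assumptions only; the Semnan–Firuzkuh study predicts RSL with an optimized support vector machine, not with a rule like this.

```python
from typing import Callable, Optional

def remaining_service_life(
    condition: float,
    threshold: float,
    deteriorate: Callable[[float, int], float],
    max_years: int = 100,
) -> Optional[int]:
    """Years until the projected condition first falls below `threshold`.

    `deteriorate(current_condition, year)` returns the condition one year later;
    it is a hypothetical placeholder for a real deterioration model.
    Returns 0 if the pavement is already below the threshold, or None if the
    threshold is not reached within `max_years`.
    """
    if condition < threshold:
        return 0
    for year in range(1, max_years + 1):
        condition = deteriorate(condition, year)
        if condition < threshold:
            return year
    return None

def linear_loss(rate: float) -> Callable[[float, int], float]:
    # Hypothetical model: the index drops by a fixed amount every year.
    return lambda c, _year: c - rate
```

For example, a pavement at index 80 with an unacceptable threshold of 40 and an assumed loss of 3 points per year gives remaining_service_life(80.0, 40.0, linear_loss(3.0)) == 14, since 80 - 3 x 13 = 41 is still acceptable but 80 - 3 x 14 = 38 is not.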

ggeffects 0.8.0 now on CRAN: marginal effects for regression models #rstats


I'm happy to announce that version 0.8.0 of my ggeffects package is now on CRAN. The update fixes some bugs from the previous version and comes with many new features and improvements. One major focus of this version was fixing and improving support for mixed models, especially zero-inflated mixed models (fitted with the glmmTMB package). In this post, I want to demonstrate the different options for calculating and visualizing marginal effects from mixed models. The type of prediction, i.e. whether or not to account for the uncertainty of the random effects, is set with the type argument.
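ggeffects is an R package, so this Python sketch only illustrates the underlying arithmetic under stated assumptions. It takes a hypothetical fitted linear mixed model with a random intercept, in which the coefficients, their covariance matrix and the random-intercept variance are all made-up numbers. It computes adjusted predictions for a focal predictor x while holding a second predictor z at a fixed value. It then contrasts an interval based only on the fixed-effect uncertainty with a wider one that also adds the random-intercept variance, which is similar in spirit to the choice the type argument offers. It is not ggeffects' implementation.

```python
import math

# Made-up fitted model: y = b0 + b1*x + b2*z + u_group + e, with u_group ~ N(0, SIGMA_U2)
COEF = [10.0, 2.0, -0.5]        # b0, b1, b2 (hypothetical)
VCOV = [[0.25, -0.02, 0.0],     # covariance matrix of (b0, b1, b2) (hypothetical)
        [-0.02, 0.01, 0.0],
        [0.0, 0.0, 0.04]]
SIGMA_U2 = 1.5                  # random-intercept variance (hypothetical)
Z_REF = 3.0                     # value at which the non-focal term z is held

def adjusted_prediction(x, include_random_effect_variance=False, z=Z_REF, q=1.96):
    """Prediction for focal x with z held fixed, plus a (lower, upper) interval."""
    row = [1.0, x, z]
    pred = sum(c * r for c, r in zip(COEF, row))
    # Variance of the fixed-effect prediction: row' * VCOV * row
    var = sum(row[i] * VCOV[i][j] * row[j] for i in range(3) for j in range(3))
    if include_random_effect_variance:
        var += SIGMA_U2  # also account for between-group variation
    half = q * math.sqrt(var)
    return pred, (pred - half, pred + half)

for x in [0.0, 1.0, 2.0]:
    print(x, adjusted_prediction(x), adjusted_prediction(x, True))
```

Both calls return the same point prediction; only the interval width changes, which is why the choice matters mainly for how uncertain the plotted marginal effects look.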

What We Can Learn about AI and Creating Smart Products from "The Incredibles"


Nothing strikes terror into the hearts of humans more than the idea of an intelligent robot gone bad. The fear is that a robot could acquire the ability to learn and adapt to the point of superseding its human creators…and with evil intentions. From Gort ("The Day the Earth Stood Still") to Sonny ("I, Robot"), films provide a wide variety of potential robot scenarios. Only a few of these film robots have demonstrated artificial intelligence to the point of threatening humankind (like the Cyberdyne Systems series T-800 Model 101 in "Terminator" and the unnamed NS-5 robots in "I, Robot"). However, the one evil robot that demonstrated the ability to continuously learn through experimentation and failure is the Omnidroid from "The Incredibles".

Machine Learning in Python - PyImageSearch


Struggling to get started with machine learning using Python? In this step-by-step, hands-on tutorial you will learn how to perform machine learning using Python on numerical data and image data. By the time you are finished reading this post, you will be able to get your start in machine learning. To launch your machine learning in Python education, just keep reading! Along the way, you'll discover popular machine learning algorithms that you can use in your own projects as well. This hands-on experience will give you the knowledge (and confidence) you need to apply machine learning in Python to your own projects. Before we can get started with this tutorial, you first need to make sure your system is configured for machine learning, since today's code requires some additional libraries. To help you gain experience performing machine learning in Python, we'll be working with two separate datasets. The first one, the Iris dataset, is the machine learning practitioner's equivalent of "Hello, World!" (likely one of the first pieces of software you wrote when learning how to program). The second dataset, 3-scenes, is an example image dataset I put together. It will help you gain experience working with image data and, most importantly, learn which techniques work best for numerical/categorical datasets vs. image datasets.
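The list of algorithms and libraries did not survive in this excerpt, so here is only an illustration of the two ideas the paragraph mentions, in Python with NumPy as an assumption. First, it turns an image into a small numeric feature vector (per-channel mean and standard deviation, one simple choice for scene images). Second, it classifies such vectors with k-nearest neighbors, a common first algorithm. The toy "images" and the labels coast and forest are invented stand-ins; a real run would use Iris measurements or features extracted from the 3-scenes images.

```python
from collections import Counter
import numpy as np

def color_stats(image):
    """Map an H x W x 3 image array to [mean_R, mean_G, mean_B, std_R, std_G, std_B]."""
    pixels = image.reshape(-1, 3).astype(float)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean)."""
    dists = np.linalg.norm(np.asarray(train_X) - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Invented toy data: tiny 8x8 "images" with different dominant colors.
rng = np.random.default_rng(1)
blue = [rng.normal([40, 60, 200], 10, size=(8, 8, 3)) for _ in range(5)]
green = [rng.normal([50, 180, 60], 10, size=(8, 8, 3)) for _ in range(5)]
X = np.array([color_stats(img) for img in blue + green])
y = ["coast"] * 5 + ["forest"] * 5
query = color_stats(rng.normal([45, 70, 190], 10, size=(8, 8, 3)))
print(knn_predict(X, y, query))
```

The same knn_predict works unchanged on numeric rows such as Iris measurements; only the feature extraction step differs between tabular and image data.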

Neural Differential Equations


This paper won a best paper award at NeurIPS 2018 (one of the biggest AI conferences of the year), out of over 4800 submissions! Neural Ordinary Differential Equations is the official name of the paper, and in it the authors introduce a new type of neural network. This new network doesn't have a fixed stack of discrete layers! Instead, it's framed as a differential equation, which allows us to use differential equation solvers on it to approximate the underlying function of time series data. It's very cool and should ultimately allow us to learn from irregularly sampled time series datasets more efficiently, which applies to many different industries.
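The core idea fits in a few lines. A residual block computes h + f(h), which is one Euler step of dh/dt = f(h, t); a neural ODE instead hands dh/dt = f(h, t; θ) to an ODE solver and integrates from t0 to t1, where f is a small neural network. The Python/NumPy sketch below assumes a fixed-step classical Runge-Kutta (RK4) solver and a randomly initialized two-layer network as f, and shows only the forward pass. Training, which the paper does with the adjoint method and adaptive solvers, is omitted.

```python
import numpy as np

def rk4_solve(f, h0, t0, t1, steps=100):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step classical RK4."""
    h = np.asarray(h0, dtype=float)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f(h + dt / 2 * k1, t + dt / 2)
        k3 = f(h + dt / 2 * k2, t + dt / 2)
        k4 = f(h + dt * k3, t + dt)
        h = h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
    return h

def make_mlp_dynamics(dim, hidden=16, seed=0):
    """Hypothetical f(h, t; theta): a tiny tanh MLP that also receives t as an input."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (hidden, dim + 1))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (dim, hidden))
    b2 = np.zeros(dim)

    def f(h, t):
        z = np.tanh(W1 @ np.append(h, t) + b1)
        return W2 @ z + b2

    return f

f = make_mlp_dynamics(dim=2)
# The "output" of the continuous-depth network is the state at t1.
h1 = rk4_solve(f, h0=[1.0, 0.0], t0=0.0, t1=1.0)
```

Because the solver can be asked for the state at any t, observations that arrive at irregular times can be matched directly, without forcing them onto a fixed grid of layers or time steps.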

Data Dimensionality Reduction in the Age of Machine Learning - DATAVERSITY


By Rosaria Silipo. Machine Learning is all the rage as companies try to make sense of the mountains of data they are collecting. Data is everywhere and proliferating at unprecedented speed. But more data is not always better. In fact, large amounts of data can not only slow down system execution considerably but can sometimes even degrade the performance of Data Analytics applications.
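The excerpt does not name specific techniques, so the following is an assumption-laden sketch of three common ones, in Python with NumPy. It drops near-constant columns (a low variance filter), drops the later column of each highly correlated pair (a high correlation filter), and projects what remains onto its leading principal components (PCA via SVD). The thresholds are arbitrary illustrative defaults, and the filters run in this order because a constant column has an undefined correlation.

```python
import numpy as np

def low_variance_filter(X, min_var=1e-3):
    """Keep columns whose variance exceeds min_var; return (X_kept, kept_indices)."""
    keep = np.where(X.var(axis=0) > min_var)[0]
    return X[:, keep], keep

def high_correlation_filter(X, max_abs_corr=0.95):
    """Keep a column only if |corr| with every already-kept column is <= max_abs_corr."""
    corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, i]) <= max_abs_corr for i in keep):
            keep.append(j)
    return X[:, keep], keep

def pca(X, n_components):
    """Project mean-centered X onto its top principal components; also return variance ratios."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Made-up data: a column, a near-copy of it, an independent column, a constant column.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.01, size=200),
                     rng.normal(size=200), np.full(200, 3.0)])
X1, kept1 = low_variance_filter(X)       # drops the constant column
X2, kept2 = high_correlation_filter(X1)  # drops the near-duplicate column
Z, ratio = pca(X2, n_components=1)
```

The first two filters discard columns outright and keep the survivors interpretable; PCA instead replaces the columns with linear combinations, trading interpretability for a more compact representation.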

Machine Learning - Predictions with ordinal logistic regression - Michael Fuchs


Now, let's look at the fit on the training data and the corresponding confusion matrix. Our model performs only marginally better on the training data than our baseline model. The confusion matrix shows why: the model predicts the average class (1) very often. Now we want to try this again with the test set. As you can see, the situation is almost identical.
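The post's model and data are not reproduced here, so this Python/NumPy sketch uses made-up parameters to show the mechanics the paragraph relies on. A proportional-odds (cumulative logit) model turns a linear score into class probabilities via P(Y <= k) = sigmoid(θ_k - x·β), and the predicted class is the most probable one. The resulting confusion matrix and accuracy are then compared with a baseline that always predicts the most frequent class. Sign conventions for β differ between libraries; the one used here is an assumption.

```python
import numpy as np

def ordinal_probs(X, beta, thresholds):
    """Class probabilities under a cumulative-logit model, P(Y <= k) = sigmoid(theta_k - X @ beta)."""
    eta = X @ beta
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(thresholds)[None, :] - eta[:, None])))
    cum = np.hstack([np.zeros((len(eta), 1)), cum, np.ones((len(eta), 1))])
    return np.diff(cum, axis=1)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: actual class, columns: predicted class
    return cm

X = np.array([[-2.0], [-0.5], [0.0], [0.3], [0.8], [2.5]])  # made-up feature values
y = np.array([0, 1, 1, 2, 2, 2])                              # made-up labels
beta, thresholds = np.array([1.0]), [-1.0, 1.0]               # made-up fitted parameters
pred = ordinal_probs(X, beta, thresholds).argmax(axis=1)
cm = confusion_matrix(y, pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()
baseline = np.mean(y == np.bincount(y).argmax())  # always predict the most frequent class
```

Reading the matrix column by column shows which classes the model over-predicts: here the middle column collects the one misclassified case, the same pattern of a model leaning on class 1 that the paragraph describes.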