Statistical Learning


3 Python Tools Data Scientists Can Use for Production-Quality Code

#artificialintelligence

For many of these steps, there are no real shortcuts. The only way to build a minimum viable product, for example, is to roll up your sleeves and start coding. However, in a few cases, tools exist to automate tedious manual processes and make your life much easier. In Python, this is the situation for steps 4, 8 and 10, thanks to the unittest, flake8 and sphinx packages. Let's look at each of these packages in turn.
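As a quick sketch of what the unittest step might look like in practice, here is a minimal example (the `add` function and its tests are hypothetical, for illustration only):

```python
import unittest

def add(a, b):
    """Return the sum of a and b."""
    return a + b

class TestAdd(unittest.TestCase):
    def test_add_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_floats(self):
        # assertAlmostEqual absorbs floating-point rounding error
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)

if __name__ == "__main__":
    # exit=False keeps the interpreter alive after the test run
    unittest.main(argv=["tests"], exit=False)
```

flake8 and sphinx, by contrast, are driven from the command line (`flake8 mymodule.py`, `sphinx-build`), so they require configuration rather than code changes.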


Gentle Approach to Linear Algebra, with Machine Learning Applications

#artificialintelligence

This simple introduction to matrix theory offers a refreshing perspective on the subject. Using a basic concept that leads to a simple formula for the power of a matrix, we see how it can solve time series, Markov chains, linear regression, data reduction, principal components analysis (PCA) and other machine learning problems. These problems are usually solved with more advanced matrix calculus, including eigenvalues, diagonalization, generalized inverse matrices, and other types of matrix normalization. Our approach is more intuitive and thus appealing to professionals who do not have a strong mathematical background, or who have forgotten what they learned in math textbooks. It will also appeal to physicists and engineers.
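To make the matrix-power idea concrete, here is a small NumPy sketch (the two-state transition matrix is invented for illustration): repeatedly applying a Markov chain's transition matrix, i.e. raising it to a power, converges to the chain's stationary distribution.

```python
import numpy as np

# Hypothetical two-state Markov chain:
# rows = current state, columns = next state.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# The n-step transition probabilities are just the n-th matrix power.
P100 = np.linalg.matrix_power(P, 100)

# Every row of a high power converges to the stationary distribution.
stationary = P100[0]
print(stationary)  # ≈ [0.8333, 0.1667]
```

The same limit can be read off from the eigendecomposition, which is exactly the "more advanced matrix calculus" route the article contrasts itself with.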


Unsupervised Learning with Clustering Techniques w/Srini Anand

#artificialintelligence

As humans, we are able to discern differences among groups within a collection. We might sort a collection into broad groups such as birds versus plants versus animals, or detect subtle features to identify different makes and models of cars. Clustering techniques allow us to automate this process and apply it to data where the groupings are not immediately obvious. These techniques are used for purposes such as detecting market segments, identifying properties of online communities, fraud detection, and cybersecurity. Srini Anand is a Data Scientist at Ameritas Life Insurance Company and holds a master's degree in Data Science from Indiana University.
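As a minimal illustration of the idea (not from the talk itself), here is a k-means sketch with scikit-learn on synthetic data containing two well-separated groups:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data: two well-separated groups of 2-D points
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# k-means recovers the grouping without being told which point is which
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

In real applications the number of clusters is not known in advance; heuristics such as the elbow method or silhouette scores are commonly used to choose it.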


Telecom Customer Churn Prediction in Apache Spark (ML)

#artificialintelligence

In this data science and machine learning project, we will build a telecom customer churn prediction model using a few predictive classification models: Logistic Regression, Naive Bayes, and a One-vs-Rest classifier. Databricks lets you start writing Spark ML code instantly so you can focus on your data problems.
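The project itself runs on Spark ML, but the modelling step can be sketched outside Spark; below is a scikit-learn analogue (swapped in for illustration, since a Spark cluster is not assumed here) on synthetic churn data. The feature names and the churn-generating rule are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
# Hypothetical features: monthly charges and tenure in months
monthly = rng.uniform(20, 120, n)
tenure = rng.integers(1, 72, n)
# Synthetic rule: high charges and short tenure make churn more likely
logit = 0.04 * monthly - 0.08 * tenure
churn = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([monthly, tenure])
X_tr, X_te, y_tr, y_te = train_test_split(X, churn, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
acc = model.score(X_te, y_te)  # held-out accuracy
```

In Spark ML the equivalent pipeline assembles features with `VectorAssembler` and fits `LogisticRegression` from `pyspark.ml.classification`.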


Free Book: Lecture Notes on Machine Learning

#artificialintelligence

Lecture notes for the Statistical Machine Learning course taught at the Department of Information Technology, Uppsala University, Sweden. Available as a PDF, here (original) or here (mirror).


Machine learning accelerates parameter optimization and uncertainty assessment of a land surface model

#artificialintelligence

The performance of land surface models (LSMs) depends strongly on their unknown parameters, so it is necessary to optimize them. Here I present a globally applicable and computationally efficient method for parameter optimization and uncertainty assessment of an LSM, combining Markov Chain Monte Carlo (MCMC) with machine learning. First, I performed a long-term ensemble simulation of the LSM, in which each ensemble member has different parameter values, and calculated the gap between simulation and observation, or the cost function, for each ensemble member. Second, I developed a statistical machine learning surrogate model, which is computationally cheap but accurately mimics the relationship between the parameters and the cost function, by applying Gaussian process regression to the model simulations. Third, I applied MCMC, repeatedly driving the surrogate model, to obtain the posterior probability distribution of the parameters.
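The three steps can be sketched in miniature. The code below is an illustrative toy, not the paper's implementation: the expensive LSM is replaced by a one-parameter quadratic cost, the surrogate is scikit-learn's Gaussian process regressor, and the MCMC is a basic Metropolis sampler driving only the cheap surrogate.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Stand-in for an expensive model run: cost is lowest at theta = 1.5.
def expensive_cost(theta):
    return (theta - 1.5) ** 2

# Step 1: a small ensemble of "simulations" at different parameter values.
thetas = np.linspace(0.0, 3.0, 15).reshape(-1, 1)
costs = expensive_cost(thetas).ravel()

# Step 2: GP surrogate mimicking the parameter -> cost relationship.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=1e-6).fit(thetas, costs)

# Step 3: Metropolis MCMC driving only the cheap surrogate.
def log_post(theta):
    cost = gp.predict(np.array([[theta]]))[0]
    return -cost  # treat the cost as a negative log-likelihood

samples, theta = [], 0.5
lp = log_post(theta)
for _ in range(2000):
    prop = theta + rng.normal(0.0, 0.3)
    # stay inside the range the surrogate was trained on
    if 0.0 <= prop <= 3.0:
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
    samples.append(theta)

# Discard burn-in; the posterior concentrates around theta = 1.5.
posterior_mean = float(np.mean(samples[500:]))
```

The payoff is that after the one-off ensemble of expensive runs, the thousands of MCMC evaluations cost essentially nothing.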


The Ultimate Beginner's Guide to Data Scraping, Cleaning, and Visualization

#artificialintelligence

If you have a model that has acceptable results but isn't amazing, take a look at your data! Taking the time to clean and preprocess your data the right way can make your model a star. In order to look at scraping and preprocessing in more detail, let's look at some of the work that went into "You Are What You Tweet: Detecting Depression in Social Media via Twitter Usage." That way, we can really examine the process of scraping Tweets and then cleaning and preprocessing them. We'll also do a little exploratory visualization, which is an awesome way to get a better sense of what your data looks like!
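As a taste of the cleaning step, here is a minimal tweet-cleaning sketch; the regex rules are illustrative assumptions, not the article's exact pipeline:

```python
import re

def clean_tweet(text):
    """Basic tweet cleaning: strip URLs and mentions, keep hashtag words."""
    text = re.sub(r"https?://\S+", "", text)   # remove links
    text = re.sub(r"@\w+", "", text)           # remove @mentions
    text = re.sub(r"#", "", text)              # keep the word, drop the '#'
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lower()

print(clean_tweet("Feeling down today... @friend #depression https://t.co/xyz"))
# → "feeling down today... depression"
```

Real pipelines typically add steps such as emoji handling, stop-word removal, and tokenization before any modelling.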


Adversarial Robustness 360 Toolbox v1.0: A Milestone in AI Security

#artificialintelligence

Next week at AI Research Week, hosted by the MIT-IBM Watson AI Lab in Cambridge, MA, we will publish the first major release of the Adversarial Robustness 360 Toolbox (ART). Initially released in April 2018, ART is an open-source library for adversarial machine learning that provides researchers and developers with state-of-the-art tools to defend and verify AI models against adversarial attacks. ART v1.0 marks a milestone in AI security, introducing new features that extend ART to conventional machine learning models and a variety of data types beyond images. The number of reports of real-world exploits using adversarial attacks against AI is growing, as in the case of anti-virus software, highlighting the importance of understanding, improving and monitoring the adversarial robustness of AI models. ART provides a comprehensive and growing set of tools to systematically assess and improve the robustness of AI models against adversarial attacks, including evasion and poisoning. In an evasion attack, the adversary crafts small changes to the original input of an AI model in order to influence its behaviour.
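To make the evasion idea concrete, here is a toy gradient-sign attack on a two-feature logistic model in plain NumPy. The weights and input are invented for illustration; in ART itself, attacks of this family are provided ready-made (e.g. `FastGradientMethod` in `art.attacks.evasion`).

```python
import numpy as np

# Tiny logistic "model" with hypothetical weights.
w = np.array([2.0, -1.0])
b = 0.0

def predict(x):
    """Probability of the positive class."""
    return 1 / (1 + np.exp(-(w @ x + b)))

x = np.array([0.5, -0.5])   # original input, confidently positive
print(predict(x))            # ≈ 0.82

# Evasion: a small step against the gradient of the score w.r.t. the input.
eps = 0.5
grad = w * predict(x) * (1 - predict(x))  # d(sigmoid)/dx for this model
x_adv = x - eps * np.sign(grad)

print(predict(x_adv))        # pushed toward the decision boundary
```

The perturbation is bounded by `eps` per feature, which is what makes such changes "small" and hard to notice while still flipping the model's behaviour.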


Estimating Uncertainty in Machine Learning Models -- Part 1

#artificialintelligence

"We demand rigidly defined areas of doubt and uncertainty!" Let's imagine for a second that we're building a computer vision model for a construction company, ABC Construction. The company is interested in automating its aerial site surveillance process, and would like our algorithm to run on their drones. We happily get to work, and deploy our algorithm onto their fleets of drones, and go home thinking that the project is a great success. A week later, we get a call from ABC Construction saying that the drones keep crashing into the white trucks that they have parked on all their sites.