data quality


AI gets down to business

#artificialintelligence

Many of the 2017 CIO 100 leaders are already piloting AI and machine learning projects. Some take a do-it-yourself approach to building predictive models and open platforms, some work with consultants, and some take advantage of the new AI-infused capabilities increasingly appearing in core enterprise systems such as ERP and CRM. AI isn't exactly a newcomer; it has been around for at least a couple of decades. Even so, the technology has taken off this year for several reasons, notes David Schubmehl, research director for cognitive and AI systems at IDC: relatively cheap access to cloud-based computing and storage horsepower; vast troves of data; and new tools that let mere mortals, not just research scientists, develop complex algorithms. "It's really the idea that programs or applications can self-program to improve and learn and make recommendations and make predictions." Read on to learn how six 2017 CIO 100 leaders are transforming their enterprises to capitalize on AI and machine learning.


Scalable Systems Launches AI/ML Offering Scalable Digital -- MarTechSeries

#artificialintelligence

Scalable Systems has launched Scalable Digital, an offering focused on Artificial Intelligence, Machine Learning and Big Data that aims to solve specific vertical challenges. Artificial Intelligence and Big Data enable organizations to make meaningful, strategic adjustments that minimize costs while maximizing results. The team behind the offering also has deep experience in horizontal technologies such as Big Data, Machine Learning, Predictive Analytics, and Data Science for Decision Making. Scalable Digital focuses on technology innovation and collaboration with the world's leading technology companies.


Five tips to remember for Machine Learning

#artificialintelligence

It attracts potential analytical talent and offers data scientists a conducive environment in which to work. SAS Visual Analytics allows all business users, not just data scientists, to embrace machine learning for solving business problems. These include advanced regression techniques such as penalized regression, generalized additive models, and quantile regression. Other ways to address the interpretability of complex models include setting machine learning benchmarks and developing surrogate models: interpretable models used as a proxy to explain complex models.
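To make the surrogate-model idea concrete, here is a minimal sketch of a global surrogate in plain Python. The approach is:

* Query a "complex" model on probe inputs.
* Fit a simple linear model to the complex model's *predictions*, not to ground truth.
* Report fidelity as R² against those predictions.

The `black_box` function, the probe grid and the helper names are hypothetical stand-ins, not SAS functionality.

```python
import math

def black_box(x: float) -> float:
    """Hypothetical complex model we want to explain (a stand-in for, e.g., an ensemble)."""
    return 3.0 * x + 0.5 * math.sin(5.0 * x)

def fit_linear_surrogate(xs, preds):
    """Least-squares fit of preds ~ slope * x + intercept: the interpretable proxy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(preds) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, preds))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fidelity_r2(preds, surrogate_preds):
    """R^2 of the surrogate measured against the black box's outputs (not ground truth)."""
    mean_p = sum(preds) / len(preds)
    ss_res = sum((p - s) ** 2 for p, s in zip(preds, surrogate_preds))
    ss_tot = sum((p - mean_p) ** 2 for p in preds)
    return 1.0 - ss_res / ss_tot

xs = [i / 100 for i in range(101)]        # probe inputs on [0, 1]
preds = [black_box(x) for x in xs]        # query the complex model
slope, intercept = fit_linear_surrogate(xs, preds)
surrogate_preds = [slope * x + intercept for x in xs]
r2 = fidelity_r2(preds, surrogate_preds)
print(f"surrogate: y = {slope:.2f}*x + {intercept:.2f}, fidelity R^2 = {r2:.3f}")
```

A high fidelity score means the simple line is a reasonable explanation of how the complex model behaves on this input range. A low score means the proxy should not be trusted as an explanation.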


Machine Learning with Scala on Spark by Jose Quesada

#artificialintelligence

This video was recorded at Scala Days Berlin 2016 (more information at http://scaladays.org; @ScalaDays on Twitter). Abstract: What new superpowers does it give me? The machine learning libraries in Apache Spark are an impressive piece of software engineering, and they are maturing rapidly. At Data Science Retreat we've taken a real-world dataset and worked through the stages of building a predictive model (exploration, data cleaning, feature engineering, and model fitting) in several different frameworks. We'll show what it's like to work with Spark.ml, and compare it to other widely used frameworks (in R and Python) along several dimensions: ease of use, productivity, feature set, and performance.
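To make the stages named in the abstract concrete, here is a minimal sketch in plain Python rather than Spark.ml. It runs three of the stages in order:

* **Data cleaning:** drop incomplete rows.
* **Feature engineering:** standardize the single input feature.
* **Model fitting:** fit an ordinary least-squares line.

The records and the names `clean`, `engineer`, `fit` and `predict` are hypothetical and only illustrate the flow.

```python
from statistics import mean, pstdev

# Hypothetical raw records: (square_metres, price); None marks a missing value.
raw = [(50, 150.0), (None, 200.0), (70, 210.0), (90, 270.0), (110, None), (130, 390.0)]

def clean(rows):
    """Data cleaning: drop any row with a missing field."""
    return [r for r in rows if None not in r]

def engineer(rows):
    """Feature engineering: z-score the input feature; keep (mu, sigma) for prediction time."""
    xs = [x for x, _ in rows]
    mu, sigma = mean(xs), pstdev(xs)
    return [((x - mu) / sigma, y) for x, y in rows], (mu, sigma)

def fit(rows):
    """Model fitting: ordinary least squares for y = a + b * z."""
    zs = [z for z, _ in rows]
    ys = [y for _, y in rows]
    z_bar, y_bar = mean(zs), mean(ys)
    b = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / sum((z - z_bar) ** 2 for z in zs)
    return y_bar - b * z_bar, b

cleaned = clean(raw)
features, (mu, sigma) = engineer(cleaned)
a, b = fit(features)

def predict(sq_m):
    """Apply the same standardization used in training, then the fitted line."""
    return a + b * (sq_m - mu) / sigma

print(predict(100))
```

In Spark.ml, comparable stages would typically be expressed as DataFrame operations such as `na.drop()` followed by `Pipeline` stages like `StandardScaler` and `LinearRegression`. This sketch does not reproduce that API; it only shows why the scaling parameters learned during feature engineering must be reused at prediction time.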


Machine Learning and Data Quality

#artificialintelligence

Classic examples are television and Facebook. On television, data about the programmes you watch, or show an interest in watching, allows Machine Learning software to identify other shows you would like. On Facebook, a Machine Learning programme works out which news items appear on your timeline based on your activity and comments on the site. A basic and fundamental truth about Machine Learning is that even the best-designed algorithms are only as good as the data the Machine Learning software works with. Ambitious programmers who feed their Machine Learning programmes large quantities of Big Data are bound to be disappointed if the software appears to have learnt nothing, or writes its own algorithms that then don't work. The cause is poor-quality data that is dirty and full of corruptions, mismatches, duplications and other inaccuracies. Spotless Data's unique web-based API solution to dirty data can be built into your Machine Learning software during its design or build phase. Alternatively, you can simply pass your data through our web-based API before loading it into the data lake or warehouse where the Machine Learning software will work with it. The result is Machine Learning software and algorithms that will allow your company to stand out among its competitors and attract the lion's share of potential customers.
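As a generic illustration of the problems listed above (this is not Spotless Data's API, whose interface is not described here), the sketch below cleans a handful of records before they are loaded for training. It does three things:

* **Normalization:** trims whitespace and lowercases emails.
* **Mismatch reconciliation:** maps inconsistent country spellings through a lookup table.
* **Validation and deduplication:** quarantines malformed emails, then drops duplicates.

The records, the `COUNTRY_ALIASES` table and the simple email pattern are all hypothetical.

```python
import re

# Hypothetical customer records showing typical "dirty data" problems.
records = [
    {"email": "Ann@Example.com ", "country": "UK"},
    {"email": "ann@example.com", "country": "United Kingdom"},  # duplicate after normalization
    {"email": "bob@example", "country": "USA"},                # malformed email
    {"email": "carol@example.org", "country": "U.S.A."},
]

# Hypothetical lookup that reconciles inconsistent spellings (mismatches).
COUNTRY_ALIASES = {"uk": "GB", "united kingdom": "GB", "usa": "US", "u.s.a.": "US"}
# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_records(rows):
    """Return (clean_rows, rejected_rows); duplicates keep their first occurrence."""
    seen, clean, rejected = set(), [], []
    for row in rows:
        email = row["email"].strip().lower()
        country = COUNTRY_ALIASES.get(row["country"].strip().lower(), row["country"])
        if not EMAIL_RE.match(email):
            rejected.append(row)  # quarantine invalid rows instead of loading them
            continue
        if email in seen:
            continue              # drop duplicates
        seen.add(email)
        clean.append({"email": email, "country": country})
    return clean, rejected

clean, rejected = clean_records(records)
print(clean)
print(rejected)
```

Keeping rejected rows separate, rather than silently discarding them, makes it possible to review and fix them later instead of losing the information.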


5 Free Data Science eBooks For Your Summer Reading List

@machinelearnbot

You will need a basic understanding of statistical concepts and R programming. The book is intended for practicing Data Scientists, but as long as you tick these boxes you should be fine. It is offered on a Pay-What-You-Want model (including free), and helpfully, it is also available as a tablet-friendly PDF, also free. Instead of explaining the mathematics and theory and then showing examples, the authors start with a practical data-related life science challenge. There is also a free Microsoft Excel Practical Data Cleaning template to help you get a good start with your data.


Data Cleansing Tools in Azure Machine Learning

#artificialintelligence

Today, we'll discuss the impact of data cleansing on a Machine Learning model and how it can be achieved in Azure Machine Learning (Azure ML) Studio. After the experiment is re-run and the scatter plot is recreated (using the clipped amount), the outliers have been removed from the plot. To treat null values, the Clean Missing Data module can be used. To balance classes, the SMOTE module can be used: it returns a data set that contains the original samples, plus an additional number of synthetic minority samples, depending on the percentage you specify.
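The three treatments mentioned above can be sketched outside Azure ML in plain Python. This is a minimal illustration, not the Azure modules' implementation, and the function names and parameters are hypothetical:

* `clip` caps values at percentile thresholds.
* `impute_mean` replaces missing values with the mean of the observed values.
* `smote` creates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours, which is the core idea of SMOTE.

```python
import random
from statistics import mean, quantiles

def clip(values, low_pct=1, high_pct=99):
    """Cap values at percentile thresholds; None (missing) entries are left untouched."""
    present = sorted(v for v in values if v is not None)
    cuts = quantiles(present, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    lo, hi = cuts[low_pct - 1], cuts[high_pct - 1]
    return [None if v is None else min(max(v, lo), hi) for v in values]

def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def smote(minority, percentage=100, k=5, seed=0):
    """Return the original minority samples plus percentage% synthetic ones."""
    rng = random.Random(seed)
    n_new = len(minority) * percentage // 100
    synthetic = []
    for i in range(n_new):
        idx = i % len(minority)
        x = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to its neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return minority + synthetic

amounts = [10, 12, 11, None, 13, 500]
treated = impute_mean(clip(amounts, low_pct=25, high_pct=75))
balanced = smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], percentage=200, k=2)
print(treated, len(balanced))
```

The order matters in this sketch: clipping before imputation keeps an extreme value such as 500 from inflating the mean used to fill the gaps.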


How Can Lean Six Sigma Help Machine Learning?

#artificialintelligence

Both classical statistics and LSS therefore show that if the input variables have large variance, we should expect large variance in the output variable(s). In LSS, practitioners go back to examine the business process to find the source of variance in the input variables (factors), in order to eliminate bias or reduce that variance. In ML, by contrast, practitioners do not revisit the business process; they only correct data errors or eliminate data that do not make sense. Software vendors and data science consulting firms should embrace variance reduction in the data cleansing phase of ML to deliver its real value. The author has over 18 years of working experience in advanced analytics, business intelligence, data warehousing, lean six sigma, process optimization, operations analysis, and other areas.
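The claim that input variance drives output variance can be made precise with standard first-order error propagation (the delta method). This is a textbook result rather than anything specific to LSS or to the original article. It assumes independent inputs and a smooth model $f$, with partial derivatives evaluated at the input means:

```latex
% First-order (delta-method) propagation of input variance, assuming independent inputs
Y = f(X_1, \dots, X_p), \qquad
\operatorname{Var}(Y) \approx \sum_{i=1}^{p} \left( \frac{\partial f}{\partial x_i} \right)^{2} \operatorname{Var}(X_i)

% Exact for a linear model Y = b_0 + \sum_i b_i X_i:
\operatorname{Var}(Y) = \sum_{i=1}^{p} b_i^{2} \operatorname{Var}(X_i)
```

For example, take $b_1 = 2$ and halve the standard deviation of $X_1$ from 4 to 2. Its contribution to $\operatorname{Var}(Y)$ falls from $2^2 \cdot 16 = 64$ to $2^2 \cdot 4 = 16$. Reducing variance at the source can therefore shrink output variance far more than cleaning a few bad records.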


Machine learning tool cleans dirty data -- GCN

#artificialintelligence

To help keep data -- and the decisions based on it -- clean, researchers at Columbia University and the University of California at Berkeley have developed new software. ActiveClean analyzes prediction models to determine which mistakes (e.g., typos, outliers and missing values) to edit first, updating the models in the process, according to Columbia. To reduce data-cleaning mistakes, ActiveClean takes humans out of the two most error-prone steps of data cleaning: finding dirty data and updating the model. Without data cleaning, a model of this dataset could predict an improper donation 66 percent of the time, while ActiveClean raised that rate to 90 percent after cleaning only 5,000 records, Columbia said.
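A heavily simplified illustration of the idea behind prioritized cleaning is sketched below. It is not the ActiveClean algorithm itself, whose published design is more involved. The loop works as follows:

* Score each uncleaned record by the magnitude of its loss gradient under the current model.
* Clean the highest-impact records first.
* Refit the model after each batch.

The data, the cleaning oracle `clean_record` (a stand-in for a human or rule-based fix) and the through-the-origin linear model are all hypothetical.

```python
# Hypothetical data: y = 2x, with a few corrupted labels (the "dirty" records).
xs = [float(i) for i in range(1, 11)]
true_ys = [2.0 * x for x in xs]
observed = list(true_ys)
for i in (2, 7, 9):
    observed[i] = -observed[i]  # simulated data-entry errors

def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin: w = sum(x*y) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def clean_record(i):
    """Hypothetical cleaning oracle standing in for a human or rule-based fix."""
    return true_ys[i]

def prioritized_cleaning(xs, ys, budget, batch=1):
    """Clean up to `budget` records, highest-gradient first, refitting after each batch."""
    ys = list(ys)
    cleaned = set()
    w = fit_slope(xs, ys)
    while len(cleaned) < min(budget, len(xs)):
        # |d/dw (w*x - y)^2| = |2 * (w*x - y) * x| for each record not yet cleaned
        grads = {i: abs(2 * (w * xs[i] - ys[i]) * xs[i])
                 for i in range(len(xs)) if i not in cleaned}
        for i in sorted(grads, key=grads.get, reverse=True)[:batch]:
            ys[i] = clean_record(i)
            cleaned.add(i)
        w = fit_slope(xs, ys)  # update the model after each cleaning batch
    return w, cleaned

w, cleaned = prioritized_cleaning(xs, observed, budget=3)
print(w, sorted(cleaned))
```

With the same budget of three, cleaning records in their original order would fix indices 0, 1 and 2, leaving two corrupted labels in place. The gradient ordering instead finds all three corrupted records on this toy data.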