Machine Learning's Poor Fit for Real Data

#artificialintelligence

There's a growing sentiment, with all the wonderful things happening in artificial intelligence, machine learning, and data science, that these technologies are ready to solve all the things (including how to kill all humans). The reality is that a bunch of significant hurdles still stand between us and the AI dystopia/utopia. One big one, and the main impetus behind my research, is the disconnect between the statistical foundations of machine learning and how real data works. Machine learning technology is built on a foundation of formal theory: statistical ideas, computer science algorithms, and information-theoretic concepts combine to yield practical methods that analyze large, noisy data sets and train actionable, predictive models.


From Data Analysis to Machine Learning

#artificialintelligence

This article was originally posted here, by Mubashir Qasim. In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite for machine learning is data analysis, not math. One of the main reasons for making this statement is that data scientists spend an inordinate amount of time on data analysis. The traditional statement is that data scientists "spend 80% of their time on data preparation." While I think this statement is essentially correct, a more precise version is that you'll spend 80% of your time getting data, cleaning data, aggregating data, reshaping data, and exploring data using exploratory data analysis and data visualization.


How to use data analysis for machine learning (example, part 1) - SHARP SIGHT LABS

#artificialintelligence

In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite for machine learning is data analysis, not math. One of the main reasons for making this statement is that data scientists spend an inordinate amount of time on data analysis. The traditional statement is that data scientists "spend 80% of their time on data preparation." While I think this statement is essentially correct, a more precise version is that you'll spend 80% of your time getting data, cleaning data, aggregating data, reshaping data, and exploring data using exploratory data analysis and data visualization. And ultimately, the importance of data analysis applies not only to data science generally, but also to machine learning specifically.
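To make the preparation steps named above (getting, cleaning, aggregating, and reshaping data) a little more concrete, here is a minimal sketch using only the Python standard library. It is not taken from the original article: the sample data, the column names (`city`, `month`, `sales`), and the helper function names are all hypothetical, chosen purely for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: inconsistent casing, stray whitespace, one missing value.
RAW_CSV = """city,month,sales
 Boston ,jan,100
boston,feb,
Austin,jan,80
AUSTIN,feb,120
"""

def load_rows(text):
    """Get: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Clean: normalize text fields and drop rows with no sales value."""
    cleaned = []
    for row in rows:
        sales = row["sales"].strip()
        if not sales:
            continue  # skip missing values rather than guessing them
        cleaned.append({
            "city": row["city"].strip().title(),
            "month": row["month"].strip().lower(),
            "sales": float(sales),
        })
    return cleaned

def mean_sales_by_city(rows):
    """Aggregate: average sales per city."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["city"]].append(row["sales"])
    return {city: mean(values) for city, values in groups.items()}

def pivot_city_by_month(rows):
    """Reshape: long rows to a wide table, one entry per city, one key per month."""
    table = defaultdict(dict)
    for row in rows:
        table[row["city"]][row["month"]] = row["sales"]
    return dict(table)

rows = clean_rows(load_rows(RAW_CSV))
print(mean_sales_by_city(rows))
print(pivot_city_by_month(rows))
```

Even in a toy case like this, most of the code deals with tidying the input rather than with anything resembling a model, which is the point the 80% figure is making. In practice a library such as pandas would likely replace these hand-written helpers.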


For data work, "It's actually pretty hard to argue *against* using Python"

#artificialintelligence

I wrote my first Python program in 1996, and my most recent a couple of weeks ago, so I can appreciate Python's advance to cover a very broad range of computing tasks. I don't program much anymore, but in my work over the years -- and yours too, if you do much coding -- data manipulation has always played an important role. You can't build and apply analytical models, manage transactions, craft a Web experience, or carry out any other significant task without investing time and attention in data acquisition, cleansing, and structuring. Python is ideal for those tasks, and then for model building and data analysis. Python is especially good for natural language processing (NLP), a special interest of mine, and, chances are, for just about any data work that interests you.