Across industries, Big Data and Artificial Intelligence (AI) have proven to be powerful tools when it comes to informing companies about their target customers. Gartner predicts that by 2019, more than 50% of organizations will redirect their investments to customer experience innovations. As a result, many organizations have built teams to collect and analyze data on every step of the customer journey – taking into account where, why and how customers interact with their channels. By analyzing this data in real time, companies are able to keep up with evolving customer demands. Dissecting every interaction to understand what drives customer behavior may seem like a gargantuan task for many.
We expect the landscape to be an integrated edge-to-core-to-cloud solution enabling what today is called IoT, Big Data, Fast Data and AI. Each time a promising new technology emerges, we seem to go through a period where it is proposed as the solution to everything, until we reconcile how that technology fits into the bigger picture. Such is the case with artificial intelligence (AI). The advances in deep learning will clearly create new classes of solutions, but AI is not a standalone solution; we are only now beginning to see how it fits into our IT landscape. It emerges at a time when several other shifts in analytics technology are underway.
May Masoud is a Solution Specialist at SAS Canada on the Data Sciences team. Leveraging her analytics background, she helps businesses visualize the potential of their data and surface insights using modern data mining and machine learning techniques. Holding a Master of Business Analytics and a Bachelor in Statistics & Economics, May aims to create value at every step of the analytics lifecycle: data discovery, model build, model deployment, and business strategy. She has worked across the analytics landscape in a variety of industries, from oil production models in the energy sector to churn problems in telecom. May's aim is to make self-serve analytics ubiquitous and to enable citizen data scientists.
This Web page aims to shed some light on the perennial R-vs.-Python debates in the Data Science community, from my perspective as a professional computer scientist and statistician. I have potential biases: I've written four R-related books, I've given a keynote talk at useR!, and I currently serve as Editor-in-Chief of the R Journal. But I hope this analysis will be considered fair and helpful. This is subjective, of course, but having written (and taught) in many different programming languages, I really appreciate Python's greatly reduced use of parentheses and braces. This is of particular interest to me as an educator.
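To make that syntactic point concrete, here is a toy function of my own (not taken from the page under discussion): the same conditional logic needs braces and a parenthesized condition in R, but neither in Python, where indentation alone delimits blocks.

```python
# Python: blocks are delimited by indentation; the condition needs no parentheses
def classify(x):
    if x > 0:
        return "positive"
    else:
        return "non-positive"

# A direct R equivalent, for comparison:
# classify <- function(x) {
#   if (x > 0) {
#     return("positive")
#   } else {
#     return("non-positive")
#   }
# }

print(classify(3))  # prints "positive"
```

Whether the lighter punctuation matters is a matter of taste, but for students new to programming it removes one common class of syntax errors.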
Three years ago, if you had told me that one day I would use Python to analyze AI policy and make Guido van Rossum chuckle, I would have thought you were crazy. Three years later, at PyCon 2019 in Cleveland, that's exactly what happened. I was by no means a tech person. I was trained as an economist (read: stats nerd), but somehow for the past three years I've been writing analysis on deep-tech fields including AI and 5G. What I hope to achieve with this post is not #humblebrag (ok, maybe a little happy dance) but to share with you the struggles I had, and am still experiencing on a daily basis, and to reassure a fellow researcher somewhere who feels like they are faking it all the time: you are not alone.
Unstructured data will not only improve accuracy but also enable fundamentally new ways of thinking, communicating and using information. The process of making artificial intelligence (AI) systems interact more like humans makes some people uncomfortable, but AI is not about replacing humans. In reality, it is much more about taking the robot out of the human. A big part of AI's value lies in automating manual processes and analyzing vast amounts of data quickly so that humans are free to accomplish higher-order tasks that require reason and judgment. To get to this point, however, AI systems must be able to communicate with users and analyze natural forms of data (aka unstructured data): all of the free-flowing material that can't be packaged neatly, such as voice, images and text.
Decisions made by complex algorithms impact all areas of our lives: the ads we see, the social status updates we read, the medications we are prescribed, how much an insurance policy will cost and whether or not we get a mortgage for a new home. Automated decisions help us cope with our fast-paced lifestyles. They are quick, they may feel relevant and they can be convenient. For example, while shopping on Amazon, you get suggestions for similar products that an algorithm has chosen based on other people's purchasing habits. Most of us aren't thinking about why we're shown some products instead of others.
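The mechanics behind such suggestions can be surprisingly simple. Here is a minimal sketch of one common approach, item-to-item co-occurrence: recommend the products that most often appear alongside the viewed product in other shoppers' baskets. The baskets and product names are invented for illustration; real systems are far more elaborate.

```python
from collections import Counter

# Toy purchase baskets from other shoppers (illustrative data only)
baskets = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "tripod"},
    {"sd_card", "case"},
]

def recommend(item, k=2):
    """Suggest the k products that most often co-occur with `item`."""
    co = Counter()
    for basket in baskets:
        if item in basket:
            co.update(basket - {item})
    return [product for product, _ in co.most_common(k)]

print(sorted(recommend("camera")))  # ['sd_card', 'tripod']
```

Even this toy version shows why the logic is invisible to shoppers: the suggestion reflects aggregate behavior of strangers, not anything the individual chose.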
First, before I start, I want to say something about what that is, or what I understand by it. Here is one interpretation. It is about using data, obviously, so it has relationships to analytics and data science, and it is part of AI in some way.

This is my little taxonomy of how I see things linking together. You have computer science, which has subfields like AI and software engineering. Machine learning is typically considered a subfield of AI, but a lot of principles of software engineering apply in this area. This is what I want to talk about today. It's heavily used in data science. The difference between AI and data science is somewhat fluid, if you like: data science tries to understand what's in data and answer questions about data, but then it tries to use this to make decisions, and then we are back at AI, artificial intelligence, where it's mostly about automating decision making.

We have a couple of definitions. AI means making machines intelligent, which means they can somehow function appropriately in an environment with foresight. Machine learning is a field that looks for algorithms that can automatically improve their performance without explicit programming, but by observing relevant data. And yes, I've thrown in data science as well for good measure: the scientific process of turning data into insight for making better decisions.

If you have opened any newspaper, you must have seen the discussion around the ethical dimensions of artificial intelligence, machine learning or data science. Testing touches on that as well, because there are quite a few problems in that space, and I'm just listing two here. You use data, obviously, to do machine learning. Where does this data come from, and are you allowed to use it? Do you violate any privacy laws? And are you building models that you use to make decisions about people?
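That definition of machine learning — improving performance by observing data, without the rule being programmed in — can be sketched in a few lines. The example below (my own, not from the talk) fits a line by ordinary least squares: the generating rule y = 2x + 1 is never written into the fitting code, yet the program recovers it from the observations alone.

```python
# Minimal "learning from data": fit y ≈ a*x + b by least squares.
# The rule (a=2, b=1) is never programmed in; it is recovered from data.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]   # observations generated by a hidden rule
a, b = fit_line(xs, ys)
print(round(a, 6), round(b, 6))  # recovers 2.0 and 1.0 from the data alone
```

Everything interesting about machine learning in practice — noise, generalization, model choice — is a refinement of this basic move from observations to a fitted function.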
If you do that, then the General Data Protection Regulation in the EU says you have to be able to explain the decision to the individual if you're making it based on an algorithm or a machine and it has any kind of significant impact. That means a lot of machine learning models are already off the table, because you can't do that: with particular models, you can't explain why a certain decision comes out.
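One reason some model classes survive that requirement is that their decisions decompose into per-feature contributions that can be reported back to the individual. A hedged sketch (feature names, weights and threshold are entirely hypothetical, not from any real system): a linear score whose every term is inspectable, in contrast to a deep network where no such decomposition exists.

```python
# A linear scoring model is explainable by construction: the decision is
# a sum of per-feature contributions. All values here are illustrative.
WEIGHTS = {"income_k": 0.04, "years_employed": 0.3, "missed_payments": -1.2}
THRESHOLD = 2.0

def decide(applicant):
    """Return (approved, per-feature contributions) for one applicant."""
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    approved = sum(contributions.values()) >= THRESHOLD
    return approved, contributions

approved, why = decide({"income_k": 50, "years_employed": 4,
                        "missed_payments": 1})
# `why` states exactly how each feature moved the decision, e.g. that
# income contributed +2.0 while missed payments contributed -1.2.
```

A black-box model may score better on held-out data, but it cannot produce a `why` dictionary like this, which is precisely the trade-off the regulation forces teams to confront.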
A data scientist is a "person who is better at statistics than any software engineer and better at software engineering than any statistician". In Top 10 Coding Mistakes Made by Data Scientists we discussed how statisticians can become better coders. Here we discuss how coders can become better statisticians. Detailed output and code for each of the examples is available on GitHub and in an interactive notebook. The code uses the data workflow management library d6tflow, and data is shared with the dataset management library d6tpipe.