

Establish AI Governance, Not Best Intentions, to Keep Companies Honest - InformationWeek


IBM, Microsoft and Amazon all recently announced they are either halting or pausing facial recognition technology initiatives. IBM even launched the Notre Dame-IBM Tech Ethics Lab, "a 'convening sandbox' for affiliated scholars and industry leaders to explore and evaluate ethical frameworks and ideas." In my view, the governance that will yield ethical artificial intelligence (AI) -- specifically, unbiased decisioning based on AI -- won't spring from an academic sandbox. AI governance is a board-level issue. Boards of directors should care about AI governance because AI technology makes decisions that profoundly affect everyone.

How to Go Beyond an Ordinary Data Scientist


Suppose you are the hiring manager for a data scientist position and are interviewing a candidate. The candidate lists their skills, hoping they are enough for the position, and the strongest card among them is proficiency in MS Excel. What would you think of this candidate? I suspect most of you would consider them mediocre, which is not good enough for most companies. Now let's make a small change to our hypothetical interview by replacing MS Excel with predictive modelling.

Global Big Data Conference


TensorFlow has become the most popular tool and framework for machine learning in a short span of time. It enjoys tremendous popularity among ML engineers and developers. According to Hacker News Hiring Trends for May 2020, TensorFlow jobs are in high demand. Here are five reasons behind TensorFlow's popularity: TensorFlow is the only framework available for running machine learning models from the cloud to the tiniest microcontroller device. Models trained with TensorFlow can be optimized for CPU and GPU.

The Anatomy of AI: Understanding Data Processing Tasks


But as your data scientists and data engineers quickly realize, building a production AI system is a lot easier said than done, and there are many steps to master before you get that ML magic. At a high level, the anatomy of AI is fairly simple. You start with some data, train a machine learning model on it, and then deploy the model to run inference on real-world data. Unfortunately, as the old saying goes, the devil is in the details. And in the case of AI, there are a lot of small details you have to get right before you can claim victory.
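The three stages named above (start with data, train a model, infer on new data) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's pipeline: the toy rows, the hypothetical helpers `prepare`, `train`, and `infer`, and the use of a one-feature least-squares line as the "model" are all stand-ins. The `prepare` step hints at the "small details" the excerpt warns about; here it only drops incomplete records.

```python
from statistics import mean

def prepare(rows):
    """Hypothetical data-processing step: drop records with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0:
        raise ValueError("need at least two distinct x values")
    slope = cov / var
    return slope, y_bar - slope * x_bar

def infer(model, new_xs):
    """Apply the fitted line to unseen inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in new_xs]

# Step 1: start with some (made-up) data, including one incomplete record.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 6.0), (3.0, 7.0), (4.0, 9.0)]
clean = prepare(raw)
xs, ys = zip(*clean)

# Step 2: train a model on the cleaned data (these points lie on y = 2x + 1).
model = train(xs, ys)

# Step 3: run inference on new, "real-world" inputs.
preds = infer(model, [5.0, 10.0])
print(model, preds)  # (2.0, 1.0) [11.0, 21.0]
```

In a production system each of these functions grows into its own stage (ingestion, validation, feature engineering, training, serving), which is where most of the details live.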

Climate Researchers Enlist Big Cloud Providers for Big Data Challenges - WSJD - Technology

And the shift hasn't gone unnoticed by the Big Three cloud providers. AWS and others offer subscription-based remote data storage and online tools, and researchers say they can be an affordable alternative to setting up and maintaining their own hardware. The cloud's added computing power can also make it easier for researchers to run machine-learning algorithms designed to identify patterns and extract insights from vast amounts of climate data, for instance, on ocean temperatures and rainfall patterns, as well as decades' worth of satellite imagery. "The data sets are getting larger and larger," said Werner Vogels, chief technology officer of Amazon.com Inc. "So machine learning starts to play a more important role to look for patterns in the data."

My 5 Favorite Data Science Portfolios · Learning With Data


At the end of the article, I posted a link to an example portfolio by Tim Dettmers that I liked. Afterward, a few people asked me to compile a larger list of great data science portfolios and projects. While this is a project rather than a portfolio, I think it is a great format to try to emulate. Melissa Runfeldt did a great job defining and motivating her problem, discussing how she gathered data, and explaining her methods with images of results. She did all of this in a way that a non-technical person could easily follow (at least at a high level).

4 ways Data Scientists fool us


The idea of analyzing data for decision making has been around for many years, but the popularity of data science has exploded along with the growth of the FAANG companies in recent years. No matter your job title, experience level, or industry, I am confident that you will encounter solutions or products that are highly 'data-driven' or powered by Artificial Intelligenceᵗᵐ. Here are the top 4 methods data scientists use to fool others. As a machine learning researcher and practitioner, I have made these 'mistakes' myself in the past, sometimes even unknowingly!

"Our model achieves an accuracy of 98.9%"
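One common way a headline accuracy figure misleads is class imbalance. The sketch below assumes that is the kind of trap meant here (the excerpt does not spell it out). The test set of 1,000 cases with only 11 positives is invented so that a "model" that always predicts the majority class scores exactly 98.9% accuracy while catching none of the positives. The helper names `accuracy` and `recall` are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives the model found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical test set: 1,000 cases, only 11 of them positive (e.g. fraud).
y_true = [1] * 11 + [0] * 989

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.989
print(recall(y_true, y_pred))    # 0.0
```

Reporting recall, precision, or a confusion matrix alongside accuracy makes this kind of result much harder to pass off as a good model.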

Review: Kinetica analyzes billions of rows in real time


In 2009, the future founders of Kinetica came up empty when trying to find an existing database that could give the United States Army Intelligence and Security Command (INSCOM) at Fort Belvoir (Virginia) the ability to track millions of different signals in real time to evaluate national security threats. So they built a new database from the ground up, centered on massive parallelization combining the power of the GPU and CPU to explore and visualize data in space and time. By 2014 they were attracting other customers, and in 2016 they incorporated as Kinetica. The current version of this database is the heart of Kinetica 7, now expanded in scope to be the Kinetica Active Analytics Platform. The platform combines historical and streaming data analytics, location intelligence, and machine learning in a high-performance, cloud-ready package.

Applications of Differential Privacy to European Privacy Law (GDPR) and Machine Learning


Differential privacy is a data anonymization technique that's used by major technology companies such as Apple and Google. The goal of differential privacy is simple: allow data analysts to build accurate models without sacrificing the privacy of the individual data points. But what does "sacrificing the privacy of the data points" mean? Well, let's think about an example. Suppose I have a dataset that contains information (age, gender, treatment, marriage status, other medical conditions, etc.) about every person who was treated for breast cancer at Hospital X.
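A textbook way to answer a simple counting query with differential privacy is the Laplace mechanism: add noise drawn from a Laplace distribution with scale sensitivity/ε, where ε is the privacy budget. A mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in one person's record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. The sketch below is an illustration under assumptions, not how Apple or Google implement it; the patient records and the helper names `laplace_noise` and `private_count` are hypothetical.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential draws, each with
    mean `scale`, is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Count matching records, then add noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Made-up records loosely modeled on the Hospital X example.
patients = [
    {"age": 45, "married": True},
    {"age": 62, "married": False},
    {"age": 38, "married": True},
    {"age": 71, "married": True},
]

noisy = private_count(patients, lambda p: p["age"] > 50, epsilon=0.5)
print(noisy)  # the true count is 2; the printed value varies run to run
```

A smaller ε means more noise and stronger privacy. An analyst still gets answers that are accurate on average, but any single answer reveals little about whether one particular patient is in the dataset.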

An Ultimate Guide to Time Series Analysis in Pandas


Time series analysis is the analysis of datasets indexed by a sequence of timestamps. It has become more and more important with the increasing emphasis on machine learning. Many different industries now use time series data for forecasting, seasonality analysis, finding trends, and making important business and research decisions. So it is very important for a data scientist or data analyst to understand time series data clearly. I will start with some general functions and then cover further topics using the Facebook stock price dataset. Time series data can come in many different formats, and not all of those formats are friendly to Python's pandas library.
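As an illustration of turning a date column into a form pandas handles well, here is a minimal sketch. The prices are made-up placeholder numbers, not the Facebook stock dataset the article uses, and the column names `Date` and `Close` are assumptions. Once the date strings are parsed with `pd.to_datetime` and set as a `DatetimeIndex`, partial-string selection with `.loc` and `resample` become available.

```python
import pandas as pd

# Hypothetical daily closing prices (made-up values), with dates stored as
# plain strings, as they often arrive from a CSV file.
raw = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06", "2020-02-03", "2020-02-04"],
    "Close": [100.0, 102.0, 101.0, 110.0, 112.0],
})

# Parse the strings into datetime64 values and use them as the index.
# An explicit `format` avoids ambiguous guesses such as day-first vs month-first.
df = raw.assign(Date=pd.to_datetime(raw["Date"], format="%Y-%m-%d")).set_index("Date")

# With a DatetimeIndex, a partial date string selects a whole month.
january = df.loc["2020-01"]

# Resample to month-start frequency ("MS") and average the closing prices.
monthly_mean = df["Close"].resample("MS").mean()
print(monthly_mean)
```

For dates stored in another layout, such as `03/01/2020` meaning 3 January, the same call works with a matching format string like `"%d/%m/%Y"`.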