Data Quality


Exploring the ML Tooling Landscape (Part 3 of 3)

#artificialintelligence

The previous blog post in this series considered the current state of the ML tooling ecosystem and how it is reflected in ML adoption in industry. The main takeaway was the widespread use of proprietary tooling among companies in this field, with a correspondingly diverse and splintered ML tooling market. The post ended by looking at some emerging near-term trends, highlighting the predominance of data observability and related tools, as well as the emergence of MLOps startups. This blog post will pick up that thread to discuss some of the key trends in ML tooling that are likely to dominate in the near future -- or at least ones I want to talk about! As indicated in the previous blog post, I want to focus on MLOps, AutoML, and data-centric AI.


Kinetic Component Analysis

#artificialintelligence

We introduce Kinetic Component Analysis (KCA), a state-space application that extracts the signal from a series of noisy measurements by applying a Kalman Filter to a Taylor expansion of a stochastic process. We show that KCA presents several advantages compared to other popular noise-reduction methods such as Fast Fourier Transform (FFT) or Locally Weighted Scatterplot Smoothing (LOWESS): First, KCA provides band estimates in addition to point estimates. Second, KCA further decomposes the signal into three hidden components, which can be intuitively associated with position, velocity and acceleration. Third, KCA is more robust in forecasting applications. Fourth, KCA is a forward-looking state-space approach, resilient to structural changes.
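A minimal sketch of the idea described in the abstract: a Kalman filter whose hidden state holds position, velocity, and acceleration, i.e. a second-order Taylor expansion of the underlying process. The transition matrix follows from that expansion, but the noise scales and initial conditions below are illustrative assumptions, not the paper's calibrated settings.

```python
import numpy as np

def kca_filter(z, dt=1.0, q=1e-3, r=1.0):
    """Filter noisy measurements z; return state estimates and position bands."""
    # State: [position, velocity, acceleration]; transition from a Taylor expansion.
    F = np.array([[1.0, dt, 0.5 * dt**2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0]])      # only position is observed
    Q = q * np.eye(3)                    # process noise scale (assumed)
    R = np.array([[r]])                  # measurement noise scale (assumed)
    x = np.zeros(3)                      # initial state (assumed)
    P = np.eye(3)                        # initial state covariance (assumed)
    states, bands = [], []
    for zk in z:
        # Predict step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step
        y = zk - H @ x                   # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
        x = x + K @ y
        P = (np.eye(3) - K @ H) @ P
        states.append(x.copy())
        bands.append(np.sqrt(P[0, 0]))   # 1-sigma band on the position estimate
    return np.array(states), np.array(bands)

# Example: recover a smooth trend (plus velocity/acceleration) from noisy data.
t = np.linspace(0, 10, 200)
noisy = np.sin(t) + np.random.normal(0, 0.3, t.size)
states, bands = kca_filter(noisy)
print(states[:3])  # columns: position, velocity, acceleration
```

Note how the filter returns both the decomposition into the three hidden components and a band estimate around the position, which are two of the advantages the abstract claims over FFT or LOWESS smoothing.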


Top 15 Books to Master Data Strategy - KDnuggets

#artificialintelligence

If you're a data practitioner with your eye on a leadership role, learning Data Management will be an important step toward getting you where you want to go. In this article, we outline 15 books on topics ranging from Data Architecture (highly technical) to Data Literacy (broadly nontechnical) to help you improve your understanding of end-to-end best practices related to data. I'd be remiss if I didn't begin this list here. This behemoth covers 14 practical topics related to Data Strategy, followed by 3 topics related to implementation. The 14 different knowledge areas are best represented by the Aiken Pyramid, which outlines how these topics build upon each other.


Visually Inspecting Data Profiles for Data Distribution Shifts

#artificialintelligence

The null hypothesis is that the samples are drawn from the same distribution, which means that a low p-value is indicative of different distributions. In this example, we see that drift was detected for all of our features. In addition to statistical tests, there are other approaches you can take to tackle distribution shifts, such as visually inspecting histograms and distribution charts for individual features, which can be useful to confirm the disparity between distributions. More generally, rule-based data validation is key to ensuring the quality of your data, including guarding against distribution changes, whether they stem from external factors or from systemic errors such as pipeline failures or missing data. For a more in-depth view of this topic, you can sign up for my upcoming workshop at ODSC Europe this June, "Visually Inspecting Data Profiles for Data Distribution Shifts". In the workshop, we will also see how to visually inspect histograms and distribution charts and how to do data validation with whylogs' constraints. We will dig deeper into the concept of distribution shift and explore other popular packages in order to detect data shifts.
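A hedged sketch of the statistical-test approach the excerpt describes: a two-sample Kolmogorov-Smirnov test per feature, flagging drift when the p-value falls below a chosen threshold. The synthetic data and the 0.05 threshold are illustrative assumptions, not taken from the workshop itself.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference, current, alpha=0.05):
    """Compare each column of two 2-D arrays; return drifted column indices."""
    drifted = []
    for col in range(reference.shape[1]):
        stat, p_value = ks_2samp(reference[:, col], current[:, col])
        # Null hypothesis: both samples come from the same distribution,
        # so a low p-value indicates a likely distribution shift.
        if p_value < alpha:
            drifted.append(col)
    return drifted

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, size=(1000, 3))
shifted = np.column_stack([rng.normal(0.5, 1, 1000),   # mean shift
                           rng.normal(0, 1, 1000),     # unchanged
                           rng.normal(0, 2, 1000)])    # variance shift
print(detect_drift(baseline, shifted))  # expect columns 0 and 2 to be flagged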


Understanding Data Cleaning

#artificialintelligence

Data is information collected through observations. It is often a set of qualitative or quantitative variables, or a compilation of both. Data entered into a system can carry multiple layers of issues when retrieved, which in most cases means you must clean it before you can make sense of it and process it into actionable insights. Data cleaning is a crucial first step in any machine learning project. It is an unavoidable step in model building and data analysis, yet few resources really explain how to go about it.
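As a concrete starting point, here is a minimal pandas sketch of the kind of first-pass cleaning the excerpt alludes to: removing duplicates, coercing types, and handling missing values. The column names and imputation choices are hypothetical, for illustration only.

```python
import pandas as pd

# A toy frame with the usual retrieval issues: duplicates, bad types, gaps.
df = pd.DataFrame({
    "age": ["34", "n/a", "29", "29"],
    "income": [52000, None, 61000, 61000],
})

df = df.drop_duplicates()                                  # remove repeated rows
df["age"] = pd.to_numeric(df["age"], errors="coerce")      # "n/a" becomes NaN
df["income"] = df["income"].fillna(df["income"].median())  # impute a gap
df = df.dropna(subset=["age"])                             # drop unusable rows
print(df)
```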


Machine Learning with Signal Processing Techniques

#artificialintelligence

Stochastic Signal Analysis is a field of science concerned with the processing, modification and analysis of (stochastic) signals. Anyone with a background in Physics or Engineering knows to some degree about signal analysis techniques, what these techniques are and how they can be used to analyze, model and classify signals. Data Scientists coming from different fields, like Computer Science or Statistics, might not be aware of the analytical power these techniques bring with them. In this blog post, we will have a look at how we can use Stochastic Signal Analysis techniques, in combination with traditional Machine Learning classifiers, for accurate classification and modelling of time series and signals. At the end of the blog post you should be able to understand the various signal-processing techniques that can be used to retrieve features from signals, and be able to use the FFT to classify ECG signals (and even identify a person by their ECG signal), predict seizures from EEG signals, classify and identify targets in radar signals, and identify patients with neuropathy or myopathy from EMG signals. You might often have come across the words time series and signal describing datasets, and it might not be clear what the exact difference between them is.
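A sketch of the FFT-based feature extraction the post describes: compute a signal's spectrum and use its dominant frequencies and magnitudes as a feature vector for a traditional classifier. The synthetic ECG-like signal, the sampling rate, and the number of peaks kept are illustrative assumptions.

```python
import numpy as np
from numpy.fft import rfft, rfftfreq

def fft_features(signal, sample_rate, n_peaks=5):
    """Return the n_peaks dominant frequencies and their magnitudes."""
    spectrum = np.abs(rfft(signal))
    freqs = rfftfreq(len(signal), d=1.0 / sample_rate)
    top = np.argsort(spectrum)[-n_peaks:]   # indices of the largest peaks
    return np.concatenate([freqs[top], spectrum[top]])

fs = 250  # Hz, a typical ECG sampling rate (assumed)
t = np.arange(0, 4, 1 / fs)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.4 * np.sin(2 * np.pi * 12 * t)
features = fft_features(ecg_like, fs)
print(features)  # vectors like this can be fed to any traditional classifier
```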


The importance of data cleaning

#artificialintelligence

One of the most important initiatives for creating a successful artificial intelligence/machine learning (AI/ML) model is ensuring the data you're using is high quality and clean. That is, data which is complete, correct, and relevant to the problem you're trying to solve. Despite the importance of clean data, it is often overlooked in model creation because reviewing data is tedious and time-consuming. According to IBM, the lack of clean data, or poor quality data, cost US companies $3.1 trillion in 2016. Accurate models are only built on clean data.


Digital Transformation Trends

#artificialintelligence

The past two years have seen radical digital transformation. Companies and industries that have traditionally been hesitant to adopt new technology suddenly embraced digital transformation: they needed to find new ways to work. Interestingly, many experts believe that these radical shifts are only the beginning. In a recent Deloitte survey, three-quarters of executives stated that they expect more change in the next five years than in the past five. The rate of change only increases as organizations grow more open and willing to make the changes they need to keep up with the competition. Digital transformation (DX) encourages business organizations to adopt new technologies in order to deliver better value to their customers.


4 Elegant Ways to Deal With Missing Data

#artificialintelligence

Originally published on Towards AI, the world's leading AI and technology news and media company.


For AI model success, utilize MLops and get the data right

#artificialintelligence

It's critical to adopt a data-centric mindset and support it with ML operations. Artificial intelligence (AI) in the lab is one thing; in the real world, it's another. Many AI models fail to yield reliable results when deployed. Others start well, but then results erode, leaving their owners frustrated. Many businesses do not get the return on AI they expect. Why do AI models fail, and what is the remedy?