Data Quality

GEC -- Grammatical Error Correction


With millions of people trying to move abroad every year, doing so has become increasingly difficult. One of the most important skills it requires is good English communication. Since the majority of these people come from countries where English isn't the first language, they are already at a disadvantage. Automated Grammatical Error Correction (GEC) can be an essential tool for the millions of people who learn English as a second language. It can be used either to improve their grammatical knowledge or on a daily basis to communicate with other people efficiently.

Data pre-processing for Machine Learning in Python


Data preprocessing refers to the steps applied to make data more suitable for data mining. In this course, we are going to focus on pre-processing techniques for machine learning. Pre-processing is the set of manipulations that transform a raw dataset so that it can be used by a machine learning model. It is necessary to make our data suitable for some machine learning models, to reduce dimensionality, to better identify the relevant data, and to increase model performance. It's the most important part of a machine learning pipeline and can strongly affect the success of a project.
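
As one illustration of such a manipulation, here is a minimal sketch of standardization (rescaling a numeric column to zero mean and unit variance). It is not taken from the course itself; the function name `standardize` and the sample column `heights_cm` are assumptions made up for this example.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical raw feature values, invented for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)
print(scaled)
```

Many models that rely on distances or gradients tend to behave better when features share a comparable scale, which is one reason a step like this often appears early in a pipeline.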

A Complete Guide to Pyjanitor for Data Cleaning


This article was published as a part of the Data Science Blogathon. As a Machine Learning Engineer or Data Engineer, your main task is to identify and clean duplicate data and remove errors from the dataset. It is good to spend some time preparing the data and making it reliable for the machine learning models. The better the quality of the data, the higher the accuracy of your model and the better the decision-making process. Data Cleaning is not something new in machine learning.
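
The duplicate-cleaning task mentioned above can be sketched in plain Python. This is not Pyjanitor's API; the helper `drop_duplicates`, the field names, and the sample rows are all hypothetical and chosen only to show the idea of normalizing a key before comparing records.

```python
def drop_duplicates(rows, key_fields):
    """Keep the first row for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        # Normalize whitespace and case so near-identical values compare equal.
        key = tuple(str(row[field]).strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical records, invented for illustration.
rows = [
    {"email": "ann@example.com", "city": "Oslo"},
    {"email": " Ann@Example.com ", "city": "Oslo"},  # same person, messy formatting
    {"email": "bob@example.com", "city": "Rome"},
]
cleaned = drop_duplicates(rows, ["email"])
print(len(rows), "->", len(cleaned))
```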

Data Quality Dimensions Are Crucial for AI - DATAVERSITY


As organizations digitize customer journeys, the implications of low-quality data are multiplied manyfold. This is a result of new processes and products that are springing up. Since the data from such processes is growing, data controls may not be strong enough to ensure the data is of high quality. That's where Data Quality dimensions come into play. Increasingly, financial institutions are focusing on data collection management rather than on other data stages such as consumption, making Data Quality dimensions more important than ever.

How AI & ML transforming data quality management - DataScienceCentral


In recent years, technology has become prominent, both at work and at home. Machine learning (ML) and Artificial Intelligence (AI) are evolving quickly today. Almost everyone will have some interaction with a form of AI daily. Some common examples include Siri, Google Maps, Netflix, and social media (Facebook/Snapchat). AI and ML are popular buzzwords right now and are often used interchangeably. Most experimentation has been geared to finding specific solutions to specific problems.

Data quality can make or break efforts to bring artificial intelligence to IT operations


AIOps, or artificial intelligence for IT operations, may be just what the doctor ordered for beleaguered IT shops. Applying advanced automation to countless rote IT functions will free up IT departments to concentrate on the bigger and more meaningful things, such as digital transformation and promoting continuous integration and deployment of software. However, there's a problem: AIOps requires the right kind of data at the right time, but much of this data either isn't ready or needs a quality overhaul. While AIOps functions on data points such as system logs and metrics, historical performance, event data, streaming real-time operations events, incident-related data, and ticketing, much of this data may be incomplete or hidden away in silos. In short, if data isn't up to par, AIOps may flop, or worse yet, steer technology decisions in the wrong direction. Enter an emerging methodology on the scene that specifically addresses this, known as robotic data automation, or RDA, as identified in a Forbes piece by Shailesh Manjrekar.

Using AI to extract data from museum specimens


Researchers from Cardiff University are using artificial intelligence (AI) to automatically segment and capture information from museum specimens and perform data quality improvement without human input. The university has been working with museums from across Europe, including the Natural History Museum, London. The AI is being used to refine and validate new methods and contribute to the mammoth task of digitizing hundreds of millions of specimens. There are more than 3 billion biological and geological specimens in natural history museums globally. Digitizing these specimens -- where the physical information is transformed into a digital format -- has become a new task for museums as the digital world becomes ubiquitous. Digitization helps reduce the amount of manual handling of specimens, which are delicate and prone to damage.

Data Cleaning - Filter


Learn how to filter numbers, words, and just about anything in order to reduce bias in your dataset. Filtering is a very common transformation; it takes in a condition and checks all the data, keeping only the records that meet it. By filtering, you can improve your machine learning models by training on a specific subset of data to specialize the model, removing incorrect data and outliers, or pruning biased features. At its core, filtering takes in a pile of data and turns it into something smaller and (hopefully) easier to work with. When you create a filter, you start from what will stay, not what will go.
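
A minimal sketch of this idea in Python follows. The records, field names, and thresholds are hypothetical and chosen only for illustration; the point is that the condition describes what stays.

```python
# Hypothetical records; the field names and values are made up for illustration.
records = [
    {"name": "a", "age": 34, "country": "DE"},
    {"name": "b", "age": -5, "country": "DE"},  # invalid age, should be dropped
    {"name": "c", "age": 51, "country": "FR"},  # outside the target subset
    {"name": "d", "age": 29, "country": "DE"},
]

def keep(record):
    """Condition for what stays: a plausible age and the target country."""
    return 0 <= record["age"] <= 120 and record["country"] == "DE"

# Keep only the records that satisfy the condition.
filtered = [r for r in records if keep(r)]
print([r["name"] for r in filtered])
```

Writing the condition as a named function keeps the rule for "what stays" in one place, so it can be reviewed or reused when the same filter is applied to new data.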

Artificial intelligence to bring museum specimens to the masses


Scientists are using cutting-edge artificial intelligence to help extract complex information from large collections of museum specimens. A team from Cardiff University is using state-of-the-art techniques to automatically segment and capture information from museum specimens and perform important data quality improvement without the need for human input. They have been working with museums from across Europe, including the Natural History Museum, London, to refine and validate their new methods and contribute to the mammoth task of digitizing hundreds of millions of specimens. With more than 3 billion biological and geological specimens curated in natural history museums around the world, the digitization of museum specimens, in which physical information from a particular specimen is transformed into a digital format, has become an increasingly important task for museums as they adapt to an increasingly digital world. A treasure trove of digital information is invaluable for scientists trying to model the past, present and future of organisms and our planet, and could be key to tackling some of the biggest societal challenges our world faces today, from conserving biodiversity and tackling climate change to finding new ways to cope with emerging diseases like COVID-19.