Collaborating Authors

 garbage data


Avoiding Garbage in Machine Learning

#artificialintelligence

Anyone who works with artificial intelligence (AI) knows that the quality of the data goes a long way toward determining the quality of the result. But "garbage" is a broad and expanding category in data science: poorly labeled or inaccurate data, data that reflects underlying human prejudices, and incomplete data. To paraphrase Tolstoy, great datasets are all alike, but all garbage datasets are garbage in their own unique and horrible ways. People believe in machine learning. Israeli philosopher and historian Yuval Noah Harari popularized the term "dataism" to describe a blind faith in algorithms. This faith extends beyond machine learning's ability to analyze data.


How do you guys handle 'garbage data' discovered during ETL? • r/Database

#artificialintelligence

Hopefully my title isn't too poorly worded. I am currently upgrading a client's old transaction-based DB to something a bit more modern that locks down their flow a bit better, so problems like this hopefully don't arise in the future. To give a brief overview, they use it to track hours on tubes and capacitors used in transmitters, in order to calculate average lifespans and perform other calculations. Devices are tied to meters whose readings are updated daily. I've got the old data transformed and loaded into the new system, but running through some basic sanity checks I'm finding quite a bit of data that simply doesn't make sense. On a very basic level, there are transactions with IN dates that are later than their OUT dates.
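
For anyone wanting to try the same kind of sanity check, here is a rough Python sketch of the IN-date-versus-OUT-date test described above. The field names (`device_id`, `in_date`, `out_date`) and the sample rows are made up for illustration and are not from the poster's schema. Treating a missing OUT date as "still in service" is also an assumption. The idea is to set the suspect rows aside for review rather than silently drop or "fix" them.

```python
from datetime import date


def split_by_date_order(rows):
    """Separate rows whose in_date is after out_date from rows that pass."""
    valid, suspect = [], []
    for row in rows:
        in_date, out_date = row["in_date"], row["out_date"]
        # Assumption: a missing out_date means the part is still installed,
        # so it is not flagged as an error.
        if out_date is not None and in_date > out_date:
            suspect.append(row)
        else:
            valid.append(row)
    return valid, suspect


# Hypothetical sample rows; real data would come from the loaded tables.
rows = [
    {"device_id": "tube-1", "in_date": date(2015, 3, 1), "out_date": date(2016, 1, 15)},
    {"device_id": "tube-2", "in_date": date(2016, 5, 9), "out_date": date(2014, 5, 9)},
    {"device_id": "cap-7", "in_date": date(2017, 2, 2), "out_date": None},
]

valid, suspect = split_by_date_order(rows)
print("suspect rows:", [r["device_id"] for r in suspect])
```

Keeping the suspect rows in a separate list (or a quarantine table) leaves the lifespan calculations free of impossible intervals while preserving the original records so the client can decide how each one should be corrected.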