Note: Google's new dataset search tool was publicly released on January 23rd, 2020. Google recently released Dataset Search, a free tool for searching 25 million publicly available datasets. The search tool includes filters to limit results based on their license (free or paid), format (CSV, images, etc.), and update time. The results also include descriptions of each dataset's contents as well as author citations. Google's dataset aggregation methodology differs from that of other dataset repositories, such as Amazon's open data registry.
Data is often called the new oil, and Google now offers Dataset Search to help you find it. Datasets are needed to train machine learning models and to support other computing projects. Data is a vital resource for the modern age, and datasets can now be published and discovered easily. Dataset Search has indexed almost 25 million datasets, giving you a single place to search for them and find links to where the data is hosted. You can filter the results by the type of dataset you want (e.g., tables, images, text) or by whether the dataset is available for free from the provider.
Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they're hosted, whether it's a publisher's site, a digital library, or an author's personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org).
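To make the guidelines above concrete, here is a minimal Python sketch of the kind of description a dataset provider might publish. It is an illustration, not Google's official template: the helper name `build_dataset_jsonld`, the example organization, and the example.org URLs are all hypothetical, while the property names (`creator`, `datePublished`, `license`, `measurementTechnique`, `distribution`) come from the schema.org `Dataset` vocabulary.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         license_url, download_url, encoding_format,
                         measurement_technique=None):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; only the property names are
    taken from the schema.org vocabulary.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset.
        "creator": {"@type": "Organization", "name": creator},
        # When it was published.
        "datePublished": date_published,
        # The terms for using the data.
        "license": license_url,
        # Where the data can be downloaded, and in what format.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": encoding_format,
            "contentUrl": download_url,
        }],
    }
    if measurement_technique:
        # How the data was collected.
        record["measurementTechnique"] = measurement_technique
    return json.dumps(record, indent=2)


# All values below are made up for the example.
example = build_dataset_jsonld(
    name="Example penguin diet survey",
    description="Hypothetical survey of emperor penguin diets.",
    creator="Example Research Institute",
    date_published="2020-01-23",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/data/penguin-diets.csv",
    encoding_format="text/csv",
    measurement_technique="Field observation",
)
print(example)
```

A provider would typically embed the resulting JSON inside a `<script type="application/ld+json">` element on the dataset's web page so that search engines can read it.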
According to the Google AI Blog, there are tens of millions of datasets on the web, with content ranging from sensor data and government records to results of scientific experiments and business reports. Indeed, there are datasets for almost anything one can imagine, be it diets of emperor penguins or where remote workers live. More than two years ago, we undertook an effort to design a search engine that would provide a single entry point to these millions of datasets and thousands of repositories. The result is Dataset Search, which we launched in beta in 2018 and fully launched in January 2020. In addition to facilitating access to data, Dataset Search reconciles and indexes datasets using the schema.org metadata descriptions that come directly from the dataset web pages.
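As a rough illustration of what reading metadata "directly from the dataset web pages" can look like, the sketch below pulls schema.org `Dataset` descriptions out of a page's JSON-LD blocks using only the Python standard library. This is an assumption-laden toy, not Google's actual pipeline: the class `JsonLdExtractor`, the function `find_datasets`, and the sample page are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed metadata instead of failing the page.


def find_datasets(html):
    """Return every JSON-LD object on the page whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    datasets = []
    for block in parser.blocks:
        # A JSON-LD block may hold a single object or a list of objects.
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Dataset" in types:
                datasets.append(item)
    return datasets


# Made-up page with one Dataset description and one unrelated block.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example penguin diet survey",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "WebSite", "name": "Example site"}
</script>
</head><body></body></html>
"""

for dataset in find_datasets(SAMPLE_PAGE):
    print(dataset["name"], "|", dataset.get("license"))
```

A real indexer would also need to handle other markup formats and reconcile duplicate descriptions of the same dataset, which this sketch does not attempt.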
These five points are important, of course, but apart from all that, if we don't have any data, we don't have a project. If you don't know how to get it, you have nothing. Beyond this, in my previous article I highlighted some important questions you should ask yourself about the source, the format, and the actions you need to take with the data. Now… what if I told you there's a source where you can search for thousands of datasets, and datasets only, from all around the world, in several formats, and easy to discover and access? And who else but Google would make something like this available? Welcome to Dataset Search… Google's not-that-new, but still in beta, search engine ONLY for datasets.