These five points are important, of course, but apart from all that, if we don't have any data, we don't have a project. If you don't know how to get it, you have nothing. Beyond this, in my previous article I highlighted some important questions you should ask yourself about the source, the format and the actions you need to take with the data. Now… what if I told you there's a source where you can search thousands of datasets, and datasets only, from all around the world, in several formats, and easy to discover and access? And who else but Google would make something like this available? Welcome to Dataset Search… Google's (not that new, but still in beta) search engine ONLY for datasets.
Data is the new oil, and Google now provides Dataset Search to help you find it. Datasets are needed to train machine learning models and for other computing projects. Data is a vital resource of the modern age, and datasets can now be published and discovered easily. Dataset Search has indexed almost 25 million datasets, giving you a single place to search for them and find links to where the data is hosted.
Based on what we've learned from the early adopters of Dataset Search, we've added new features. You can now filter the results based on the types of dataset that you want (e.g., tables, images, text), or whether the dataset is available for free from the provider. If a dataset is about a geographic area, you can see it on a map. Plus, the product is now available on mobile and we've significantly improved the quality of dataset descriptions. One thing hasn't changed, however: anybody who publishes data can make their datasets discoverable in Dataset Search by using an open standard (schema.org) to describe the properties of their dataset on their own web page.
Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they're hosted, whether it's a publisher's site, a digital library, or an author's personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org).
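
To make the idea of describing a dataset with schema.org a bit more concrete, here is a minimal Python sketch that builds such a description as JSON-LD and wraps it in the script tag a publisher could place on the dataset's web page. The property names (name, description, creator, datePublished, license, measurementTechnique, distribution) are terms from the schema.org Dataset vocabulary and map onto the information listed above (who created it, when it was published, how it was collected, the terms of use). The helper functions and every example value, including the example.com URLs, are hypothetical and only there for illustration. Treat it as a sketch, not as the exact markup Google requires.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         license_url, measurement_technique, download_url,
                         encoding_format="text/csv"):
    """Return a schema.org Dataset description as a plain dict.

    Hypothetical helper: the keys are schema.org property names,
    the function itself is not part of any Google tooling.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date)
        "datePublished": date_published,
        # Terms for using the data
        "license": license_url,
        # How the data was collected
        "measurementTechnique": measurement_technique,
        # Where the actual file can be downloaded
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": encoding_format,
            "contentUrl": download_url,
        }],
    }


def to_script_tag(jsonld):
    """Serialize the description into a JSON-LD script tag for an HTML page."""
    body = json.dumps(jsonld, indent=2)
    return '<script type="application/ld+json">\n' + body + '\n</script>'


# All values below are made up for the example.
example = build_dataset_jsonld(
    name="City bike rentals 2019",
    description="Hourly counts of bike rentals in a fictional city.",
    creator="Example Open Data Office",
    date_published="2020-01-15",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    measurement_technique="Automatic counters at rental stations",
    download_url="https://example.com/data/bike-rentals-2019.csv",
)

print(to_script_tag(example))
```

Pasting the printed tag into the HTML of the page that hosts the dataset is the general idea. Whether a given page actually shows up in Dataset Search depends on Google crawling and accepting the markup.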