Is FAIR data useful for machine learning?
In machine learning, an important balance to maintain is that between adjusting the search space and environment specification (see e.g. If the problem is not well defined the learning outcomes will likely not be useful. However, in practice defining the search space is a large part of the work of a data scientist, and another approach could be to formulate the data requirements for every model version. This would mean starting from the learning goals rather than from the existing data, and describing the requirements for data content, quantity and quality in a systematic way. These requirements can be used to identify or generate the datasets needed to develop the model. The FAIR principles are an excellent starting point to facilitate the creation of this feedback loop between data generators (data entry systems, lab equipment, data processing pipelines, et cetera) and data scientists.
Nov-25-2020, 23:22:13 GMT
- Technology: