Moving towards reproducible machine learning - Nature Computational Science
An important step when constructing a model is the collection and selection of datasets, as the quality of the model depends heavily on the quality and characteristics of the data. The data collection process needs to be properly discussed and reported, because the selected data sources can introduce biases, whether intentional or unintentional. Any identified biases, and any attempts to mitigate them, should also be discussed, so that other researchers are aware of the limitations when using the reported models. If synthetic data is used, the data generation process, including any assumptions made, needs to be described in detail. Raw datasets are in fact rarely used directly, since they may contain inconsistencies, errors, and outliers that can ultimately degrade the quality of the model. In addition, data might need to be converted to a particular format and representation before it can be used by a given model.
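As an illustration only (not a method described in the article), the following is a minimal Python sketch, assuming tabular records held as dictionaries, that applies three common cleaning steps: dropping missing values, removing exact duplicates, and filtering outliers with a median absolute deviation (MAD) rule. Each step logs how many records it removed, so the preprocessing can be reported alongside the model. The names `clean` and `PreprocessingLog`, the column name `x`, and the threshold `k=3.5` are hypothetical choices for this example.

```python
import math
import statistics
from dataclasses import dataclass, field


@dataclass
class PreprocessingLog:
    """Hypothetical helper: records each cleaning step so it can be reported."""
    steps: list = field(default_factory=list)

    def record(self, name, before, after, note=""):
        self.steps.append(
            {"step": name, "rows_before": before, "rows_after": after, "note": note}
        )


def clean(rows, column, k=3.5, log=None):
    """Hypothetical cleaning pipeline; k is an assumed outlier cutoff."""
    log = log if log is not None else PreprocessingLog()

    # 1. Drop records whose target value is missing, non-numeric, or NaN.
    n = len(rows)
    rows = [
        r for r in rows
        if isinstance(r.get(column), (int, float)) and not math.isnan(r[column])
    ]
    log.record("drop_missing", n, len(rows), f"missing or non-numeric '{column}'")

    # 2. Drop exact duplicate records, keeping the first occurrence.
    n = len(rows)
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    rows = unique
    log.record("drop_duplicates", n, len(rows))

    # 3. Drop outliers using a modified z-score based on the MAD.
    n = len(rows)
    values = [r[column] for r in rows]
    if values:
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        if mad > 0:
            # 0.6745 rescales the MAD so scores resemble z-scores for normal data.
            rows = [r for r in rows if abs(0.6745 * (r[column] - med) / mad) <= k]
    log.record("drop_outliers", n, len(rows), f"modified z-score > {k}")
    return rows, log


# Usage on a small made-up dataset (illustrative values only).
raw = [
    {"id": 1, "x": 1.0},
    {"id": 2, "x": 1.2},
    {"id": 2, "x": 1.2},   # exact duplicate
    {"id": 3, "x": None},  # missing value
    {"id": 4, "x": 0.9},
    {"id": 5, "x": 1.1},
    {"id": 6, "x": 50.0},  # outlier
]
cleaned, log = clean(raw, "x")
for step in log.steps:
    print(step)
```

Keeping the log as structured records, rather than only returning the cleaned data, is one way to make each preprocessing decision visible to readers who want to reproduce or question it.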
Oct-12-2021, 21:40:27 GMT