"ML-Everything"? Balancing Quantity and Quality in Machine Learning Methods for Science
Recent research in machine learning (ML) has driven significant progress in many fields, including scientific applications. However, several limitations must be addressed to ensure the validity of new models, the quality of testing and validation procedures, and the applicability of the resulting models to real-world problems. These limitations include evaluations that are unfair, subjective, or unbalanced (not necessarily intentionally, but present nonetheless); datasets that do not properly reflect real-world use cases (for example, because they are "too easy"); and incorrect ways of splitting datasets into training, testing, and validation subsets. In this article I discuss all of these points, using examples from biology, a domain that is being revolutionized by ML methodologies. Along the way I also briefly touch on the interpretability of ML models, which is very limited today but important, because it could help clarify many of the limitations discussed in the first part of the article.
Mar-14-2023, 06:25:19 GMT