Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data

Aguilar-Bejarano, Eduardo, Lea, Daniel, Sivakumar, Karthikeyan, Mase, Jimiama M., Omidvar, Reza, Li, Ruizhe, Kettle, Troy, Mitchell-White, James, Alexander, Morgan R, Winkler, David A, Figueredo, Grazziela

arXiv.org Artificial Intelligence 

The massive increase in data in scientific research requires the development and application of robust tools for data analysis and machine learning (ML) that are findable, accessible, interoperable, reusable (FAIR) and interpretable. In domains such as biomaterials science, engineering, chemistry, healthcare and biosciences, data-driven discovery typically requires interdisciplinary teams. These teams collaborate to implement unbiased data pre-processing strategies, select appropriate modelling techniques, and interpret model outputs to accelerate and inform research outcomes and support rational design and decision-making. This process is often iterative, with experts providing feedback over long periods of time to refine models and optimise the methodology adopted. In cases where initial analysis identifies issues with the data, such as outliers, unbalanced data classes, or experimental measurement uncertainty, another round of data collection and pre-processing might be necessary. As a result, data for the same problem are likely to be analysed multiple times using different dataset versions and methodological pipelines. For interdisciplinary co-development of analytics, there is also a need for tools that allow domain experts to focus on interpreting and using analysis results, rather than developing code. The widespread use of ML and the overwhelming availability of thousands of community-driven open-source packages in Python and R raise the barrier to interoperable and reusable data analysis methodologies. To facilitate accurate analytics, transparency, and comparison of modelling results, there is a strong need for easy-to-use tools that automatically track data, all methodological choices, performance metrics, and corresponding results.
