chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models

Jeremy R. Ash, Jacqueline M. Hughes-Oliver

Jul-11-2018–arXiv.org Machine Learning

It is now commonplace for researchers across a variety of fields to fit machine learning models to complex data in order to make predictions. The complexity of these data (e.g., a large number of features or nonlinear relationships with the response) often makes it difficult to determine a priori which machine learning modeling routine and which descriptors (also known as features, predictors, or covariates) will result in the best performance. A common approach to this problem is to fit many descriptor set and modeling routine (DM) combinations, compute measures of prediction performance on held-out data, and then choose a DM combination by assessing relative performance. Often, in a particular domain, only a few modeling routines are widely accepted, and researchers tend to use these methods exclusively. Unfortunately, these routines will not work well for every data set, and researchers might learn from other fields where different modeling methods tend to be more successful. A myriad of modeling methods implemented in R may be worthwhile for researchers to try (see Hastie et al. (2009) and Kuhn and Johnson (2013) for an overview of these methods). However, a lack of familiarity with the syntactic minutiae and statistical methodology required to fit and compare different modeling routines in R often deters users from attempting them.
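
The DM comparison described above can be illustrated with a minimal sketch. It is not chemmodlab's API and not the authors' implementation: it is written in Python with only the standard library, the data are synthetic, and every name in it (`make_data`, `fit_mean`, `fit_knn`, `cv_rmse`, `compare_dm`, the descriptor set labels) is hypothetical. It assumes a simple k-fold split and root mean squared error as the performance measure; the paper may use different splitting schemes and measures.

```python
import random
from itertools import product
from statistics import mean


def make_data(n=60, seed=0):
    """Synthetic data (an assumption): y depends on x1 and x2, not on 'noise'."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, noise = (rng.uniform(-1, 1) for _ in range(3))
        y = 3.0 * x1 - 2.0 * x2 + rng.gauss(0, 0.1)
        rows.append(({"x1": x1, "x2": x2, "noise": noise}, y))
    return rows


def fit_mean(X, y):
    """Baseline routine: always predict the training mean."""
    mu = mean(y)
    return lambda x: mu


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regression with squared Euclidean distance."""
    def predict(x):
        order = sorted(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
        )
        return mean(y[i] for i in order[:k])
    return predict


def cv_rmse(rows, columns, fit, n_folds=5, seed=1):
    """Held-out RMSE for one descriptor set and one modeling routine."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        X_train = [[rows[i][0][c] for c in columns] for i in train]
        y_train = [rows[i][1] for i in train]
        model = fit(X_train, y_train)
        for i in held_out:
            x = [rows[i][0][c] for c in columns]
            sq_errors.append((model(x) - rows[i][1]) ** 2)
    return mean(sq_errors) ** 0.5


def compare_dm(rows, descriptor_sets, routines):
    """Score every descriptor set / modeling routine (DM) combination."""
    return {
        (d_name, m_name): cv_rmse(rows, cols, fit)
        for (d_name, cols), (m_name, fit) in product(
            descriptor_sets.items(), routines.items()
        )
    }


# Hypothetical descriptor sets and routines, chosen only for illustration.
DESCRIPTOR_SETS = {"informative": ["x1", "x2"], "noise_only": ["noise"]}
ROUTINES = {"mean": fit_mean, "knn": fit_knn}

rows = make_data()
scores = compare_dm(rows, DESCRIPTOR_SETS, ROUTINES)
best = min(scores, key=scores.get)  # DM combination with lowest held-out RMSE

for (d_name, m_name), rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{d_name:12s} {m_name:5s} RMSE={rmse:.3f}")
```

Every combination is scored on the same folds, so differences in RMSE reflect the descriptor set and routine rather than the split. A single split is itself an assumption of this sketch; repeating the procedure over several random splits would give a more stable comparison.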

artificial intelligence, machine learning, performance measure, (15 more...)

Country:
- North America > United States
  - North Carolina > Wake County
    - Raleigh (0.04)
  - New York > New York County
    - New York City (0.04)

Genre:
- Research Report > Experimental Study (0.94)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.49)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Performance Analysis > Accuracy (1.00)