AITopics

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Folie, Brendan, Hutchinson, Maxwell

Multivariate Prediction Intervals for Random Forests

arXiv.org Machine LearningMay-19-2022

Accurate uncertainty estimates can significantly improve the performance of iterative design of experiments, as in Sequential and Reinforcement learning. For many such problems in engineering and the physical sciences, the design task depends on multiple correlated model outputs as objectives and/or constraints. To better solve these problems, we propose a recalibrated bootstrap method to generate multivariate prediction intervals for bagged models and show that it is well-calibrated. We apply the recalibrated bootstrap to a simulated sequential learning problem with multiple objectives and show that it leads to a marked decrease in the number of iterations required to find a satisfactory candidate. This indicates that the recalibrated bootstrap could be a valuable tool for practitioners using machine learning to optimize systems with multiple competing targets.

artificial intelligence, decision tree learning, multivariate prediction interval, (2 more...)

2205.0226

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

#artificialintelligenceMay-13-2022, 15:30:08 GMT

Hyperparameter Tuning of Decision Tree Classifier Using GridSearchCV

The models can have many hyperparameters and finding the best combination of the parameter using grid search methods. Grid search is a technique for tuning hyperparameter that may facilitate build a model and evaluate a model for every combination of algorithms parameters per grid. We might use 10 fold cross-validation to search the best value for that tuning hyperparameter. These values are called hyperparameters. To get the simplest set of hyperparameters we will use the Grid Search method.

gridsearchcv, hyperparameter, parameter value, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

arXiv.org Machine LearningMay-5-2022

Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

Liu, Alice J., Mukherjee, Arpita, Hu, Linwei, Chen, Jie, Nair, Vijayan N.

This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.

artificial intelligence, decision tree learning, machine learning, (19 more...)

2204.12868

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

#artificialintelligenceApr-29-2022, 20:45:26 GMT

Pruned Random Forests for Effective and Efficient Financial Data Analytics

It is evident that Machine Learning (ML) has touched all walks of our lives! From checking the weather forecast to applying for a loan or a credit card, ML is used in almost every aspect of our daily life. In this chapter, ML is explored in terms of algorithms and applications. Special consideration is given to ML applications in the financial data analytics domain including stock market analysis, fraud detection in financial transactions, credit risk analysis, loan defaulting rate analysis, and profit–loss analysis. The chapter establishes the significance of Random Forests as an effective machine learning method for a wide variety of financial applications.

efficient financial data analytic, pruned random forest, random forest, (6 more...)

Industry: Banking & Finance > Credit (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.84)

arXiv.org Machine LearningApr-29-2022

A study of tree-based methods and their combination

Zeng, Yinuo

With the increase of data volume and the continuous development in deep learning, although more and more traditional machine learning techniques are outperformed by artificial neural networks, tree-based methods are still popular. Random forest (Breiman, 2001) is commonly used as a benchmark to evaluate the performance of nonparametric models, while XGBoost (Chen and Guestrin, 2016) performs well in Kaggle competitions and often competes with artificial neural networks. Also, instead of relying on a specific method, people prefer to make decisions based on a combination of multiple models, which shows a better performance than a single one. Therefore, identifying the importance of each model by weights assignment is critical.

artificial intelligence, machine learning, penalty, (16 more...)

2204.13916

Country: North America > United States > California (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)

#artificialintelligenceApr-27-2022, 06:50:42 GMT

Raising Survey Response Rates by Using Machine Learning to Predict Gold Providers

The model based response propensity approach used a machine learning method called the random forests with regression trees method.

machine learning, predict gold provider, survey response rate

Genre: Questionnaire & Opinion Survey (0.85)

Industry: Media > News (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligenceApr-26-2022, 06:43:40 GMT

Introduction to Random Forest Algorithm

Random Forest is a supervised machine learning algorithm that is composed of individual decision trees. This type of model is called an ensemble model because an "ensemble" of independent models is used to compute a result. The basis for the Random Forest is formed by many individual decision trees, the so-called Decision Trees. A tree consists of different decision levels and branches, which are used to classify data. The Decision Tree algorithm tries to divide the training data into different classes so that the objects within a class are as similar as possible and the objects of different classes are as different as possible. This tree helps to decide whether to do sports outside or not, depending on the weather variables "weather", "humidity" and "wind force".

decision tree, random forest, training data, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine LearningApr-26-2022

Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production

Hassan, Syeda Sakira, Mangayil, Rahul, Aho, Tommi, Yli-Harja, Olli, Karp, Matti

In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.

artificial intelligence, machine learning, pathway, (15 more...)

2204.12526

Country: Europe > Finland > Pirkanmaa > Tampere (0.06)

Genre: Research Report > New Finding (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.42)

Formentini, Sarah Elizabeth, Liang, Wei, Zhu, Ruoqing

Confidence Band Estimation for Survival Random Forests

arXiv.org Machine LearningApr-25-2022

Survival random forest is a popular machine learning tool for modeling censored survival data. However, there is currently no statistically valid and computationally feasible approach for estimating its confidence band. This paper proposes an unbiased confidence band estimation by extending recent developments in infinite-order incomplete U-statistics. The idea is to estimate the variance-covariance matrix of the cumulative hazard function prediction on a grid of time points. We then generate the confidence band by viewing the cumulative hazard function estimation as a Gaussian process whose distribution can be approximated through simulation. This approach is computationally easy to implement when the subsampling size of a tree is no larger than half of the total training sample size. Numerical studies show that our proposed method accurately estimates the confidence band and achieves desired coverage rate. We apply this method to veterans' administration lung cancer data.

artificial intelligence, machine learning, variance, (18 more...)

2204.12038

Country:

North America > United States > Illinois (0.04)
North America > United States > North Carolina (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)