AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)

#artificialintelligence - May-25-2022, 01:26:26 GMT

Know About Ensemble Methods in Machine Learning - Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Bias is the difference between the model's prediction and the ground truth value, whereas variance is the model's sensitivity to small perturbations in the training set. Excessive bias can cause an algorithm to miss relevant relationships between the features and the intended outputs (underfitting). High variance can cause an algorithm to model random noise in the training data (overfitting). The bias-variance tradeoff is the property that lowering the bias of estimated parameters tends to increase their variance across samples, and vice versa.
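The two failure modes described above can be illustrated with a small simulation. This is a minimal sketch, not code from the article: the ground-truth function, the noise level, and the two toy predictors (a constant model and a 1-nearest-neighbour model) are all assumptions chosen for illustration.

```python
import random
import statistics

def true_fn(x):
    # Hypothetical ground truth, assumed only for this illustration.
    return x * x

def sample_training_set(rng, n=30, noise=0.3):
    # Draw a fresh noisy training set from the assumed ground truth.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def predict_constant(xs, ys, x0):
    # Rigid model: ignores x0 and always predicts the training mean.
    return statistics.fmean(ys)

def predict_1nn(xs, ys, x0):
    # Flexible model: copies the (noisy) label of the nearest training point.
    nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[nearest]

def bias_variance(predict, x0, trials=2000, seed=0):
    # Refit on many independent training sets and decompose the
    # behaviour of the prediction at a single test point x0.
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (statistics.fmean(preds) - true_fn(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

bias_const, var_const = bias_variance(predict_constant, x0=0.9)
bias_1nn, var_1nn = bias_variance(predict_1nn, x0=0.9)
print("constant model: bias^2 =", bias_const, "variance =", var_const)
print("1-NN model:     bias^2 =", bias_1nn, "variance =", var_1nn)
```

Under these assumptions the constant model shows the larger squared bias (underfitting) and the 1-nearest-neighbour model the larger variance (overfitting), which is the tradeoff ensemble methods try to manage.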

classifier, ensemble method, prediction, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)

Folie, Brendan, Hutchinson, Maxwell

Multivariate Prediction Intervals for Random Forests

arXiv.org Machine Learning - May-19-2022

Accurate uncertainty estimates can significantly improve the performance of iterative design of experiments, as in Sequential and Reinforcement learning. For many such problems in engineering and the physical sciences, the design task depends on multiple correlated model outputs as objectives and/or constraints. To better solve these problems, we propose a recalibrated bootstrap method to generate multivariate prediction intervals for bagged models and show that it is well-calibrated. We apply the recalibrated bootstrap to a simulated sequential learning problem with multiple objectives and show that it leads to a marked decrease in the number of iterations required to find a satisfactory candidate. This indicates that the recalibrated bootstrap could be a valuable tool for practitioners using machine learning to optimize systems with multiple competing targets.
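For orientation, the plain bootstrap that such methods start from can be sketched as follows. This is not the recalibrated method proposed in the paper: it is an uncalibrated, single-output illustration, and the toy data, the straight-line base model, and every function name are assumptions made here.

```python
import random
import statistics

def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    # Assumes the xs are not all identical.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def bagged_predictions(xs, ys, x0, n_models=200, seed=0):
    # Train each base model on a bootstrap resample (drawn with
    # replacement) and collect its prediction at x0.
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x0 + intercept)
    return preds

def bootstrap_interval(preds, level=0.9):
    # Percentile interval over the ensemble's predictions (uncalibrated).
    ordered = sorted(preds)
    lo = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return statistics.fmean(preds), lo, hi

# Toy data, assumed for illustration: y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.2) for x in xs]
center, lo, hi = bootstrap_interval(bagged_predictions(xs, ys, x0=0.5))
print(center, lo, hi)
```

The spread of the ensemble reflects only how much the fitted model changes between resamples, so raw intervals of this kind are generally too narrow to cover new observations. That gap is one reason a recalibration step is of interest.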

artificial intelligence, decision tree learning, multivariate prediction interval, (2 more...)

2205.0226

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

#artificialintelligence - May-11-2022, 04:55:05 GMT

Techinfoplace Softwares Pvt Ltd.

Problem Statement: A target marketing campaign for a bank was undertaken to identify a segment of customers who are likely to respond to an insurance product. The target variable is whether or not the customer bought the insurance product. It depends on factors such as product usage over three months, demographics, transaction patterns (such as deposit amount and checking account), bank branch, and residential information (such as urban or rural).

customer, insurance product, techinfoplace software pvt ltd

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.37)

arXiv.org Machine Learning - May-5-2022

Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

Liu, Alice J., Mukherjee, Arpita, Hu, Linwei, Chen, Jie, Nair, Vijayan N.

This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.

artificial intelligence, decision tree learning, machine learning, (19 more...)

2204.12868

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

#artificialintelligence - Apr-29-2022, 20:45:26 GMT

Pruned Random Forests for Effective and Efficient Financial Data Analytics

It is evident that Machine Learning (ML) has touched all walks of our lives! From checking the weather forecast to applying for a loan or a credit card, ML is used in almost every aspect of our daily life. In this chapter, ML is explored in terms of algorithms and applications. Special consideration is given to ML applications in the financial data analytics domain including stock market analysis, fraud detection in financial transactions, credit risk analysis, loan defaulting rate analysis, and profit–loss analysis. The chapter establishes the significance of Random Forests as an effective machine learning method for a wide variety of financial applications.

efficient financial data analytic, pruned random forest, random forest, (6 more...)

Industry: Banking & Finance > Credit (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.84)

arXiv.org Machine Learning - Apr-29-2022

A study of tree-based methods and their combination

Zeng, Yinuo

With the increase of data volume and the continuous development in deep learning, although more and more traditional machine learning techniques are outperformed by artificial neural networks, tree-based methods are still popular. Random forest (Breiman, 2001) is commonly used as a benchmark to evaluate the performance of nonparametric models, while XGBoost (Chen and Guestrin, 2016) performs well in Kaggle competitions and often competes with artificial neural networks. Also, instead of relying on a specific method, people prefer to make decisions based on a combination of multiple models, which shows a better performance than a single one. Therefore, identifying the importance of each model by weights assignment is critical.
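One simple way to assign weights to the models in a combination is to weight each model by the inverse of its validation error. The sketch below shows only that heuristic and is not the weighting scheme studied in the paper. The function names and the example numbers are assumptions.

```python
def inverse_error_weights(val_errors):
    # Smaller validation error -> larger weight; weights sum to 1.
    # Assumes every error is strictly positive.
    inverse = [1.0 / e for e in val_errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def combine(predictions, weights):
    # predictions[m][i] is model m's prediction for sample i.
    # Returns the weighted average prediction for each sample.
    return [
        sum(w * p for w, p in zip(weights, per_sample))
        for per_sample in zip(*predictions)
    ]

# Hypothetical validation errors for two models (for example a random
# forest and a boosted model), and their predictions on two samples.
weights = inverse_error_weights([1.0, 3.0])
blended = combine([[1.0, 2.0], [3.0, 4.0]], weights)
print(weights, blended)
```

Here the first model has one third of the second model's error, so it receives three times the weight.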

artificial intelligence, machine learning, penalty, (16 more...)

2204.13916

Country: North America > United States > California (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)

arXiv.org Machine Learning - Apr-26-2022

Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production

Hassan, Syeda Sakira, Mangayil, Rahul, Aho, Tommi, Yli-Harja, Olli, Karp, Matti

In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.
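The way strong Lasso regularization can leave a response with no selected features follows from its soft-thresholding behaviour. For the objective (1/2)||y - Xb||^2 + lam * ||b||_1 with orthonormal columns in X, each Lasso coefficient is the soft-thresholded least-squares coefficient. The sketch below illustrates that special case only. The coefficient values and pathway labels are made up and are not taken from the paper.

```python
def soft_threshold(z, lam):
    # Shrink z toward zero by lam; anything within [-lam, lam] becomes 0.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares coefficients for three pathways.
ols_coefficients = {"pathway_a": 2.0, "pathway_b": -0.3, "pathway_c": 0.4}

def lasso_orthonormal(coefficients, lam):
    # Lasso solution under the orthonormal-design assumption.
    return {name: soft_threshold(z, lam) for name, z in coefficients.items()}

mild = lasso_orthonormal(ols_coefficients, lam=0.5)
strong = lasso_orthonormal(ols_coefficients, lam=2.5)
print(mild)
print(strong)
```

With the mild penalty only the weak coefficients are zeroed. With the strong penalty every coefficient is zero, so no feature is associated with the response at all.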

artificial intelligence, machine learning, pathway, (15 more...)

2204.12526

Country: Europe > Finland > Pirkanmaa > Tampere (0.06)

Genre: Research Report > New Finding (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.42)

Formentini, Sarah Elizabeth, Liang, Wei, Zhu, Ruoqing

Confidence Band Estimation for Survival Random Forests

arXiv.org Machine Learning - Apr-25-2022

Survival random forest is a popular machine learning tool for modeling censored survival data. However, there is currently no statistically valid and computationally feasible approach for estimating its confidence band. This paper proposes an unbiased confidence band estimation by extending recent developments in infinite-order incomplete U-statistics. The idea is to estimate the variance-covariance matrix of the cumulative hazard function prediction on a grid of time points. We then generate the confidence band by viewing the cumulative hazard function estimation as a Gaussian process whose distribution can be approximated through simulation. This approach is computationally easy to implement when the subsampling size of a tree is no larger than half of the total training sample size. Numerical studies show that our proposed method accurately estimates the confidence band and achieves desired coverage rate. We apply this method to veterans' administration lung cancer data.

artificial intelligence, machine learning, variance, (18 more...)

2204.12038

Country:

North America > United States > Illinois (0.04)
North America > United States > North Carolina (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)

#artificialintelligence - Apr-21-2022, 04:35:37 GMT

Bojan Tunguz, Ph.D. on LinkedIn: #MachineLerning #DeepLearning #DataScience

Recently I came across this incredible survey paper on the use of neural networks for tabular data. After going through it carefully, I can confidently say that it's thus far THE best paper on the subject. It goes into depth on all the main issues that have stymied the use of NNs in this domain. The paper is very thoughtful, systematic, and fairly thorough. Despite what the authors claim, though, it is not the first paper on the topic, but it goes well beyond many recent papers on the subject. It also doesn't use as exhaustive a set of datasets as some of the other papers.

datascience, deeplearning, machinelerning, (5 more...)

Technology:

Information Technology > Communications > Social Media (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.38)