
 Ensemble Learning


An Improved Heart Disease Prediction Using Stacked Ensemble Method

arXiv.org Artificial Intelligence

Heart disease has recently overtaken cancer as the world's leading cause of death. Early identification and treatment can reduce cardiac failures, heart disease mortality, and diagnostic costs. The healthcare industry collects large quantities of medical data, but these data are poorly mined. Discovering previously unknown patterns and connections in this information can support better decisions when forecasting heart disease risk. In this study, we constructed a machine learning (ML) based diagnostic system for heart disease prediction using a heart disease dataset. We used data preprocessing techniques (outlier detection and removal, detection and removal of missing entries, and feature normalization), cross-validation, nine classification algorithms (RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM), and eight performance metrics (classification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews correlation coefficient). Our method can readily distinguish people with cardiac disease from healthy individuals. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were computed for every classifier. This study discusses most of the common classifiers, preprocessing strategies, validation methods, and performance metrics for classification models. The performance of the proposed scheme has been confirmed using all of its components. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that combined these nine algorithms.
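The eight metrics listed above can all be derived from a confusion matrix and the predicted probabilities. Below is a minimal pure-Python sketch for the binary case. It assumes labels coded as 0 (healthy) and 1 (heart disease) and a 0.5 decision threshold. The function name `binary_metrics` and the toy data are illustrative and are not taken from the study. ROC AUC is computed with the rank formulation instead of by tracing the full curve.

```python
import math


def binary_metrics(y_true, y_pred, y_prob):
    """Metrics for labels coded 0 (healthy) / 1 (heart disease).

    y_pred: hard 0/1 predictions; y_prob: predicted P(class = 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))

    def ratio(a, b):
        return a / b if b else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall, true-positive rate
    specificity = ratio(tn, tn + fp)  # true-negative rate
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))

    eps = 1e-15  # clip probabilities so log(0) never occurs
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                    for t, p in zip(y_true, clipped)) / len(y_true)

    # ROC AUC: probability that a random positive outscores a random negative
    # (ties count as one half).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    auc = ratio(sum(1.0 if a > b else 0.5 if a == b else 0.0
                    for a in pos for b in neg), len(pos) * len(neg))

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "log_loss": log_loss,
        "roc_auc": auc,
    }


# Usage with hypothetical predictions for six patients.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
print(binary_metrics(y_true, y_pred, y_prob))
```

In the study these metrics would be computed once per classifier, and once more for the stacked ensemble.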


TRBoost: A Generic Gradient Boosting Machine based on Trust-region Method

arXiv.org Artificial Intelligence

Gradient Boosting Machines (GBMs) have demonstrated remarkable success in solving diverse problems by utilizing Taylor expansions in functional space. However, balancing performance and generality remains a challenge for GBMs. In particular, gradient-descent-based GBMs use the first-order Taylor expansion so that they apply to all loss functions, while GBMs based on Newton's method use positive-definite Hessian information to achieve superior performance at the expense of generality. To address this issue, this study proposes a new generic Gradient Boosting Machine called Trust-region Boosting (TRBoost). In each iteration, TRBoost approximates the objective with a constrained quadratic model and solves it with a trust-region algorithm to obtain a new learner. Unlike GBMs based on Newton's method, TRBoost does not require the Hessian to be positive definite. It can therefore be applied to arbitrary loss functions while keeping performance competitive with second-order algorithms. The convergence analysis and numerical experiments in this study confirm that TRBoost is as general as first-order GBMs and yields results competitive with second-order GBMs. Overall, TRBoost is a promising approach that balances performance and generality, making it a valuable addition to the toolkit of machine learning practitioners.
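The trust-region idea can be shown in one dimension. The sketch below is a simplification built on assumptions and is not TRBoost itself. It minimises a scalar quadratic model g·s + ½h·s² subject to |s| ≤ Δ, then grows or shrinks Δ using the ratio of actual to predicted reduction. The bound keeps the subproblem well posed even when the curvature h is zero or negative, which is the property the abstract contrasts with Newton-based GBMs. The thresholds (0.25, 0.75, `eta`) are common textbook defaults, not values from the paper, and the helper names are hypothetical.

```python
def solve_tr_subproblem(g, h, delta):
    """Minimise the scalar model m(s) = g*s + 0.5*h*s**2 subject to |s| <= delta.

    h may be zero or negative: the trust-region bound keeps the problem
    well posed, so no positive-definiteness is required.
    """
    if h > 0:
        newton = -g / h
        if abs(newton) <= delta:
            return newton  # unconstrained minimiser lies inside the region
    if g == 0:
        return delta  # only reached when h <= 0; both boundary points are optimal
    # Otherwise the constrained minimiser sits on the boundary, against the gradient.
    return -delta if g > 0 else delta


def trust_region_minimise(f, grad, hess, x, delta=1.0, max_delta=10.0,
                          eta=0.1, tol=1e-9, max_iter=100):
    """Scalar trust-region loop using the standard ratio test."""
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) < tol:
            break
        s = solve_tr_subproblem(g, h, delta)
        predicted = -(g * s + 0.5 * h * s * s)  # reduction promised by the model (> 0)
        actual = f(x) - f(x + s)                # reduction actually achieved
        rho = actual / predicted
        if rho < 0.25:
            delta *= 0.25                        # poor model fit: shrink the region
        elif rho > 0.75 and abs(abs(s) - delta) < 1e-12:
            delta = min(2.0 * delta, max_delta)  # good step on the boundary: expand
        if rho > eta:
            x += s                               # accept the step
    return x


# Usage: a non-convex loss whose curvature at the start point is negative.
def loss(w):
    return w**4 - 3 * w**2 + w


def loss_grad(w):
    return 4 * w**3 - 6 * w + 1


def loss_hess(w):
    return 12 * w**2 - 6  # equals -6 at w = 0


w_star = trust_region_minimise(loss, loss_grad, loss_hess, 0.0)
print(w_star, loss_grad(w_star), loss_hess(w_star))
```

A Newton step at w = 0 would move toward a maximum because h < 0 there. The trust-region step instead moves a bounded distance downhill.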


PD-ADSV: An Automated Diagnosing System Using Voice Signals and Hard Voting Ensemble Method for Parkinson's Disease

arXiv.org Artificial Intelligence

In most cases, Parkinson's disease (PD) is diagnosed from the patient's motor symptoms [3] or with neuroimaging methods such as PET and MRI scans [4]. However, in addition to being costly, time-consuming, and inaccessible to the general public, these procedures are not particularly accurate. Recent studies indicate that nearly 90 percent of PD patients develop vocal disorders as one of the first symptoms of the disease [5]. These voice and speech problems are characterized by decreased speech volume and pitch variation, breathiness, tremor, hoarse (rough) voice quality, variable speech rate, and imprecise articulation [6]. Analyzing the voice signals of PD patients is therefore a vital step in the early diagnosis of this disorder.
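The hard-voting step named in the title can be sketched as follows. Each base classifier casts one vote per voice recording, and the majority label wins. This is a minimal pure-Python sketch. The base classifiers, the 0/1 label coding, and the tie-breaking rule are assumptions, not details from the paper.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority (hard) vote over aligned per-classifier label lists.

    predictions[j][i] is classifier j's label for sample i. On a tie, the
    label encountered first (from the earliest classifier) wins, because
    Counter.most_common orders equal counts by first insertion.
    """
    n_samples = len(predictions[0])
    return [Counter(clf[i] for clf in predictions).most_common(1)[0][0]
            for i in range(n_samples)]


# Usage: three hypothetical classifiers, labels 1 = PD, 0 = healthy.
votes = [
    [1, 0, 1, 1],  # classifier A
    [1, 1, 0, 1],  # classifier B
    [0, 0, 1, 1],  # classifier C
]
print(hard_vote(votes))
```

Hard voting uses only predicted labels. Soft voting, which averages predicted probabilities, is the usual alternative when every base model outputs calibrated probabilities.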


XGBoost in R: A Step-by-Step Example

#artificialintelligence

Boosting is a machine learning technique that has been shown to produce models with high predictive accuracy. One of the most common ways to implement boosting in practice is XGBoost, short for "extreme gradient boosting." This tutorial provides a step-by-step example of how to use XGBoost to fit a boosted model in R. For this example we'll fit a boosted regression model to the Boston dataset from the MASS package. This dataset contains 13 predictor variables that we'll use to predict one response variable called medv, which represents the median value of homes in different census tracts around Boston. The dataset contains 506 observations and 14 variables in total.
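The core boosting idea is that each new model fits what the current ensemble still gets wrong. This can be sketched without any library. The Python below is a minimal gradient-boosting loop for squared-error regression that uses one-split "stumps". It is neither XGBoost nor R: it omits XGBoost's regularisation and second-order split finding. The function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split on x that minimises the
    squared error of the residuals. Returns (threshold, left_value, right_value)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold separates equal x values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        l_mean = sum(left) / len(left)
        r_mean = sum(right) / len(right)
        sse = (sum((r - l_mean) ** 2 for r in left)
               + sum((r - r_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, l_mean, r_mean)
    if best is None:  # all x identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def fit_boosted_regressor(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean response
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left, right = fit_stump(xs, residuals)
        stumps.append((t, left, right))
        preds = [p + learning_rate * (left if x <= t else right)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)


# Usage on a toy one-feature dataset (not the Boston data).
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
model = fit_boosted_regressor(xs, ys)
print(predict(model, 0), predict(model, 9))
```

The learning rate plays the same shrinkage role as XGBoost's `eta`. Each stump contributes only a fraction of its fit, so more rounds are needed, but the ensemble tends to generalise better.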


Data-driven multinomial random forest

arXiv.org Artificial Intelligence

In this article, we strengthen the consistency proofs of several random forest variants that were previously shown to be only weakly consistent, establishing strong consistency. We also improve the data utilization of these variants to obtain better theoretical properties and experimental performance. In addition, building on the multinomial random forest (MRF) and the Bernoulli random forest (BRF), we propose a data-driven multinomial random forest (DMRF) algorithm. DMRF has lower complexity than MRF and higher complexity than BRF while satisfying strong consistency. It performs better on classification and regression problems than previous RF variants that satisfy only weak consistency, and in most cases it even surpasses the standard random forest. To the best of our knowledge, DMRF is currently the best-performing strongly consistent RF variant with low algorithmic complexity.


Pump It Up: Predict Water Pump Status using Attentive Tabular Learning

arXiv.org Artificial Intelligence

The water crisis is a critical concern around the globe. Appropriate and timely maintenance of water pumps in drought-hit countries is vital for communities that rely on wells. In this paper, we analyze and apply TabNet, a sequential attentive deep neural architecture, to predict the repair status of water pumps in Tanzania. The model combines the benefits of tree-based algorithms and neural networks, enabling end-to-end training, model interpretability, sparse feature selection, and efficient learning on tabular data. Finally, we compare the performance of TabNet with popular gradient-boosted tree algorithms such as XGBoost, LightGBM, and CatBoost. We also demonstrate how performance can be improved further by using focal loss as the objective function when training on imbalanced data.
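Focal loss rescales cross-entropy by a factor (1 − p_t)^γ, where p_t is the predicted probability of the true class. Confidently classified examples therefore contribute less, and training concentrates on the hard, often minority-class, examples. The sketch below is a per-example multiclass version in pure Python. The class-index mapping, γ = 2, and the optional per-class `alpha` weights are assumptions for illustration, not settings reported in the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from raw class scores."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma = 0 with alpha = None reduces to ordinary cross-entropy.
    alpha, if given, is a list of per-class weights (hypothetical values).
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def mean_focal_loss(batch_logits, targets, gamma=2.0, alpha=None):
    return sum(focal_loss(z, t, gamma, alpha)
               for z, t in zip(batch_logits, targets)) / len(targets)


# Usage: assumed classes 0 = "functional", 1 = "functional needs repair",
# 2 = "non functional"; the logits are made up.
z = [2.0, 0.5, -1.0]
for target in (0, 2):  # an easy example (class 0) and a hard one (class 2)
    ce = focal_loss(z, target, gamma=0.0)
    fl = focal_loss(z, target)
    print(target, ce, fl, fl / ce)
```

For the easy example the ratio of focal loss to cross-entropy is much smaller than for the hard example. That is the down-weighting effect that helps on imbalanced labels.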


Training Methods for Adaptive Boosting of Neural Networks

Neural Information Processing Systems

"Boosting" is a general method for improving the performance of any learning algorithm that consistently generates classifiers which need to perform only slightly better than random guessing. A recently proposed and very promising boosting algorithm is AdaBoost [5]. It has been ap(cid:173) plied with great success to several benchmark machine learning problems using rather simple learning algorithms [4], and decision trees [1, 2, 6]. In this paper we use AdaBoost to improve the performances of neural networks. We compare training methods based on sampling the training set and weighting the cost function. Our system achieves about 1.4% error on a data base of online handwritten digits from more than 200 writers.


A Gradient-Based Boosting Algorithm for Regression Problems

Neural Information Processing Systems

Adaptive boosting methods are simple modular algorithms that operate as follows. Let $g: X \to Y$ be the function to be learned, where the label set $Y$ is finite, typically binary-valued. The algorithm uses a learning procedure, which has access to $n$ training examples, $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, drawn randomly from $X \times Y$ according to distribution $D$; it outputs a hypothesis $f: X \to Y$, whose error is the expected value of a loss function on $f(x), g(x)$, where $x$ is chosen according to $D$. Given $\epsilon, \delta > 0$ and access to random examples, a strong learning procedure outputs with probability $1 - \delta$ a hypothesis with error at most $\epsilon$, with running time polynomial in $1/\epsilon$, $1/\delta$ and the number of examples. A weak learning procedure satisfies the same conditions, but where $\epsilon$ need only be better than random guessing. Schapire (1990) showed that any weak learning procedure, denoted WeakLearn, can be efficiently transformed ("boosted") into a strong learning procedure. The AdaBoost algorithm achieves this by calling WeakLearn multiple times, in a sequence of $T$ stages, each time presenting it with a different distribution over a fixed training set and finally combining all of the hypotheses. The algorithm maintains a weight $w_i^t$ for each training example $i$ at stage $t$, and a distribution $D_t$ is computed by normalizing these weights.
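The AdaBoost loop described above can be sketched in pure Python. Weights $w_i^t$ are normalised into $D_t$, WeakLearn is called once per stage, and the hypotheses are combined by a weighted vote. Several assumptions apply: binary labels in {−1, +1}, one-dimensional inputs, and threshold "stumps" standing in for WeakLearn. The stage weight α_t = ½ ln((1 − ε_t)/ε_t) is the standard discrete AdaBoost rule for classification, not the regression variant named in the paper's title.

```python
import math


def stump_predict(t, polarity, x):
    """Threshold rule: predict `polarity` above t, the opposite label otherwise."""
    return polarity if x > t else -polarity


def weak_learn(xs, ys, dist):
    """Stand-in for WeakLearn: the stump with the lowest error under D_t."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(d for x, y, d in zip(xs, ys, dist)
                      if stump_predict(t, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, n_stages):
    weights = [1.0] * len(xs)                    # w_i^1
    ensemble = []
    for _ in range(n_stages):
        total = sum(weights)
        dist = [w / total for w in weights]      # D_t: normalised weights
        err, t, polarity = weak_learn(xs, ys, dist)
        if err >= 0.5:
            break                                # no better than chance
        err = max(err, 1e-10)                    # avoid dividing by zero on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Multiply misclassified weights by e^alpha and correct ones by e^-alpha.
        weights = [w * math.exp(-alpha * y * stump_predict(t, polarity, x))
                   for w, x, y in zip(weights, xs, ys)]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Usage: no single stump can fit this pattern, but three boosted stumps can.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, n_stages=3)
print([adaboost_predict(model, x) for x in xs])
```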


McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

Neural Information Processing Systems

We cast the ranking problem as (1) multiple classification ("Mc") and (2) multiple ordinal classification, both of which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and that DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm.
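The step from class probabilities to a ranking can be sketched as follows. Each document's score is its expected relevance grade, and documents are sorted by score. The ranking is then judged with DCG. Assumptions: relevance grades 0 to 4, the grade mapping T(k) = k, and the common DCG convention with gain 2^r − 1 and discount log2(1 + position). The probabilities are made up rather than produced by a boosted tree model.

```python
import math


def expected_relevance(class_probs):
    """Score = sum_k T(k) * P(grade = k), assuming T(k) = k for grades 0..K-1."""
    return sum(k * p for k, p in enumerate(class_probs))


def dcg(relevances):
    """DCG with gain 2**r - 1 and discount log2(1 + position), positions from 1."""
    return sum((2 ** r - 1) / math.log2(1 + i)
               for i, r in enumerate(relevances, start=1))


def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


# Usage: hypothetical class probabilities (grades 0..4) for three documents.
probs = {
    "a": [0.1, 0.2, 0.4, 0.2, 0.1],
    "b": [0.7, 0.2, 0.1, 0.0, 0.0],
    "c": [0.0, 0.1, 0.1, 0.3, 0.5],
}
true_grade = {"a": 2, "b": 0, "c": 4}
ranking = sorted(probs, key=lambda d: expected_relevance(probs[d]), reverse=True)
print(ranking, ndcg([true_grade[d] for d in ranking]))
```

Using the expected grade instead of the single most probable class gives a continuous score. This breaks ties between documents that share the same predicted class.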


AI Predicts Antidepressant Treatment Outcomes

#artificialintelligence

A new multi-institution research study shows how artificial intelligence (AI) machine learning, combined with electronic health records (EHRs), can predict antidepressant treatment outcomes. "These investigations have the potential to drive the development of a clinical decision‐making tool for personalized management of depression," wrote researchers affiliated with Weill Cornell Medicine, Temple University, the University of Washington, Mayo Clinic, Northwestern University, and the University of Florida. The study was funded in part by the U.S. National Institutes of Health. An estimated 280 million people worldwide, or 3.8 percent of the global population, experience depression, according to the World Health Organization. Fortunately, there are effective treatments for depression. According to the American Psychiatric Association, symptoms of depression may include persistent feelings of sadness, loss of interest or pleasure in activities once enjoyed, feelings of guilt or worthlessness, thoughts of suicide or death, slowed movements or speech, difficulty thinking or making decisions, trouble concentrating, changes in appetite, too much or too little sleep, and loss of energy or increased fatigue.