AITopics

2009.06332

Country:

North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

Coulombe, Philippe Goulet

To Bag is to Prune

arXiv.org Machine Learning Sep-14-2020

It is notoriously hard to build a bad Random Forest (RF). Concurrently, RF is perhaps the only standard ML algorithm that blatantly overfits in-sample without any consequence out-of-sample. Standard arguments cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a (latent) true underlying tree. More generally, there is no need to tune the stopping point of a properly randomized ensemble of greedily optimized base learners. Thus, Boosting and MARS are eligible for automatic (implicit) tuning. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles yield an out-of-sample performance equivalent to that of their tuned counterparts -- or better.
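The in-sample versus out-of-sample contrast described in this abstract can be illustrated with a small sketch. It is not the paper's experiment and it uses plain bagging without RF's feature perturbation. It relies on the fact that, in one dimension, a fully grown regression tree with midpoint splits predicts the response of the nearest training point, so a nearest-point lookup stands in for an unpruned tree. The data-generating process (noisy `sin(x)`), the sample sizes, and all function names are assumptions made for illustration.

```python
import bisect
import math
import random


def make_data(n, rng, noise_sd=0.5):
    """Noisy samples of sin(x) on [0, 6], sorted by x (assumed toy process)."""
    xs = sorted(rng.uniform(0.0, 6.0) for _ in range(n))
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def unpruned_tree_predict(xs, ys, x):
    """Stand-in for a fully grown 1-D tree: response of the nearest training point."""
    i = bisect.bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    return ys[i] if xs[i] - x < x - xs[i - 1] else ys[i - 1]


def fit_bagged(xs, ys, n_trees, rng):
    """One unpruned tree per bootstrap resample (duplicate draws collapse to one point)."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = sorted({rng.randrange(n) for _ in range(n)})
        trees.append(([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def bagged_predict(trees, x):
    """Average the predictions of all bootstrap trees."""
    return sum(unpruned_tree_predict(tx, ty, x) for tx, ty in trees) / len(trees)


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def run_demo(seed=0, n_train=200, n_test=500, n_trees=100):
    rng = random.Random(seed)
    xs, ys = make_data(n_train, rng)
    test_xs, test_ys = make_data(n_test, rng)
    trees = fit_bagged(xs, ys, n_trees, rng)
    return {
        "single_train": mse([unpruned_tree_predict(xs, ys, x) for x in xs], ys),
        "single_test": mse([unpruned_tree_predict(xs, ys, x) for x in test_xs], test_ys),
        "bagged_train": mse([bagged_predict(trees, x) for x in xs], ys),
        "bagged_test": mse([bagged_predict(trees, x) for x in test_xs], test_ys),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

The single unpruned tree reproduces its training data exactly, while the bagged ensemble averages over neighbouring points, which behaves like an implicit smoothing of the overfit base learner.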

algorithm, artificial intelligence, machine learning, (19 more...)

2008.07063

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Economy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)

Athanasiou, Maria, Sfrintzeri, Konstantina, Zarkogianni, Konstantia, Thanopoulou, Anastasia C., Nikita, Konstantina S.

An explainable XGBoost-based approach towards assessing the risk of cardiovascular disease in patients with Type 2 Diabetes Mellitus

arXiv.org Artificial Intelligence Sep-14-2020

Cardiovascular Disease (CVD) is an important cause of disability and death among individuals with Diabetes Mellitus (DM). International clinical guidelines for the management of Type 2 DM (T2DM) are founded on primary and secondary prevention and favor the evaluation of CVD-related risk factors towards appropriate treatment initiation. CVD risk prediction models can provide valuable tools for optimizing the frequency of medical visits and performing timely preventive and therapeutic interventions against CVD events. The integration of explainability modalities in these models can enhance human understanding of the reasoning process, maximize transparency and strengthen trust in the models' adoption in clinical practice. The aim of the present study is to develop and evaluate an explainable personalized risk prediction model for the fatal or non-fatal CVD incidence in T2DM individuals. An explainable approach based on the eXtreme Gradient Boosting (XGBoost) and the Tree SHAP (SHapley Additive exPlanations) method is deployed for the calculation of the 5-year CVD risk and the generation of individual explanations on the model's decisions. Data from the 5-year follow-up of 560 patients with T2DM are used for development and evaluation purposes. The obtained results (AUC = 71.13%) indicate the potential of the proposed approach to handle the unbalanced nature of the used dataset, while providing clinically meaningful insights about the ensemble model's decision process.

artificial intelligence, machine learning, risk factor, (17 more...)

2009.06629

Country: Europe > Greece (0.16)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligence Sep-13-2020, 19:15:04 GMT

Artificial Intelligence Helps Cut Down on MRI No-shows

Weekly outpatient MRI appointment no-show rates for 1 year before (19.3%) and 6 months after (15.9%) implementation of intervention measures in March 2019, as guided by XGBoost prediction model. September 10, 2020 -- According to ARRS' American Journal of Roentgenology (AJR), artificial intelligence (AI) predictive analytics performed moderately well in solving complex multifactorial operational problems -- outpatient MRI appointment no-shows, especially -- using a modest amount of data and basic feature engineering. "Such data may be readily retrievable from frontline information technology systems commonly used in most hospital radiology departments, and they can be readily incorporated into routine workflow practice to improve the efficiency and quality of health care delivery," wrote lead author Le Roy Chong of Singapore's Changi General Hospital. To train and validate their model, Chong and colleagues extracted records of 32,957 outpatient MRI appointments scheduled between January 2016 and December 2018 from their institution's radiology information system, while acquiring a further holdout test set of 1,080 records from January 2019. Overall, the no-show rate was 17.4%.

artificial intelligence, machine learning, outpatient mri appointment, (12 more...)

Country:

Asia > Singapore (0.26)
North America > United States > Texas (0.06)

Genre: Research Report > New Finding (0.38)

Industry:

Health & Medicine > Health Care Providers & Services (0.80)
Health & Medicine > Diagnostic Medicine > Imaging (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)

Wang, Weiwei, Eberhardt, Wiebke, Bromuri, Stefano

That looks interesting! Personalizing Communication and Segmentation with Random Forest Node Embeddings

arXiv.org Artificial Intelligence Sep-13-2020

Communicating effectively with customers is a challenge for many marketers, but especially in a context that is both pivotal to individual long-term financial well-being and difficult to understand: pensions. Around the world, participants are reluctant to consider their pension in advance, which leads to a lack of preparation for retirement [1], [2]. To encourage participants to obtain information on their expected pension benefits, personalizing the pension providers' email communication is a first and crucial step. We describe a machine learning approach to model email newsletters to fit participants' interests. The data for the modeling and analysis is collected from newsletters sent by a large Dutch pension provider and is divided into two parts. The first part comprises 2,228,000 customers, whereas the second part comprises the data of a pilot study, which took place in July 2018 with 465,711 participants. In both cases, our algorithm extracts features from continuous and categorical data using random forests, and then calculates node embeddings of the decision boundaries of the random forest. We illustrate the algorithm's effectiveness for the classification task, and how it can be used to perform data mining tasks. To confirm that the result is valid for more than one data set, we also illustrate the properties of our algorithm on benchmark data sets concerning churn. In the data sets considered, the proposed modeling demonstrates competitive performance with respect to other state-of-the-art approaches based on random forests, achieving the best Area Under the Curve (AUC) in the pension data set (0.948). For the descriptive part, the algorithm can identify customer segmentations that can be used by marketing departments to better target their communication towards their customers.

artificial intelligence, machine learning, participant, (18 more...)

2009.05931

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Consumer Products & Services > Retirement (0.94)
Information Technology (0.93)
Telecommunications (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Krabel, Tobias Markus, Tran, Thi Ngoc Tien, Groll, Andreas, Horn, Daniel, Jentsch, Carsten

Random boosting and random^2 forests -- A random tree depth injection approach

arXiv.org Machine Learning Sep-13-2020

The induction of additional randomness in parallel and sequential ensemble methods has proven to be worthwhile in many aspects. In this manuscript, we propose and examine a novel random tree depth injection approach suitable for sequential and parallel tree-based approaches including Boosting and Random Forests. The resulting methods are called Random Boost and Random^2 Forest. Both approaches serve as valuable extensions to the existing literature on the gradient boosting framework and random forests. A Monte Carlo simulation, in which tree-shaped data sets with different numbers of final partitions are built, suggests that there are several scenarios where Random Boost and Random^2 Forest can improve the prediction performance of conventional hierarchical boosting and random forest approaches. The new algorithms appear to be especially successful in cases where there are merely a few high-order interactions in the generated data. In addition, our simulations suggest that our random tree depth injection approach can improve computation time by up to 40%, while at the same time the performance losses in terms of prediction accuracy turn out to be minor or even negligible in most cases.
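A rough sketch of what random tree depth injection could look like in a boosting loop follows. The abstract does not state how depths are drawn, so the uniform draw from 1 to `max_depth`, the squared-error loss, the one-dimensional toy data, and every function name here are assumptions for illustration rather than the authors' implementation.

```python
import random


def fit_tree(xs, ys, depth):
    """Greedy 1-D regression tree with at most `depth` levels of splits."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(set(xs)) < 2:
        return mean  # leaf: predict the mean response
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    total, left_sum, best = sum(sy), 0.0, None
    for k in range(1, len(sx)):
        left_sum += sy[k - 1]
        if sx[k] == sx[k - 1]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising the squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (len(sx) - k)
        if best is None or score > best[0]:
            best = (score, k)
    k = best[1]
    return (sx[k - 1],
            fit_tree(sx[:k], sy[:k], depth - 1),
            fit_tree(sx[k:], sy[k:], depth - 1))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def random_depth_boost(xs, ys, n_rounds, max_depth, learning_rate, rng):
    """Squared-loss boosting where each tree gets a randomly drawn depth."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_rounds):
        depth = rng.randint(1, max_depth)  # assumed: uniform depth draw
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_tree(xs, residuals, depth)
        trees.append((depth, tree))
        preds = [p + learning_rate * predict_tree(tree, x)
                 for p, x in zip(preds, xs)]
    return base, trees, preds


def run_demo(seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
    ys = [(1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.1) for x in xs]
    base, trees, preds = random_depth_boost(xs, ys, 50, 4, 0.1, rng)
    mse_before = sum((y - base) ** 2 for y in ys) / len(ys)
    mse_after = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
    return [d for d, _ in trees], mse_before, mse_after


if __name__ == "__main__":
    depths, before, after = run_demo()
    print("depths drawn:", depths[:10], "...")
    print(f"training MSE: {before:.4f} -> {after:.4f}")
```

Because shallow trees are cheaper to fit than trees at the maximum depth, drawing the depth at random lowers the average cost per round, which is one plausible reading of the computation-time savings the abstract reports.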

algorithm, artificial intelligence, machine learning, (19 more...)

2009.06078

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
North America > United States > New York (0.04)
Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

#artificialintelligence Sep-11-2020, 14:20:51 GMT

Using Machine Learning to Predict Car Accidents

Road accidents constitute a significant proportion of the number of serious injuries reported every year. Yet, it is often challenging to determine which specific conditions lead to such events, making it more difficult for local law enforcement to address the number and severity of road accidents. We all know that some characteristics of vehicles and the surroundings play a key role (engine capacity, condition of the road, etc.). However, many questions are still open. Which of these factors are the leading ones?

accident, artificial intelligence, machine learning, (15 more...)

Country:

Europe > United Kingdom > Wales (0.05)
Europe > United Kingdom > England (0.05)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.30)

#artificialintelligence Sep-11-2020, 14:20:32 GMT

Artificial intelligence helps cut down on MRI no-shows

According to ARRS' American Journal of Roentgenology (AJR), artificial intelligence (AI) predictive analytics performed moderately well in solving complex multifactorial operational problems--outpatient MRI appointment no-shows, especially--using a modest amount of data and basic feature engineering. "Such data may be readily retrievable from frontline information technology systems commonly used in most hospital radiology departments, and they can be readily incorporated into routine workflow practice to improve the efficiency and quality of health care delivery," wrote lead author Le Roy Chong of Singapore's Changi General Hospital. To train and validate their model, Chong and colleagues extracted records of 32,957 outpatient MRI appointments scheduled between January 2016 and December 2018 from their institution's radiology information system, while acquiring a further holdout test set of 1,080 records from January 2019. Overall, the no-show rate was 17.4%. After evaluating various machine learning predictive models developed with widely used open-source software tools, Chong and team deployed a decision tree-based ensemble algorithm that uses a gradient boosting framework: XGBoost, version 0.80 [Tianqi Chen].

artificial intelligence, machine learning, no-show rate, (5 more...)

Country: Asia > Singapore (0.27)

Genre: Research Report > New Finding (0.41)

Industry: Health & Medicine > Health Care Providers & Services (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.59)

Brophy, Jonathan, Lowd, Daniel

DART: Data Addition and Removal Trees

arXiv.org Machine Learning Sep-11-2020

How can we update data for a machine learning model after it has already trained on that data? In this paper, we introduce DART, a variant of random forests that supports adding and removing training data with minimal retraining. Data updates in DART are exact, meaning that adding or removing examples from a DART model yields exactly the same model as retraining from scratch on updated data. DART uses two techniques to make updates efficient. The first is to cache data statistics at each node and training data at each leaf, so that only the necessary subtrees are retrained. The second is to choose the split variable randomly at the upper levels of each tree, so that the choice is completely independent of the data and never needs to change. At the lower levels, split variables are chosen to greedily maximize a split criterion such as Gini index or mutual information. By adjusting the number of random-split levels, DART can trade off between more accurate predictions and more efficient updates. In experiments on ten real-world datasets and one synthetic dataset, we find that DART is orders of magnitude faster than retraining from scratch while sacrificing very little in terms of predictive performance.
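The split-selection idea in this abstract (random at the upper levels, greedy below) can be sketched as follows. This is only an illustration under stated assumptions: features are binary so no threshold is needed, the random choice is derived from a seed and the node's position so that it cannot depend on the data, and the tree is simply rebuilt after a removal instead of being updated through cached statistics. All names are hypothetical and this is not the authors' implementation.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def random_feature(n_features, seed, path):
    """Data-independent choice: depends only on the seed and the node position."""
    return random.Random(f"{seed}:{path}").randrange(n_features)


def greedy_feature(rows, labels, n_features):
    """Pick the feature with the lowest weighted Gini impurity after the split."""
    def weighted_gini(f):
        left = [y for r, y in zip(rows, labels) if r[f] == 0]
        right = [y for r, y in zip(rows, labels) if r[f] == 1]
        return len(left) * gini(left) + len(right) * gini(right)
    return min(range(n_features), key=weighted_gini)


def build(rows, labels, n_features, max_depth, random_levels, seed,
          depth=0, path="r"):
    if depth == max_depth or len(set(labels)) <= 1:
        # Leaves keep simple counts, loosely echoing the cached statistics.
        return {"leaf": True, "count": len(labels), "positives": sum(labels)}
    if depth < random_levels:
        feature = random_feature(n_features, seed, path)
    else:
        feature = greedy_feature(rows, labels, n_features)
    children = []
    for value in (0, 1):
        keep = [i for i, r in enumerate(rows) if r[feature] == value]
        children.append(build([rows[i] for i in keep],
                              [labels[i] for i in keep],
                              n_features, max_depth, random_levels, seed,
                              depth + 1, path + str(value)))
    return {"leaf": False, "path": path, "depth": depth, "feature": feature,
            "left": children[0], "right": children[1]}


def upper_splits(tree, random_levels):
    """Map node path -> split feature for all internal nodes in the random levels."""
    found, stack = {}, [tree]
    while stack:
        node = stack.pop()
        if node["leaf"] or node["depth"] >= random_levels:
            continue
        found[node["path"]] = node["feature"]
        stack.extend((node["left"], node["right"]))
    return found


def run_demo(seed=0, n_rows=40, n_features=5, random_levels=2):
    rng = random.Random(seed)
    rows = [[rng.randrange(2) for _ in range(n_features)] for _ in range(n_rows)]
    labels = [r[0] ^ r[1] for r in rows]
    full = build(rows, labels, n_features, 4, random_levels, seed)
    reduced = build(rows[1:], labels[1:], n_features, 4, random_levels, seed)
    return upper_splits(full, random_levels), upper_splits(reduced, random_levels)


if __name__ == "__main__":
    before, after = run_demo()
    print("upper-level splits before removal:", before)
    print("upper-level splits after removal: ", after)
```

Removing a training example leaves every shared upper-level split unchanged, since those choices never looked at the data. Only the greedy lower levels could need retraining, which is the trade-off the abstract describes between accuracy and update efficiency.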

artificial intelligence, machine learning, node, (15 more...)

2009.05567

Country:

North America > United States > California (0.14)
North America > United States > Oregon (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.95)
Transportation (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

#artificialintelligence Sep-8-2020, 16:55:32 GMT

Mitigating Bias in Machine Learning: An introduction to MLFairnessPipeline

Bias takes many different forms and impacts all groups of people. It can range from implicit to explicit and is often very difficult to detect. In the field of machine learning, bias is often subtle and hard to identify, let alone solve. Why is this a problem? Implicit bias in machine learning has very real consequences, including denial of a loan, a lengthier prison sentence, and many other harmful outcomes for underprivileged groups.

artificial intelligence, machine learning, underprivileged group, (13 more...)

Genre: Research Report (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.31)