AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Coulombe, Philippe Goulet

To Bag is to Prune

arXiv.org Machine Learning, Sep-14-2020

It is notoriously hard to build a bad Random Forest (RF). Concurrently, RF is perhaps the only standard ML algorithm that blatantly overfits in-sample without any consequence out-of-sample. Standard arguments cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a (latent) true underlying tree. More generally, there is no need to tune the stopping point of a properly randomized ensemble of greedily optimized base learners. Thus, Boosting and MARS are eligible for automatic (implicit) tuning. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles yield an out-of-sample performance equivalent to that of their tuned counterparts -- or better.
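The contrast between in-sample overfitting and out-of-sample performance can be illustrated with a small simulation. The sketch below is not the paper's code. It assumes a hypothetical one-dimensional data set (a sine wave plus Gaussian noise), a plain-Python regression tree grown until every leaf holds a single distinct x value, and plain bagging without feature subsampling (with one feature there is nothing to subsample). All function names are illustrative.

```python
import math
import random


def fit_tree(points):
    """Grow a 1-D regression tree until each leaf holds one distinct x."""
    points = sorted(points)
    n = len(points)
    total = sum(y for _, y in points)
    if points[0][0] == points[-1][0]:
        return total / n  # leaf: mean response
    left_sum, best = 0.0, None
    for i in range(1, n):
        left_sum += points[i - 1][1]
        if points[i][0] == points[i - 1][0]:
            continue  # cannot split between identical x values
        # Maximising this score is equivalent to minimising the split's SSE.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best is None or score > best[0]:
            best = (score, i)
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold, fit_tree(points[:i]), fit_tree(points[i:]))


def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)


def sample(n):
    """Hypothetical data: y = sin(2*pi*x) + N(0, 1) noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0.0, 1.0)))
    return data


train, test = sample(200), sample(500)

single = fit_tree(train)
# Bagging: each tree is fully grown on its own bootstrap resample.
forest = [fit_tree([rng.choice(train) for _ in train]) for _ in range(50)]


def single_model(x):
    return predict(single, x)


def bagged_model(x):
    return sum(predict(t, x) for t in forest) / len(forest)


single_train, single_test = mse(single_model, train), mse(single_model, test)
bagged_train, bagged_test = mse(bagged_model, train), mse(bagged_model, test)
print(f"single tree:  train {single_train:.3f}, test {single_test:.3f}")
print(f"bagged trees: train {bagged_train:.3f}, test {bagged_test:.3f}")
```

Under these assumptions the single tree interpolates the training data, and averaging over bootstrap resamples is expected to lower the test error even though every tree in the ensemble is fully grown.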

algorithm, artificial intelligence, machine learning, (19 more...)

2008.07063

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Economy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)

Neto, Mário Popolin, Paulovich, Fernando V.

Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles

arXiv.org Machine Learning, Sep-14-2020

Over the past decades, classification models have proven to be essential machine learning tools given their potential and applicability in various domains. During this time, most researchers have focused on improving quantitative metrics, even though such metrics convey little information about models' decisions. This paradigm has recently shifted, and strategies beyond tables and numbers to assist in interpreting models' decisions are increasing in importance. As part of this trend, visualization techniques have been extensively used to support classification models' interpretability, with a significant focus on rule-based models. Despite the advances, the existing approaches present limitations in terms of visual scalability, and the visualization of large and complex models, such as the ones produced by the Random Forest (RF) technique, remains a challenge. In this paper, we propose Explainable Matrix (ExMatrix), a novel visualization method for RF interpretability that can handle models with massive quantities of rules. It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates, enabling the analysis of entire models and auditing classification results. ExMatrix's applicability is confirmed via different examples, showing how it can be used in practice to promote the interpretability of RF models.
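The matrix metaphor can be made concrete with a small data-structure sketch. The code below is an illustration rather than the ExMatrix implementation. It assumes a hypothetical tree encoding as nested tuples `(feature, threshold, left, right)` with class labels at the leaves, and the feature names are invented. Each root-to-leaf path becomes one row (a rule), each feature one column, and each cell holds the interval the rule imposes on that feature, or `None` when the rule does not test it.

```python
import math


def extract_rules(node, features, bounds=None):
    """Return one (cells, label) row per root-to-leaf path of `node`.

    A cell (low, high) means low < value <= high for that feature.
    """
    bounds = bounds or {}
    if not isinstance(node, tuple):  # leaf: emit the finished rule
        return [([bounds.get(f) for f in features], node)]
    feature, threshold, left, right = node
    low, high = bounds.get(feature, (-math.inf, math.inf))
    left_bounds = {**bounds, feature: (low, min(high, threshold))}
    right_bounds = {**bounds, feature: (max(low, threshold), high)}
    return (extract_rules(left, features, left_bounds)
            + extract_rules(right, features, right_bounds))


# Hypothetical two-tree forest over two invented features.
features = ["petal_length", "petal_width"]
trees = [
    ("petal_length", 2.5, "A", ("petal_width", 1.7, "B", "C")),
    ("petal_width", 0.8, "A", "B"),
]

# Rows of the matrix: the rules of every tree, stacked.
matrix = [rule for tree in trees for rule in extract_rules(tree, features)]
for cells, label in matrix:
    print(cells, "->", label)
```

Stacking the rows of all trees gives one matrix for the whole forest, which is the object the abstract describes visualizing.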

artificial intelligence, machine learning, visualization, (17 more...)

2005.04289

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New York > New York County > New York City (0.05)
South America > Brazil > São Paulo (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Sep-13-2020, 07:21:31 GMT

10 Machine Learning Algorithms You Need to Know

If you've just started to explore the ways that machine learning can impact your business, the first questions you're likely to come across are what are all of the different types of machine learning algorithms, what are they good for, and which one should I choose for my project? This post will help you answer those questions. There are a few different ways to categorize machine learning algorithms. One way is based on what the training data looks like. Another way to classify algorithms--and one that's more practical from a business perspective--is to categorize them based on how they work and what kinds of problems they can solve, which is what we'll do here.

algorithm, artificial intelligence, machine learning, (15 more...)

Country: North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)

Industry: Banking & Finance (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

Wang, Weiwei, Eberhardt, Wiebke, Bromuri, Stefano

That looks interesting! Personalizing Communication and Segmentation with Random Forest Node Embeddings

arXiv.org Artificial Intelligence, Sep-13-2020

Communicating effectively with customers is a challenge for many marketers, but especially in a context that is both pivotal to individual long-term financial well-being and difficult to understand: pensions. Around the world, participants are reluctant to consider their pension in advance, which leads to a lack of preparation for retirement [1], [2]. To encourage participants to obtain information on their expected pension benefits, personalizing the pension providers' email communication is a first and crucial step. We describe a machine learning approach to model email newsletters to fit participants' interests. The data for the modeling and analysis is collected from newsletters sent by a large Dutch pension provider and is divided into two parts. The first part comprises 2,228,000 customers, whereas the second part comprises the data of a pilot study, which took place in July 2018 with 465,711 participants. In both cases, our algorithm extracts features from continuous and categorical data using random forests, and then calculates node embeddings of the decision boundaries of the random forest. We illustrate the algorithm's effectiveness for the classification task, and how it can be used to perform data mining tasks. In order to confirm that the result is valid for more than one data set, we also illustrate the properties of our algorithm in benchmark data sets concerning churning. In the data sets considered, the proposed modeling demonstrates competitive performance with respect to other state-of-the-art approaches based on random forests, achieving the best Area Under the Curve (AUC) in the pension data set (0.948). For the descriptive part, the algorithm can identify customer segmentations that can be used by marketing departments to better target their communication towards their customers.

artificial intelligence, machine learning, participant, (18 more...)

2009.05931

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Consumer Products & Services > Retirement (0.94)
Information Technology (0.93)
Telecommunications (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Krabel, Tobias Markus, Tran, Thi Ngoc Tien, Groll, Andreas, Horn, Daniel, Jentsch, Carsten

Random boosting and random^2 forests -- A random tree depth injection approach

arXiv.org Machine Learning, Sep-13-2020

The induction of additional randomness in parallel and sequential ensemble methods has proven to be worthwhile in many aspects. In this manuscript, we propose and examine a novel random tree depth injection approach suitable for sequential and parallel tree-based approaches including Boosting and Random Forests. The resulting methods are called Random Boost and Random^2 Forest. Both approaches serve as valuable extensions to the existing literature on the gradient boosting framework and random forests. A Monte Carlo simulation, in which tree-shaped data sets with different numbers of final partitions are built, suggests that there are several scenarios where Random Boost and Random^2 Forest can improve the prediction performance of conventional hierarchical boosting and random forest approaches. The new algorithms appear to be especially successful in cases where there are merely a few high-order interactions in the generated data. In addition, our simulations suggest that our random tree depth injection approach can improve computation time by up to 40%, while at the same time the performance losses in terms of prediction accuracy turn out to be minor or even negligible in most cases.
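The idea of injecting random tree depths into a sequential ensemble can be sketched as follows. This is a minimal illustration, not the authors' Random Boost algorithm. It assumes squared-error boosting on a hypothetical one-dimensional data set, and it assumes each tree's depth is drawn uniformly from 1 to `max_depth`, a detail the abstract does not specify. All names are illustrative.

```python
import math
import random


def fit_tree(points, depth):
    """Fit a 1-D regression tree with at most `depth` levels of splits."""
    points = sorted(points)
    n = len(points)
    total = sum(y for _, y in points)
    if depth == 0 or points[0][0] == points[-1][0]:
        return total / n  # leaf: mean of the targets
    left_sum, best = 0.0, None
    for i in range(1, n):
        left_sum += points[i - 1][1]
        if points[i][0] == points[i - 1][0]:
            continue  # cannot split between identical x values
        # Maximising this score is equivalent to minimising the split's SSE.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best is None or score > best[0]:
            best = (score, i)
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold,
            fit_tree(points[:i], depth - 1),
            fit_tree(points[i:], depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def random_depth_boost(data, rounds, max_depth, rate, rng):
    """Boost on residuals, drawing a fresh tree depth in every round."""
    base = sum(y for _, y in data) / len(data)
    residuals = [y - base for _, y in data]
    trees, depths = [], []
    for _ in range(rounds):
        depth = rng.randint(1, max_depth)  # assumed: uniform over 1..max_depth
        tree = fit_tree([(x, r) for (x, _), r in zip(data, residuals)], depth)
        residuals = [r - rate * predict(tree, x)
                     for (x, _), r in zip(data, residuals)]
        trees.append(tree)
        depths.append(depth)
    return base, trees, depths, residuals


rng = random.Random(1)
data = []
for _ in range(150):  # hypothetical data: y = sin(6x) + N(0, 0.3^2) noise
    x = rng.random()
    data.append((x, math.sin(6 * x) + rng.gauss(0.0, 0.3)))

base, trees, depths, residuals = random_depth_boost(data, 100, 4, 0.1, rng)
start_mse = sum((y - base) ** 2 for _, y in data) / len(data)
final_mse = sum(r ** 2 for r in residuals) / len(data)
print(f"training MSE: {start_mse:.3f} -> {final_mse:.3f}")
print("depths used:", sorted(set(depths)))
```

Because shallow trees are cheaper to fit than trees of the maximum depth, drawing depths at random lowers the average cost per round, which is consistent with the computation-time savings the abstract reports.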

algorithm, artificial intelligence, machine learning, (19 more...)

2009.06078

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
North America > United States > New York (0.04)
Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Brophy, Jonathan, Lowd, Daniel

DART: Data Addition and Removal Trees

arXiv.org Machine Learning, Sep-11-2020

How can we update data for a machine learning model after it has already trained on that data? In this paper, we introduce DART, a variant of random forests that supports adding and removing training data with minimal retraining. Data updates in DART are exact, meaning that adding or removing examples from a DART model yields exactly the same model as retraining from scratch on updated data. DART uses two techniques to make updates efficient. The first is to cache data statistics at each node and training data at each leaf, so that only the necessary subtrees are retrained. The second is to choose the split variable randomly at the upper levels of each tree, so that the choice is completely independent of the data and never needs to change. At the lower levels, split variables are chosen to greedily maximize a split criterion such as Gini index or mutual information. By adjusting the number of random-split levels, DART can trade off between more accurate predictions and more efficient updates. In experiments on ten real-world datasets and one synthetic dataset, we find that DART is orders of magnitude faster than retraining from scratch while sacrificing very little in terms of predictive performance.
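The two-regime split rule can be sketched in a few lines. The code below is not the DART implementation. It assumes binary features (so choosing a split variable fully determines the split), binary labels, and a hypothetical parameter `random_levels` for the number of data-independent top levels. It omits the cached statistics and the update logic entirely.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def weighted_gini(rows, labels, feature):
    """Impurity after splitting on a binary feature, weighted by child size."""
    left = [y for row, y in zip(rows, labels) if row[feature] == 0]
    right = [y for row, y in zip(rows, labels) if row[feature] == 1]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def choose_feature(rows, labels, depth, random_levels, rng):
    n_features = len(rows[0])
    if depth < random_levels:
        # Upper levels: the choice ignores the data, so adding or
        # removing examples can never invalidate it.
        return rng.randrange(n_features)
    # Lower levels: greedy choice by the split criterion.
    return min(range(n_features),
               key=lambda f: weighted_gini(rows, labels, f))


rows = [(0, 0), (1, 0), (0, 1), (1, 1)]
labels = [0, 0, 1, 1]  # feature 1 separates the classes perfectly
flipped = [1 - y for y in labels]

# Upper level: same seed, different labels, same choice.
top_a = choose_feature(rows, labels, depth=0, random_levels=2,
                       rng=random.Random(7))
top_b = choose_feature(rows, flipped, depth=0, random_levels=2,
                       rng=random.Random(7))
# Lower level: the greedy rule picks the most informative feature.
deep = choose_feature(rows, labels, depth=2, random_levels=2,
                      rng=random.Random(7))
print(top_a == top_b, deep)  # prints: True 1
```

Raising `random_levels` makes more of the tree immune to data changes at the cost of less informative splits, which mirrors the accuracy versus update-efficiency trade-off described in the abstract.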

artificial intelligence, machine learning, node, (15 more...)

2009.05567

Country:

North America > United States > California (0.14)
North America > United States > Oregon (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.95)
Transportation (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

#artificialintelligence, Sep-9-2020, 18:47:52 GMT

A/B Testing with Machine Learning - A Step-by-Step Tutorial

With the rise of digital marketing led by tools including Google Analytics, Google Adwords, and Facebook Ads, a key competitive advantage for businesses is using A/B testing to determine the effects of digital marketing efforts. In short, small changes can have big effects. This is why A/B testing is a huge benefit. A/B Testing enables us to determine whether changes in landing pages, popup forms, article titles, and other digital marketing decisions improve conversion rates and ultimately customer purchasing behavior. A successful A/B Testing strategy can lead to massive gains - more satisfied users, more engagement, and more sales - Win-Win-Win. A major issue with traditional, statistical-inference approaches to A/B Testing is that they only compare two variables: an experiment/control and an outcome. The problem is that customer behavior is vastly more complex than this. Customers take different paths, spend different amounts of time on the site, come from different backgrounds (age, gender, interests), and more. This is where Machine Learning excels - generating insights from complex systems.
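For reference, the traditional two-variable comparison that the tutorial contrasts with machine learning can be sketched as a two-proportion z-test. The conversion counts below are invented for illustration and the function name is hypothetical. The tutorial itself may use a different test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both conversion rates are equal."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: control converts 200 of 5000, experiment 260 of 5000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The test uses only the group assignment and the outcome. Covariates such as visit path, time on site, or customer background do not enter it, which is the limitation the paragraph above points to.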

artificial intelligence, decision tree learning, machine learning, (14 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Marketing (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)

#artificialintelligence, Sep-8-2020, 14:15:49 GMT

Machine Learning Classification Bootcamp in Python

Build 10 Practical Projects and Advance Your Skills in Machine Learning Using Python and Scikit Learn. Created by Dr. Ryan Ahmed, Ph.D., MBA, Kirill Eremenko, Hadelin de Ponteves, Mitchell Bouchard, and the SuperDataScience Team.

Are you ready to master Machine Learning techniques and kick off your career as a Data Scientist? You came to the right place! Machine Learning is one of the top skills to acquire in 2019, with an average salary of over $114,000 in the United States according to PayScale. The total number of ML jobs has grown by around 600 percent over the past two years and is expected to grow even more by 2020. This course provides students with knowledge and hands-on experience of state-of-the-art machine learning classification techniques such as:

Logistic Regression
Decision Trees
Random Forest
Naïve Bayes
Support Vector Machines (SVM)

In this course, we are going to provide students with knowledge of key aspects of state-of-the-art classification techniques.

artificial intelligence, decision tree learning, machine learning, (12 more...)

Country: North America > United States (0.26)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.57)

Duarte, Aline, Hernández, Noslen

SeqROCTM: A Matlab toolbox for the analysis of Sequence of Random Objects driven by Context Tree Models

arXiv.org Artificial Intelligence, Sep-8-2020

In several research problems we face probabilistic sequences of inputs (e.g., a sequence of stimuli) from which an agent generates a corresponding sequence of responses, and it is of interest to model or discover some kind of relation between them. To model such a relation in the context of statistical learning in neuroscience, a new class of stochastic processes has been introduced [5], namely sequences of random objects driven by context tree models. In this paper we introduce a freely available Matlab toolbox (SeqROCTM) that implements three model selection methods to make inference about the parameters of this kind of stochastic process.

artificial intelligence, machine learning, programming language, (15 more...)

2009.06371

Country: South America > Brazil > São Paulo (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.66)