AITopics

1907.08127

Country: North America (0.46)

Genre: Research Report > Experimental Study (0.56)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

#artificialintelligenceJul-17-2019, 06:41:46 GMT

XGBoost and Random Forest with Bayesian Optimisation

Instead of only comparing XGBoost and Random Forest in this post we will try to explain how to use those two very popular approaches with Bayesian Optimisation and that are those models main pros and cons. XGBoost (XGB) and Random Forest (RF) both are ensemble learning methods and predict (classification or regression) by combining the outputs from individual decision trees (we assume tree-based XGB or RF). XGBoost build decision tree one each time. Each new tree corrects errors which were made by previously trained decision tree. At Addepto we use XGBoost models to solve anomaly detection problems e.g. in supervised learning approach.

artificial intelligence, decision tree learning, machine learning, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligenceJul-17-2019, 06:40:42 GMT

Top Machine Learning and Data Science Methods Used at Work

The practice of data science requires the use algorithms and data science methods to help data professionals extract insights and value from data. A recent survey by Kaggle revealed that data professionals used data visualization, logistic regression, cross-validation and decision trees more than other data science methods in 2017. Looking ahead to 2018, data professionals are most interested in learning deep learning (41%). Kaggle conducted a survey in August 2017 of over 16,000 data professionals (2017 State of Data Science and Machine Learning). Their survey included a variety of questions about data science, machine learning, education and more.

data professional, data science method, science method, (13 more...)

Genre: Research Report > New Finding (0.42)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Lucic, Ana, Haned, Hinda, de Rijke, Maarten

Contrastive Explanations for Large Errors in Retail Forecasting Predictions through Monte Carlo Simulations

arXiv.org Artificial IntelligenceJul-17-2019

At Ahold Delhaize, there is an interest in using more complex machine learning techniques for sales forecasting. It is difficult to convince analysts, along with their superiors, to adopt these techniques since the models are considered to be 'black boxes,' even if they perform better than current models in use. We aim to explore the impact of contrastive explanations about large errors on users' attitudes towards a 'black-box' model. In this work, we make two contributions. The first is an algorithm, Monte Carlo Bounds for Reasonable Predictions (MC-BRP). Given a large error, MC-BRP determines (1) feature values that would result in a reasonable prediction, and (2) general trends between each feature and the target, based on Monte Carlo simulations. The second contribution is the evaluation of MC-BRP along with its outcomes, which has both objective and subjective components. We evaluate on a real dataset with real users from Ahold Delhaize by conducting a user study to determine if explanations generated by MC-BRP help users understand why a prediction results in a large error, and if this promotes trust in an automatically-learned model. The study shows that users are able to answer objective questions about the model's predictions with overall 81.7% accuracy when provided with these contrastive explanations. We also show that users who saw MC-BRP explanations understand why the model makes large errors in predictions significantly more than users in the control group.

artificial intelligence, explanation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1908.00085

Country: Europe > Netherlands (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.68)
Banking & Finance (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

da Costa, Victor G. Turrisi, Mastelini, Saulo Martiello, de Carvalho, André C. Ponce de Leon Ferreira, Barbon, Sylvio Jr

Online Local Boosting: improving performance in online decision trees

arXiv.org Machine LearningJul-16-2019

As more data are produced each day, and faster, data stream mining is growing in importance, making clear the need for algorithms able to fast process these data. Data stream mining algorithms are meant to be solutions to extract knowledge online, specially tailored from continuous data problem. Many of the current algorithms for data stream mining have high processing and memory costs. Often, the higher the predictive performance, the higher these costs. To increase predictive performance without largely increasing memory and time costs, this paper introduces a novel algorithm, named Online Local Boosting (OLBoost), which can be combined into online decision tree algorithms to improve their predictive performance without modifying the structure of the induced decision trees. For such, OLBoost applies a boosting to small separate regions of the instances space. Experimental results presented in this paper show that by using OLBoost the online learning decision tree algorithms can significantly improve their predictive performance. Additionally, it can make smaller trees perform as good or better than larger trees.

algorithm, artificial intelligence, machine learning, (19 more...)

1907.07207

Country: South America > Brazil (0.47)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Khadiev, Kamil, Mannapov, Ilnaz, Safina, Liliya

The Quantum Version Of Classification Decision Tree Constructing Algorithm C5.0

arXiv.org Machine LearningJul-16-2019

In the paper, we focus on complexity of C5.0 algorithm for constructing decision tree classifier that is the models for the classification problem from machine learning. In classical case the decision tree is constructed in $O(hd(NM+N \log N))$ running time, where $M$ is a number of classes, $N$ is the size of a training data set, $d$ is a number of attributes of each element, $h$ is a tree height. Firstly, we improved the classical version, the running time of the new version is $O(h\cdot d\cdot N\log N)$. Secondly, we suggest a quantum version of this algorithm, which uses quantum subroutines like the amplitude amplification and the D{\"u}rr-H{\o}yer minimum search algorithms that are based on Grover's algorithm. The running time of the quantum algorithm is $O\big(h\cdot \sqrt{d}\log d \cdot N \log N\big)$ that is better than complexity of the classical algorithm.

algorithm, artificial intelligence, machine learning, (17 more...)

1907.0684

Country: Europe > Russia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Zhang, Wenbin, Ntoutsi, Eirini

FAHT: An Adaptive Fairness-aware Decision Tree Classifier

arXiv.org Artificial IntelligenceJul-16-2019

Automated data-driven decision-making systems are ubiquitous across a wide spread of online as well as offline services. These systems, depend on sophisticated learning algorithms and available data, to optimize the service function for decision support assistance. However, there is a growing concern about the accountability and fairness of the employed models by the fact that often the available historic data is intrinsically discriminatory, i.e., the proportion of members sharing one or more sensitive attributes is higher than the proportion in the population as a whole when receiving positive classification, which leads to a lack of fairness in decision support system. A number of fairness-aware learning methods have been proposed to handle this concern. However, these methods tackle fairness as a static problem and do not take the evolution of the underlying stream population into consideration. In this paper, we introduce a learning mechanism to design a fair classifier for online stream based decision-making. Our learning model, FAHT (Fairness-Aware Hoeffding Tree), is an extension of the well-known Hoeffding Tree algorithm for decision tree induction over streams, that also accounts for fairness. Our experiments show that our algorithm is able to deal with discrimination in streaming environments, while maintaining a moderate predictive performance over the stream.

artificial intelligence, discrimination, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1907.07237

Country: North America > United States > Maryland (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

#artificialintelligenceJul-15-2019, 08:15:23 GMT

What's wrong with the approach to Data Science?

Data science is the application of statistics, programming and domain knowledge to generate insights into a problem that needs to be solved. The Harvard Business Review said Data Scientist is the sexiest job of the 21st century. How often has that article been referenced to convince people? The job'Data Scientist' has been around for decades, it was just not called "Data Scientist". Statisticians have used their knowledge and skills using machine learning techniques such as Logistic Regression and Random Forest for prediction and insights for decades.

artificial intelligence, decision tree learning, machine learning, (10 more...)

Industry: Education > Educational Setting > Online (0.52)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.33)

arXiv.org Machine LearningJul-15-2019

Best Split Nodes for Regression Trees

Klusowski, Jason M.

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the bias and adaptive properties of regression trees constructed with CART. In doing so, we derive an interesting connection between the bias and the mean decrease in impurity (MDI) measure of variable importance---a tool widely used for model interpretability---defined as the sum of impurity reductions over all non-terminal nodes in the tree. In particular, we show that the size of a terminal subnode for a variable is small when the MDI for that variable is large and that this relationship is exponential---confirming theoretically that decision trees with CART have small bias and are adaptive to signal strength and direction. Finally, we apply these individual tree bounds to tree ensembles and show consistency of Breiman's random forests. The context is surprisingly general and applies to a wide variety of multivariable data generating distributions and regression functions. The main technical tool is an exact characterization of the conditional probability content of the daughter nodes arising from an optimal split, in terms of the partial dependence function and reduction in impurity.

artificial intelligence, machine learning, node, (18 more...)

1906.10086

Country: North America > United States (0.45)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligenceJul-13-2019, 16:25:34 GMT

Heart of Darkness: Logistic Regression vs. Random Forest

The'functional needs repair' category of the target variable only makes up about 7% of the whole set. The implication is that whatever algorithm you end up using it's probably going to learn the other two balanced classes a lot better than this one. Such is data science: the struggle is real. The first thing we're going to do is create an'age' variable for the waterpoints as that seems highly relevant. The'population' variable also has a highly right-skewed distribution so we're going to change that as well: The zeros inside of the'amount_tsh' are also probably NaNs so we're going to do something drastic and simplify it into 0s and 1s: One of the most important points we learned from the week before and something that will stay with me is the idea of coming up with a baseline model as fast as one can.

artificial intelligence, machine learning, random forest, (6 more...)

Genre:

Research Report > New Finding (0.44)
Research Report > Experimental Study (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)