AITopics

1911.06177

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California (0.04)
North America > United States > North Carolina (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

#artificialintelligence Nov-12-2019, 14:23:31 GMT

What is Data Science? - KDnuggets

Data Science is considered one of the most modern and fascinating jobs of our time. It can be fun and satisfying, but is it really as described? At the beginning of their careers, Data Scientists think that Data Science is a wonderful, magical world full of algorithms, Python functions that perform every possible spell with a line of code, and statistical models able to detect the most useful correlations in data, the kind that could make you an invincible superhero in your company. You start dreaming about your CEO congratulating you and shaking your hand, you begin to see decision trees and clusters everywhere and, of course, the most terrifying neural network architectures your mind can dream up. But from the very first day of your first Data Science project, you start to realize what reality is.

algorithm, data science, scientist, (13 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)

Cousot, Kévin, Mirzapour, Mehdi, Ragheb, Waleed

Prediction of Missing Semantic Relations in Lexical-Semantic Network using Random Forest Classifier

arXiv.org Artificial Intelligence Nov-12-2019

This study focuses on the prediction of six missing semantic relations (such as is_a and has_part) between two given nodes in RezoJDM, a French lexical-semantic network. The output of this prediction is a set of pairs in which the first entries are semantic relations and the second entries are the probabilities that such relations exist. Given the statement of the problem, we choose the random forest (RF) classifier to tackle it. For the training/test dataset, we take for granted the existing semantic relations, gathered and validated by crowdsourcing. We describe how all of the mentioned ideas can be followed after using the node2vec approach in the feature extraction phase. We show how this approach can lead to acceptable results.
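The output format described in this abstract, a set of (relation, probability) pairs, can be illustrated with a minimal sketch. It assumes each tree in a forest casts one vote for a relation and that the probability is the fraction of trees voting for it; this is one common convention, not necessarily the authors' procedure. The label list and the `relation_probabilities` helper are hypothetical; only is_a and has_part are named in the abstract.

```python
from collections import Counter

# Hypothetical label set: only is_a and has_part appear in the abstract;
# "other" is a placeholder for the remaining relations.
RELATIONS = ["is_a", "has_part", "other"]


def relation_probabilities(tree_votes, relations=RELATIONS):
    """Turn per-tree votes into (relation, probability) pairs.

    Each probability is the fraction of trees that voted for the relation.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return [(rel, counts.get(rel, 0) / total) for rel in relations]


# Five hypothetical trees voting on one pair of nodes.
votes = ["is_a", "is_a", "has_part", "is_a", "other"]
pairs = relation_probabilities(votes)
```

Here `pairs` holds one entry per relation, and the probabilities sum to 1 because every vote falls in the label set.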

dataset, relation, semantic relation, (9 more...)

arXiv.org Artificial Intelligence

1911.04759

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.05)
Europe > France > Centre-Val de Loire > Loiret > Orleans (0.04)
Asia > Thailand > Chonburi > Chonburi (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)

#artificialintelligence Nov-11-2019, 03:18:18 GMT

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

The sustained success of random forests has led naturally to the desire to better understand the statistical and mathematical properties of the procedure. Lin and Jeon (2006) introduced the potential nearest neighbor framework and Biau and Devroye (2010) later established related consistency properties. In the last several years, a number of important statistical properties of random forests have also been established whenever base learners are constructed with subsamples rather than bootstrap samples. Scornet et al. (2015) provided the first consistency result for Breiman's original random forest algorithm whenever the true underlying regression function is assumed to be additive. Despite the impressive volume of research from the past two decades and the exciting recent progress in establishing their statistical properties, a satisfying explanation for the sustained empirical success of random forests has yet to be provided.

procedure, random forest, selection procedure, (11 more...)

AI-Alerts: 2019 > 2019-11 > AAAI AI-Alert for Nov 12, 2019 (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning Nov-11-2019

Privacy-Preserving Gradient Boosting Decision Trees

Li, Qinbin, Wu, Zhaomin, Wen, Zeyi, He, Bingsheng

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
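The link between sensitivity bounds and noise can be made concrete with the standard Laplace mechanism, where the noise scale is the sensitivity divided by the privacy budget epsilon. The sketch below is a generic illustration and not the paper's algorithm: the function names and the choice of a clipped gradient sum are assumptions made for the example.

```python
import random


def clip_gradients(gradients, bound):
    """Clip each gradient into the interval [-bound, bound]."""
    return [max(-bound, min(bound, g)) for g in gradients]


def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def noisy_gradient_sum(gradients, bound, epsilon, rng):
    """Release a clipped gradient sum with Laplace noise (illustrative only)."""
    clipped = clip_gradients(gradients, bound)
    # Adding or removing one record changes the clipped sum by at most `bound`,
    # so `bound` serves as the sensitivity in this sketch.
    scale = laplace_scale(bound, epsilon)
    # The difference of two Exp(1) samples follows Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return sum(clipped) + noise


# A tighter bound means a smaller noise scale at the same epsilon.
loose = laplace_scale(sensitivity=4.0, epsilon=0.5)
tight = laplace_scale(sensitivity=1.0, epsilon=0.5)
released = noisy_gradient_sum([-3.0, 0.5, 2.0], bound=1.0, epsilon=0.5,
                              rng=random.Random(0))
```

At a fixed epsilon, cutting the sensitivity bound by a factor of four cuts the noise scale by the same factor, which is why tighter bounds translate into less accuracy loss.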

gradient, privacy budget, sensitivity, (15 more...)

1911.04209

Country:

Asia > Singapore (0.04)
Oceania > Australia > Western Australia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Rapp, Michael, Mencía, Eneldo Loza, Fürnkranz, Johannes

Simplifying Random Forests: On the Trade-off between Interpretability and Accuracy

arXiv.org Machine Learning Nov-11-2019

We analyze the trade-off between model complexity and accuracy for random forests by breaking the trees up into individual classification rules and selecting a subset of them. We show experimentally that even a few rules are sufficient to achieve an acceptable accuracy close to that of the original model. Moreover, our results indicate that in many cases, this can lead to simpler models that clearly outperform the original ones.
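Breaking a tree into individual classification rules can be sketched as collecting one rule per root-to-leaf path. The nested-dictionary tree layout and the `extract_rules` helper below are hypothetical, chosen only for illustration; the abstract does not specify the authors' representation or how they select the rule subset.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if "label" in node:
        # Leaf: the conditions gathered on the way down form the rule body.
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((feature, "<=", threshold),))
    right = extract_rules(node["right"], conditions + ((feature, ">", threshold),))
    return left + right


# A small hypothetical tree with two split nodes and three leaves.
tree = {
    "feature": "x0", "threshold": 0.5,
    "left": {"label": "A"},
    "right": {
        "feature": "x1", "threshold": 2.0,
        "left": {"label": "B"},
        "right": {"label": "A"},
    },
}
rules = extract_rules(tree)
```

Applying the same function to every tree in a forest yields a pool of candidate rules, from which a subset could then be chosen by whatever selection criterion is preferred.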

decision boundary, simplifying random forest, subset, (9 more...)

1911.04393

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)

Genre:

Research Report > New Finding (0.51)
Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

arXiv.org Machine Learning Nov-11-2019

Practical Federated Gradient Boosting Decision Trees

Li, Qinbin, Wen, Zeyi, He, Bingsheng

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, with many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to the use of costly data transformations such as secure sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original record to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training with the local data of each owner, our approach can significantly improve the predictive accuracy, and achieve comparable accuracy to the original GBDT with the data from all parties.
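To illustrate how locality-sensitive hashing can expose similarity without sharing raw records, here is a minimal sketch of sign random projection, one well-known LSH family. The abstract does not say which LSH scheme the authors use, so the hash family, the function names, and the parameters below are all assumptions.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplanes; parties sharing a seed share them."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


planes = make_hyperplanes(num_bits=8, dim=3)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, so it falls on the same sides
sig_a = lsh_signature(a, planes)
sig_b = lsh_signature(b, planes)
```

Vectors pointing in similar directions tend to agree on most bits, so parties could compare signatures instead of the underlying feature vectors.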

gradient, hash value, simfl, (15 more...)

1911.04206

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Western Australia (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.62)

#artificialintelligence Nov-10-2019, 00:38:01 GMT

Machine Learning- Decision Trees and Random Forest Classifiers

Let's start by understanding what decision trees are, because they are the fundamental units of a random forest classifier. At a high level, decision trees can be viewed as a machine learning construct used to perform either classification or regression on some data in a hierarchical structure. In this article, I will only discuss the use of decision trees for classification. Decision trees use machine learning to identify key differentiating factors between the different classes of our data. By doing so, decision trees can take some input data and predict a class by running the data through a set of differentiating questions that they form using machine learning.
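One common way to score a candidate "differentiating question" is information gain, the drop in label entropy after a split. The sketch below shows that calculation in general form; the function names are illustrative and it is not taken from the article itself.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["a", "a", "b", "b"]
perfect_split = information_gain(parent, ["a", "a"], ["b", "b"])
useless_split = information_gain(parent, ["a", "b"], ["a", "b"])
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each side as mixed as the parent gains nothing.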

decision tree, entropy, information gain, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence Nov-9-2019, 02:07:23 GMT

A Comprehensive Guide to Random Forest in R

Classification is the task of predicting the class of a given input data point. Classification problems are common in machine learning, and they fall under supervised learning.

algorithm, comprehensive guide, random forest

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

arXiv.org Machine Learning Nov-9-2019

In Vitro Fertilization (IVF) Cumulative Pregnancy Rate Prediction from Basic Patient Characteristics

Zhang, Bo, Cui, Yuqi, Wang, Meng, Li, Jingjing, Jin, Lei, Wu, Dongrui

Tens of millions of women suffer from infertility worldwide each year. In vitro fertilization (IVF) is the best choice for many such patients. However, IVF is expensive, time-consuming, and both physically and emotionally demanding. The first question that a patient usually asks before IVF is how likely she is to conceive, given her basic medical examination information. This paper proposes three approaches to predict the cumulative pregnancy rate after multiple oocyte pickup cycles. Experiments on 11,190 patients showed that first clustering the patients into different groups and then building a support vector machine model for each group can achieve the best overall performance. Our model could be a quick and economical approach for reliably estimating the cumulative pregnancy rate for a patient, given only her basic medical examination information, well before starting the actual IVF procedure. The predictions can help the patient make optimal decisions on whether to use her own oocytes or donor oocytes, how many oocyte pickup cycles she may need, whether to use frozen embryos, etc. They will also reduce the patient's cost and time to pregnancy, and improve her quality of life.

cumulative pregnancy rate, prediction, pregnancy rate, (10 more...)

1911.03839

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Hubei Province > Wuhan (0.05)
North America > United States > Rhode Island (0.04)

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.69)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.86)