AITopics

Genre:

Research Report > New Finding (0.40)
Research Report > Experimental Study (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

arXiv.org Machine Learning, May-25-2019

Asymptotic Distributions and Rates of Convergence for Random Forests and other Resampled Ensemble Learners

Peng, Wei, Coleman, Tim, Mentch, Lucas

Random forests remain among the most popular off-the-shelf supervised learning algorithms. Despite their well-documented empirical success, however, until recently, few theoretical results were available to describe their performance and behavior. In this work we push beyond recent work on consistency and asymptotic normality by establishing rates of convergence for random forests and other supervised learning ensembles. We develop the notion of generalized U-statistics and show that within this framework, random forest predictions remain asymptotically normal for larger subsample sizes than previously established. We also provide Berry-Esseen bounds in order to quantify the rate at which this convergence occurs, making explicit the roles of the subsample size and the number of trees in determining the distribution of random forest predictions.
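The roles of the subsample size and the number of trees can be illustrated with an ordinary incomplete U-statistic: a kernel is evaluated on many random subsamples and the results are averaged. The sketch below is only that generic construction, not the generalized U-statistics developed in the paper; the function name and the parameters `k` (subsample size) and `num_subsamples` (playing the role of the number of trees) are hypothetical names chosen for illustration.

```python
import random
from statistics import mean


def incomplete_u_statistic(data, kernel, k, num_subsamples, seed=0):
    """Average `kernel` over randomly drawn size-k subsamples of `data`.

    Illustrative assumption: each subsample is drawn without replacement,
    and `kernel` stands in for a base learner's prediction at a fixed point.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    rng = random.Random(seed)
    # One kernel evaluation per subsample, analogous to one tree per subsample.
    values = [kernel(rng.sample(data, k)) for _ in range(num_subsamples)]
    return mean(values)


if __name__ == "__main__":
    observations = [1.0, 2.0, 3.0, 4.0]
    print(incomplete_u_statistic(observations, mean, k=2, num_subsamples=100))
```

With `k` equal to the full sample size every subsample contains all observations, so the estimate no longer depends on which subsamples were drawn; smaller `k` and fewer subsamples add resampling variability.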

artificial intelligence, machine learning, u-statistics, (15 more...)

1905.10651

Country: North America > United States (0.46)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning, May-24-2019

HDI-Forest: Highest Density Interval Regression Forest

Zhu, Lin, Lu, Jiaxin, Chen, Yihong

By seeking the narrowest prediction intervals (PIs) that satisfy the specified coverage probability requirements, the recently proposed quality-based PI learning principle can extract high-quality PIs that better summarize the predictive certainty in regression tasks, and has been widely applied to solve many practical problems. Currently, the state-of-the-art quality-based PI estimation methods are based on deep neural networks or linear models. In this paper, we propose Highest Density Interval Regression Forest (HDI-Forest), a novel quality-based PI estimation method that is instead based on Random Forest. HDI-Forest does not require additional model training, and directly reuses the trees learned in a standard Random Forest model. By utilizing the special properties of Random Forest, HDI-Forest can efficiently and more directly optimize the PI quality metrics. Extensive experiments on benchmark datasets show that HDI-Forest significantly outperforms previous approaches, reducing the average PI width by over 30% while achieving the same or better coverage probability.
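The idea of a narrowest interval under a coverage requirement can be sketched on a plain list of numbers. The example below assumes we already have a set of sample values for one query point (for instance, responses collected from the leaves a point falls into; that source is an assumption, not something the abstract specifies) and finds the shortest interval containing at least the requested fraction of them. It is a generic sliding-window search, not the HDI-Forest algorithm itself, and the function name is hypothetical.

```python
import math


def narrowest_interval(samples, coverage):
    """Return (low, high) of the shortest interval holding >= coverage of samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    # Number of points the interval must contain to meet the coverage target.
    m = max(1, math.ceil(coverage * n))
    best_low, best_high = xs[0], xs[m - 1]
    # Slide a window of m consecutive sorted points and keep the tightest one.
    for i in range(1, n - m + 1):
        low, high = xs[i], xs[i + m - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


if __name__ == "__main__":
    print(narrowest_interval([0, 10, 11, 12, 30, 50], coverage=0.5))
```

Because the shortest interval covering m points always starts and ends at sample values, checking the n - m + 1 windows of consecutive sorted points is sufficient.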

artificial intelligence, hdi-forest, machine learning, (17 more...)

1905.10101

Genre: Research Report (0.64)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Machine Learning, May-24-2019

Federated Forest

Liu, Yang, Liu, Yingting, Liu, Zhijie, Zhang, Junbo, Meng, Chuishi, Zheng, Yu

Most real-world data are scattered across different companies or government organizations, and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China's Cyber Security Law. This data-island situation and data privacy and security are two major challenges for applications of artificial intelligence. In this paper, we tackle these challenges and propose a privacy-preserving machine learning model, called Federated Forest, which is a lossless learning model of the traditional random forest method, i.e., it achieves the same level of accuracy as the non-privacy-preserving approach. Based on it, we developed a secure cross-regional machine learning system that allows a model to be jointly trained over different regions' clients with the same user samples but different attribute sets, processing the data stored in each of them without exchanging their raw data. A novel prediction algorithm is also proposed that can largely reduce the communication overhead. Experiments on both real-world and UCI data sets demonstrate that the Federated Forest is as accurate as the non-federated version. The efficiency and robustness of our proposed system have been verified. Overall, our model is practical, scalable and extensible for real-life tasks.
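The setting of shared samples with different attribute sets can be made concrete with a toy split search. In the sketch below each party holds its own feature columns for the same rows, computes its best local split, and reports only a summary (gain, feature name, threshold) to a coordinator. This is a simplified illustration under strong assumptions: labels are visible to every party, there is no encryption, and the Gini criterion and all function names are choices made here, not details taken from the paper.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_local_split(columns, labels):
    """Best (gain, feature, threshold) over one party's own columns."""
    n = len(labels)
    parent = gini(labels)
    best = (0.0, None, None)
    for name, values in columns.items():
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted(set(values))[:-1]:
            left = [y for v, y in zip(values, labels) if v <= t]
            right = [y for v, y in zip(values, labels) if v > t]
            gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if gain > best[0]:
                best = (gain, name, t)
    return best


def choose_split(parties, labels):
    """Coordinator step: compare per-party summaries, never raw columns."""
    reports = {pid: best_local_split(cols, labels) for pid, cols in parties.items()}
    winner = max(reports, key=lambda pid: reports[pid][0])
    return winner, reports[winner]


if __name__ == "__main__":
    parties = {"A": {"age": [1, 4, 2, 3]}, "B": {"income": [10, 10, 20, 20]}}
    print(choose_split(parties, [0, 0, 1, 1]))
```

Only the three-element summary leaves each party in this sketch; a real system would also need to protect labels and sample identifiers, which is outside what this example shows.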

data mining, machine learning, node, (19 more...)

1905.10053

Country: Asia > China (0.49)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.54)

#artificialintelligence, May-21-2019, 17:43:38 GMT

sabiha90/Random-Forest-Explainability-Pipeline

This toolkit executes the RFEX 2.0 "pipeline", i.e. a set of steps that produce the information comprising the RFEX 2.0 summary, namely information that enhances the explainability of a Random Forest classifier. It comes with a synthetically generated test database that helps demonstrate how RFEX 2.0 works. With this toolkit users can also use their own data to generate an RFEX 2.0 summary. Background on the RFEX 2.0 method, as well as a description of and access to the synthetic test database, can be found in TR 18.01 at cs.sfsu.edu. Users are strongly advised to read the above report before using this toolkit.

artificial intelligence, machine learning, rfex 2, (8 more...)

Country:

North America > United States > Hawaii (0.06)
North America > United States > California > San Francisco County > San Francisco (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

#artificialintelligence, May-21-2019, 17:43:28 GMT

Random Forest Explainability toolkit (RFEX) using R and Python in Jupyter Notebook

RFEX creates a simple-to-understand summary of how a trained RF makes its classifications.

artificial intelligence, machine learning, social media, (7 more...)

Country:

North America > United States > Hawaii (0.11)
North America > United States > California > Santa Clara County > Palo Alto (0.11)

Technology:

Information Technology > Communications > Social Media (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.53)

#artificialintelligence, May-18-2019, 17:05:59 GMT

Enterprise AI: Diving into Machine Learning

Data in the real world, of course, isn't as simple as it is in the previous example. There are always complexities and nuances to data. To stick with our housing market example, the value of houses might also be influenced by dwelling type, lot size, recent upgrades, proximity to a neighborhood park and intangible variables like curbside appeal. And, in the real world, houses wouldn't all be in the same neighborhood, so your machine learning model must also consider the ZIP code for the property. To consider this wider range of variables, we need to dig deeper into the data scientist's toolbox and pull out some more sophisticated machine learning methods, including random forests and gradient boosting.

artificial intelligence, decision tree learning, machine learning, (15 more...)

Industry: Banking & Finance > Real Estate (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

arXiv.org Machine Learning, May-18-2019

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

Devlin, Summer, Singh, Chandan, Murdoch, W. James, Yu, Bin

Tree ensembles, such as random forests and AdaBoost, are ubiquitous machine learning models known for achieving strong predictive performance across a wide variety of domains. However, this strong performance comes at the cost of interpretability (i.e. users are unable to understand the relationships a trained random forest has learned and why it is making its predictions). In particular, it is challenging to understand how the contribution of a particular feature, or group of features, varies as their value changes. To address this, we introduce Disentangled Attribution Curves (DAC), a method to provide interpretations of tree ensemble methods in the form of (multivariate) feature importance curves. For a given variable, or group of variables, DAC plots the importance of a variable(s) as their value changes. We validate DAC on real data by showing that the curves can be used to increase the accuracy of logistic regression while maintaining interpretability, by including DAC as an additional feature. In simulation studies, DAC is shown to out-perform competing methods in the recovery of conditional expectations. Finally, through a case-study on the bike-sharing dataset, we demonstrate the use of DAC to uncover novel insights into a dataset.

artificial intelligence, decision tree learning, machine learning, (13 more...)

1905.07631

Country:

Europe (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

arXiv.org Machine Learning, May-18-2019

Gradient tree boosting with random output projections for multi-label classification and multi-output regression

Joly, Arnaud, Wehenkel, Louis, Geurts, Pierre

Multi-output supervised learning aims to model input-output relationships from observations of input-output pairs whenever the output space is a vector of random variables. Multi-output classification and regression tasks have numerous applications in domains ranging from biology to multimedia, and recent applications in this area correspond to very high dimensional output spaces (Agrawal et al, 2013; Dekel and Shamir, 2010). Classification and regression trees (Breiman et al, 1984) are popular supervised learning methods that provide state-of-the-art performance when exploited in the context of ensemble methods, namely Random forests (Breiman, 2001; Geurts et al, 2006) and Boosting (Freund and Schapire, 1997; Friedman, 2001). Classification and regression trees can obviously be exploited to handle multi-output problems. The most straightforward way to address multi-output tasks is to apply standard single output methods separately and independently on each output. Although simple, this method, called binary relevance (Tsoumakas et al, 2009) in multi-label classification or single target (Spyromitros-Xioufis et al, 2012) in multi-output regression, is often suboptimal as it does not exploit potential correlations that might exist between the outputs. Tree ensemble methods have, however, been explicitly extended by several authors to the joint prediction of multiple outputs (e.g., Segal, 1992; Blockeel et al, 2000). These extensions build a single tree to predict all outputs at once. They adapt the score measure used to assess splits during the tree growth to take into account all outputs and label each tree leaf with a vector of values, one for each output.
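The single target baseline described above, one independent model per output, can be sketched in a few lines. The example uses a one-dimensional least-squares line as the base learner purely for brevity; that choice, and the names `fit_line` and `fit_single_target`, are assumptions made for illustration and do not come from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one input and one output; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Fall back to a constant predictor when all inputs are identical.
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_single_target(xs, outputs, fit=fit_line):
    """Fit one model per output column, ignoring correlations between outputs.

    `outputs` is a list of rows, each row holding one value per output.
    """
    num_outputs = len(outputs[0])
    models = [fit(xs, [row[j] for row in outputs]) for j in range(num_outputs)]
    return lambda x: [model(x) for model in models]


if __name__ == "__main__":
    predict = fit_single_target([0, 1, 2], [[0, 1], [2, 1], [4, 1]])
    print(predict(3))
```

Each column is fitted in isolation, which is exactly why this baseline cannot exploit correlations between outputs; the joint extensions mentioned in the text instead share one tree structure across all outputs.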

artificial intelligence, bayesian inference, machine learning, (18 more...)

1905.07558

Country: Europe (0.28)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
(2 more...)

AAAI Conferences, May-15-2019

Using EEG Features and Machine Learning to Predict Gifted Children

Ghali, Ramla (Université de Montréal) | Tato, Ange (Université de Montréal) | Nkambou, Roger (Université de Montréal)

Gifted students have higher capabilities for understanding and learning. They are characterized by a high level of attention and high performance in the classroom. Gifted children are defined in this paper as children whose performance is higher than the group average (59.64%). To distinguish gifted students from other students, we conducted an experiment in which 17 pupils voluntarily participated. We collected different types of data (gender, age, performance, initial average in math and EEG mental states) on NetMath, a web platform for learning mathematics. Participants were invited to respond to top-level exercises on the four basic operations in decimals. We trained different machine learning algorithms to predict gifted students. Our first results show that the decision tree could predict gifted students with an accuracy of 76.88%. Using J48 trees, we also noticed that two relevant features could determine gifted children: the relaxation extracted from the EEG headset and the characteristic of being a strong student. A strong student is defined as a student who obtained a mean higher than the group’s mean in the first step evaluation in class.

gifted children, gifted student, student, (12 more...)

The Thirty-Second International Flairs Conference

Country:

North America > United States > Texas > Bexar County > San Antonio (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
North America > Canada > Quebec > Montreal (0.05)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry: Education > Focused Education > Gifted Children (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)