AITopics

doi: 10.1145/3534678.3539198

2207.0992

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > China (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.49)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

arXiv.org Artificial IntelligenceOct-18-2023

Effective and Efficient Federated Tree Learning on Hybrid Data

Li, Qinbin, Xie, Chulin, Xu, Xiaojun, Liu, Xiaoyuan, Zhang, Ce, Li, Bo, He, Bingsheng, Song, Dawn

Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing federated learning studies focus on either horizontal or vertical data settings, where the data of different parties are assumed to be from the same feature or sample space. In practice, a common scenario is the hybrid data setting, where data from different parties may differ both in the features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level solution that does not need frequent communication traffic to train a tree. Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead. HybridTree can achieve up to 8 times speedup compared with the other baselines.

efficient federated tree learning, hybrid data

2310.11865

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)

Rosaler, Joshua, Desai, Dhruv, Sarmah, Bhaskarjit, Vamvourellis, Dimitrios, Onay, Deran, Mehta, Dhagash, Pasquali, Stefano

Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach

arXiv.org Machine LearningOct-18-2023

We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.

artificial intelligence, machine learning, prediction, (15 more...)

2310.12428

Country:

North America > United States > New York > New York County > New York City (0.06)
Asia > Middle East > Jordan (0.04)
Asia > India (0.04)

Genre: Research Report (0.70)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Bénard, Clément, Näf, Jeffrey, Josse, Julie

MMD-based Variable Importance for Distributional Random Forest

arXiv.org Machine LearningOct-18-2023

Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for DRFs, based on the well-established drop and relearn principle and MMD distance. While traditional importance measures only detect variables with an influence on the output mean, our algorithm detects variables impacting the output distribution more generally. We show that the introduced importance measure is consistent, exhibits high empirical performance on both real and simulated data, and outperforms competitors. In particular, our algorithm is highly efficient to select variables through recursive feature elimination, and can therefore provide small sets of variables to build accurate estimates of conditional output distributions.

artificial intelligence, importance measure, machine learning, (17 more...)

2310.12115

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.63)

Patelli, Luca, Cameletti, Michela, Golini, Natalia, Ignaccolo, Rosaria

A path in regression Random Forest looking for spatial dependence: a taxonomy and a systematic review

arXiv.org Machine LearningOct-17-2023

Random Forest (RF) is a well-known data-driven algorithm applied in several fields thanks to its flexibility in modeling the relationship between the response variable and the predictors, also in case of strong non-linearities. In environmental applications, it often occurs that the phenomenon of interest may present spatial and/or temporal dependence that is not taken explicitly into account by RF in its standard version. In this work, we propose a taxonomy to classify strategies according to when (Pre-, In- and/or Post-processing) they try to include the spatial information into regression RF. Moreover, we provide a systematic review and classify the most recent strategies adopted to "adjust" regression RF to spatially dependent data, based on the criteria provided by the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA). The latter consists of a reproducible methodology for collecting and processing existing literature on a specified topic from different sources. PRISMA starts with a query and ends with a set of scientific documents to review: we performed an online query on the 25$^{th}$ October 2022 and, in the end, 32 documents were considered for review. The employed methodological strategies and the application fields considered in the 32 scientific documents are described and discussed. This work falls inside the Agriculture Impact On Italian Air (AgrImOnIA) project.

artificial intelligence, machine learning, prediction, (19 more...)

2303.04693

Country:

North America > United States (0.14)
South America (0.14)
Europe > Italy > Piedmont > Turin Province > Turin (0.14)

Genre: Research Report (1.00)

Industry:

Food & Agriculture > Agriculture (1.00)
Energy > Oil & Gas > Upstream (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Arenas, Marcelo, Barcelo, Pablo, Bustamente, Diego, Caraball, Jose, Subercaseaux, Bernardo

A Symbolic Language for Interpreting Decision Trees

arXiv.org Artificial IntelligenceOct-17-2023

The recent development of formal explainable AI has disputed the folklore claim that "decision trees are readily interpretable models", showing different interpretability queries that are computationally hard on decision trees, as well as proposing different methods to deal with them in practice. Nonetheless, no single explainability query or score works as a "silver bullet" that is appropriate for every context and end-user. This naturally suggests the possibility of "interpretability languages" in which a wide variety of queries can be expressed, giving control to the end-user to tailor queries to their particular needs. In this context, our work presents ExplainDT, a symbolic language for interpreting decision trees. ExplainDT is rooted in a carefully constructed fragment of first-ordered logic that we call StratiFOILed. StratiFOILed balances expressiveness and complexity of evaluation, allowing for the computation of many post-hoc explanations--both local (e.g., abductive and contrastive explanations) and global ones (e.g., feature relevancy)--while remaining in the Boolean Hierarchy over NP. Furthermore, StratiFOILed queries can be written as a Boolean combination of NP-problems, thus allowing us to evaluate them in practice with a constant number of calls to a SAT solver. On the theoretical side, our main contribution is an in-depth analysis of the expressiveness and complexity of StratiFOILed, while on the practical side, we provide an optimized implementation for encoding StratiFOILed queries as propositional formulas, together with an experimental study on its efficiency.

interpreting decision tree, symbolic language

2310.11636

Genre: Research Report (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Jafarigol, Elaheh, Keely, William, Hartog, Tess, Welborn, Tom, Hekmatpour, Peyman, Trafalis, Theodore B.

Religious Affiliation in the Twenty-First Century: A Machine Learning Perspective on the World Value Survey

arXiv.org Artificial IntelligenceOct-16-2023

This paper is a quantitative analysis of the data collected globally by the World Value Survey. The data is used to study the trajectories of change in individuals' religious beliefs, values, and behaviors in societies. Utilizing random forest, we aim to identify the key factors of religiosity and classify respondents of the survey as religious and non religious using country level data. We use resampling techniques to balance the data and improve imbalanced learning performance metrics. The results of the variable importance analysis suggest that Age and Income are the most important variables in the majority of countries. The results are discussed with fundamental sociological theories regarding religion and human behavior. This study is an application of machine learning in identifying the underlying patterns in the data of 30 countries participating in the World Value Survey. The results from variable importance analysis and classification of imbalanced data provide valuable insights beneficial to theoreticians and researchers of social sciences.

artificial intelligence, decision tree learning, machine learning, (19 more...)

doi: 10.1007/s12115-023-00887-0

2310.10874

Country:

Asia > Middle East > Iran (0.05)
Asia > Middle East > Yemen (0.05)
Europe > Netherlands (0.05)
(35 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.94)
Research Report > Experimental Study (0.93)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Hong, Ye, Stüdeli, Emanuel, Raubal, Martin

Evaluating geospatial context information for travel mode detection

arXiv.org Artificial IntelligenceOct-16-2023

Detecting travel modes from global navigation satellite system (GNSS) trajectories is essential for understanding individual travel behavior and a prerequisite for achieving sustainable transport systems. While studies have acknowledged the benefits of incorporating geospatial context information into travel mode detection models, few have summarized context modeling approaches and analyzed the significance of these context features, hindering the development of an efficient model. Here, we identify context representations from related work and propose an analytical pipeline to assess the contribution of geospatial context information for travel mode detection based on a random forest model and the SHapley Additive exPlanation (SHAP) method. Through experiments on a large-scale GNSS tracking dataset, we report that features describing relationships with infrastructure networks, such as the distance to the railway or road network, significantly contribute to the model's prediction. Moreover, features related to the geospatial point entities help identify public transport travel, but most land-use and land-cover features barely contribute to the task. We finally reveal that geospatial contexts have distinct contributions in identifying different travel modes, providing insights into selecting appropriate context information and modeling approaches. The results from this study enhance our understanding of the relationship between movement and geospatial context and guide the implementation of effective and efficient transport mode detection models.

information, mode detection, travel mode, (15 more...)

doi: 10.1016/j.jtrangeo.2023.103736

2305.19428

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Rail (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

arXiv.org Machine LearningOct-16-2023

Data-driven multinomial random forest: A new random forest variant with strong consistency

Chen, JunHao

In this paper, we modify the proof methods of some previously weakly consistent variants of random forests into strongly consistent proof methods, and improve the data utilization of these variants in order to obtain better theoretical properties and experimental performance. In addition, we propose a data-driven multinomial random forest (DMRF), which has the same complexity with BreimanRF (proposed by Breiman) while satisfying strong consistency with probability 1. It has better performance in classification and regression problems than previous RF variants that only satisfy weak consistency, and in most cases even surpasses BreimanRF in classification tasks. To the best of our knowledge, DMRF is currently a low-complexity and high-performing variation of random forests that achieves strong consistency with probability 1.

artificial intelligence, consistency, machine learning, (19 more...)

2211.15154

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Arram, Anas, Ayob, Masri, Albadr, Musatafa Abbas Abbood, Sulaiman, Alaa, Albashish, Dheeb

Credit card score prediction using machine learning models: A new dataset

arXiv.org Artificial IntelligenceOct-15-2023

The use of credit cards has recently increased, creating an essential need for credit card assessment methods to minimize potential risks. This study investigates the utilization of machine learning (ML) models for credit card default prediction system. The main goal here is to investigate the best-performing ML model for new proposed credit card scoring dataset. This new dataset includes credit card transaction histories and customer profiles, is proposed and tested using a variety of machine learning algorithms, including logistic regression, decision trees, random forests, multi-layer perceptron (MLP) neural network, XGBoost, and LightGBM. To prepare the data for machine learning models, we perform data pre-processing, feature extraction, feature selection, and data balancing techniques. Experimental results demonstrate that MLP outperforms logistic regression, decision trees, random forests, LightGBM, and XGBoost in terms of predictive performance in true positive rate, achieving an impressive area under the curve (AUC) of 86.7% and an accuracy rate of 91.6%, with a recall rate exceeding 80%. These results indicate the superiority of MLP in predicting the default customers and assessing the potential risks. Furthermore, they help banks and other financial institutions in predicting loan defaults at an earlier stage.

dataset, prediction, regression, (16 more...)

2310.02956

Country:

Asia > Malaysia (0.05)
Asia > Middle East > Iraq > Basra Governorate > Basrah (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)