Decision Tree Learning
Optimizing Interpretable Decision Tree Policies for Reinforcement Learning
Reinforcement learning techniques leveraging deep learning have made tremendous progress in recent years. However, the complexity of neural networks prevents practitioners from understanding their behavior. Decision trees have gained increased attention in supervised learning for their inherent interpretability, enabling modelers to understand the exact prediction process after learning. This paper considers the problem of optimizing interpretable decision tree policies to replace neural networks in reinforcement learning settings. Previous works have relaxed the tree structure, restricted to optimizing only tree leaves, or applied imitation learning techniques to approximately copy the behavior of a neural network policy with a decision tree. We propose the Decision Tree Policy Optimization (DTPO) algorithm that directly optimizes the complete decision tree using policy gradients. Our technique uses established decision tree heuristics for regression to perform policy optimization. We empirically show that DTPO is a competitive algorithm compared to imitation learning algorithms for optimizing decision tree policies in reinforcement learning.
Vanilla Gradient Descent for Oblique Decision Trees
Panda, Subrat Prasad, Genest, Blaise, Easwaran, Arvind, Suganthan, Ponnuthurai Nagaratnam
Decision Trees (DTs) constitute one of the major highly non-linear AI models, valued, e.g., for their efficiency on tabular data. Learning accurate DTs is, however, complicated, especially for oblique DTs, and does take a significant training time. Further, DTs suffer from overfitting, e.g., they proverbially "do not generalize" in regression tasks. Recently, some works proposed ways to make (oblique) DTs differentiable. This enables highly efficient gradient-descent algorithms to be used to learn DTs. It also enables generalizing capabilities by learning regressors at the leaves simultaneously with the decisions in the tree. Prior approaches to making DTs differentiable rely either on probabilistic approximations at the tree's internal nodes (soft DTs) or on approximations in gradient computation at the internal node (quantized gradient descent). In this work, we propose DTSemNet, a novel semantically equivalent and invertible encoding for (hard, oblique) DTs as Neural Networks (NNs), that uses standard vanilla gradient descent. Experiments across various classification and regression benchmarks show that oblique DTs learned using DTSemNet are more accurate than oblique DTs of similar size learned using state-of-the-art techniques. Further, DT training time is significantly reduced. We also experimentally demonstrate that DTSemNet can learn DT policies as efficiently as NN policies in the Reinforcement Learning (RL) setup with physical inputs (dimensions $\leq32$). The code is available at {\color{blue}\textit{\url{https://github.com/CPS-research-group/dtsemnet}}}.
Evaluation Framework for AI-driven Molecular Design of Multi-target Drugs: Brain Diseases as a Case Study
Cerveira, Arthur, Kremer, Frederico, Lourenço, Darling de Andrade, Corrêa, Ulisses B
The widespread application of Artificial Intelligence (AI) techniques has significantly influenced the development of new therapeutic agents. These computational methods can be used to design and predict the properties of generated molecules. Multi-target Drug Discovery (MTDD) is an emerging paradigm for discovering drugs against complex disorders that do not respond well to more traditional target-specific treatments, such as central nervous system, immune system, and cardiovascular diseases. Still, there is yet to be an established benchmark suite for assessing the effectiveness of AI tools for designing multi-target compounds. Standardized benchmarks allow for comparing existing techniques and promote rapid research progress. Hence, this work proposes an evaluation framework for molecule generation techniques in MTDD scenarios, considering brain diseases as a case study. Our methodology involves using large language models to select the appropriate molecular targets, gathering and preprocessing the bioassay datasets, training quantitative structure-activity relationship models to predict target modulation, and assessing other essential drug-likeness properties for implementing the benchmarks. Additionally, this work will assess the performance of four deep generative models and evolutionary algorithms over our benchmark suite. In our findings, both evolutionary algorithms and generative models can achieve competitive results across the proposed benchmarks.
Shapley Marginal Surplus for Strong Models
de Marchi, Daniel, Kosorok, Michael, de Marchi, Scott
Shapley values have seen widespread use in machine learning as a way to explain model predictions and estimate the importance of covariates. Accurately explaining models is critical in real-world models to both aid in decision making and to infer the properties of the true data-generating process (DGP). In this paper, we demonstrate that while model-based Shapley values might be accurate explainers of model predictions, machine learning models themselves are often poor explainers of the DGP even if the model is highly accurate. Particularly in the presence of interrelated or noisy variables, the output of a highly predictive model may fail to account for these relationships. This implies explanations of a trained model's behavior may fail to provide meaningful insight into the DGP. In this paper we introduce a novel variable importance algorithm, Shapley Marginal Surplus for Strong Models, that samples the space of possible models to come up with an inferential measure of feature importance. We compare this method to other popular feature importance methods, both Shapley-based and non-Shapley based, and demonstrate significant outperformance in inferential capabilities relative to other methods.
Optimizing Emotion Recognition with Wearable Sensor Data: Unveiling Patterns in Body Movements and Heart Rate through Random Forest Hyperparameter Tuning
Nur, Zikri Kholifah, Wijaya, Rifki, Wulandari, Gia Septiana
This research delves into the utilization of smartwatch sensor data and heart rate monitoring to discern individual emotions based on body movement and heart rate. Emotions play a pivotal role in human life, influencing mental well-being, quality of life, and even physical and physiological responses. The data were sourced from prior research by Juan C. Quiroz, PhD. The study enlisted 50 participants who donned smartwatches and heart rate monitors while completing a 250-meter walk. Emotions were induced through both audio-visual and audio stimuli, with participants' emotional states evaluated using the PANAS questionnaire. The study scrutinized three scenarios: viewing a movie before walking, listening to music before walking, and listening to music while walking. Personal baselines were established using DummyClassifier with the 'most_frequent' strategy from the sklearn library, and various models, including Logistic Regression and Random Forest, were employed to gauge the impacts of these activities. Notably, a novel approach was undertaken by incorporating hyperparameter tuning to the Random Forest model using RandomizedSearchCV. The outcomes showcased substantial enhancements with hyperparameter tuning in the Random Forest model, yielding mean accuracies of 86.63% for happy vs. sad and 76.33% for happy vs. neutral vs. sad.
Alpha-Trimming: Locally Adaptive Tree Pruning for Random Forests
Surjanovic, Nikola, Henrey, Andrew, Loughin, Thomas M.
We demonstrate that adaptively controlling the size of individual regression trees in a random forest can improve predictive performance, contrary to the conventional wisdom that trees should be fully grown. A fast pruning algorithm, alpha-trimming, is proposed as an effective approach to pruning trees within a random forest, where more aggressive pruning is performed in regions with a low signal-to-noise ratio. The amount of overall pruning is controlled by adjusting the weight on an information criterion penalty as a tuning parameter, with the standard random forest being a special case of our alpha-trimmed random forest. A remarkable feature of alpha-trimming is that its tuning parameter can be adjusted without refitting the trees in the random forest once the trees have been fully grown once. In a benchmark suite of 46 example data sets, mean squared prediction error is often substantially lowered by using our pruning algorithm and is never substantially increased compared to a random forest with fully-grown trees at default parameter settings.
Axiomatic Characterisations of Sample-based Explainers
Amgoud, Leila, Cooper, Martin C., Debbaoui, Salim
Explaining decisions of black-box classifiers is both important and computationally challenging. In this paper, we scrutinize explainers that generate feature-based explanations from samples or datasets. We start by presenting a set of desirable properties that explainers would ideally satisfy, delve into their relationships, and highlight incompatibilities of some of them. We identify the entire family of explainers that satisfy two key properties which are compatible with all the others. Its instances provide sufficient reasons, called weak abductive explanations.We then unravel its various subfamilies that satisfy subsets of compatible properties. Indeed, we fully characterize all the explainers that satisfy any subset of compatible properties. In particular, we introduce the first (broad family of) explainers that guarantee the existence of explanations and their global consistency.We discuss some of its instances including the irrefutable explainer and the surrogate explainer whose explanations can be found in polynomial time.
S-SIRUS: an explainability algorithm for spatial regression Random Forest
Patelli, Luca, Golini, Natalia, Ignaccolo, Rosaria, Cameletti, Michela
Random Forest (RF) is a widely used machine learning algorithm known for its flexibility, user-friendliness, and high predictive performance across various domains. However, it is non-interpretable. This can limit its usefulness in applied sciences, where understanding the relationships between predictors and response variable is crucial from a decision-making perspective. In the literature, several methods have been proposed to explain RF, but none of them addresses the challenge of explaining RF in the context of spatially dependent data. Therefore, this work aims to explain regression RF in the case of spatially dependent data by extracting a compact and simple list of rules. In this respect, we propose S-SIRUS, a spatial extension of SIRUS, the latter being a well-established regression rule algorithm able to extract a stable and short list of rules from the classical regression RF algorithm. A simulation study was conducted to evaluate the explainability capability of the proposed S-SIRUS, in comparison to SIRUS, by considering different levels of spatial dependence among the data. The results suggest that S-SIRUS exhibits a higher test predictive accuracy than SIRUS when spatial correlation is present. Moreover, for higher levels of spatial correlation, S-SIRUS produces a shorter list of rules, easing the explanation of the mechanism behind the predictions.
GLEAMS: Bridging the Gap Between Local and Global Explanations
Visani, Giorgio, Stanzione, Vincenzo, Garreau, Damien
The explainability of machine learning algorithms is crucial, and numerous methods have emerged recently. Local, post-hoc methods assign an attribution score to each feature, indicating its importance for the prediction. However, these methods require recalculating explanations for each example. On the other side, while there exist global approaches they often produce explanations that are either overly simplistic and unreliable or excessively complex. To bridge this gap, we propose GLEAMS, a novel method that partitions the input space and learns an interpretable model within each sub-region, thereby providing both faithful local and global surrogates. We demonstrate GLEAMS' effectiveness on both synthetic and real-world data, highlighting its desirable properties and human-understandable insights.
Artificial Intelligence for Public Health Surveillance in Africa: Applications and Opportunities
Tshimula, Jean Marie, Kalengayi, Mitterrand, Makenga, Dieumerci, Lilonge, Dorcas, Asumani, Marius, Madiya, Déborah, Kalonji, Élie Nkuba, Kanda, Hugues, Galekwa, René Manassé, Kumbu, Josias, Mikese, Hardy, Tshimula, Grace, Muabila, Jean Tshibangu, Mayemba, Christian N., Nkashama, D'Jeff K., Kalala, Kalonji, Ataky, Steve, Basele, Tighana Wenge, Didier, Mbuyi Mukendi, Kasereka, Selain K., Dialufuma, Maximilien V., Kumwita, Godwill Ilunga Wa, Muyuku, Lionel, Kimpesa, Jean-Paul, Muteba, Dominique, Abedi, Aaron Aruna, Ntobo, Lambert Mukendi, Bundutidi, Gloria M., Mashinda, Désiré Kulimba, Mpinga, Emmanuel Kabengele, Kasoro, Nathanaël M.
Artificial Intelligence (AI) is revolutionizing various fields, including public health surveillance. In Africa, where health systems frequently encounter challenges such as limited resources, inadequate infrastructure, failed health information systems and a shortage of skilled health professionals, AI offers a transformative opportunity. This paper investigates the applications of AI in public health surveillance across the continent, presenting successful case studies and examining the benefits, opportunities, and challenges of implementing AI technologies in African healthcare settings. Our paper highlights AI's potential to enhance disease monitoring and health outcomes, and support effective public health interventions. The findings presented in the paper demonstrate that AI can significantly improve the accuracy and timeliness of disease detection and prediction, optimize resource allocation, and facilitate targeted public health strategies. Additionally, our paper identified key barriers to the widespread adoption of AI in African public health systems and proposed actionable recommendations to overcome these challenges.