Goto

Collaborating Authors

 Decision Tree Learning


Energy Expenditure Estimation Through Daily Activity Recognition Using a Smart-phone

arXiv.org Artificial Intelligence

This paper presents a 3-step system that estimates the real-time energy expenditure of an individual in a non-intrusive way. First, using the user's smart-phone's sensors, we build a Decision Tree model to recognize his physical activity (\textit{running}, \textit{standing}, ...). Then, we use the detected physical activity, the time and the user's speed to infer his daily activity (\textit{watching TV}, \textit{going to the bathroom}, ...) through the use of a reinforcement learning environment, the Partially Observable Markov Decision Process framework. Once the daily activities are recognized, we translate this information into energy expenditure using the compendium of physical activities. By successfully detecting 8 physical activities at 90\%, we reached an overall accuracy of 80\% in recognizing 17 different daily activities. This result leads us to estimate the energy expenditure of the user with a mean error of 26\% of the expected estimation.


13 Algorithms and 4 Learning Methods of Machine Learning

#artificialintelligence

According to the similarity of the function and form of the algorithm, we can classify the algorithm, such as tree-based algorithm, neural network-based algorithm, and so on. Of course, the scope of machine learning is very large, and it is difficult for some algorithms to be clearly classified into a certain category. Regression algorithm is a type of algorithm that tries to explore the relationship between variables by using a measure of error. Regression algorithm is a powerful tool for statistical machine learning. In the field of machine learning, when people talk about regression, sometimes they refer to a type of problem and sometimes a type of algorithm.


Improved Weighted Random Forest for Classification Problems

arXiv.org Machine Learning

Several studies have shown that combining machine learning models in an appropriate way will introduce improvements in the individual predictions made by the base models. The key to make well-performing ensemble model is in the diversity of the base models. Of the most common solutions for introducing diversity into the decision trees are bagging and random forest. Bagging enhances the diversity by sampling with replacement and generating many training data sets, while random forest adds selecting a random number of features as well. This has made the random forest a winning candidate for many machine learning applications. However, assuming equal weights for all base decision trees does not seem reasonable as the randomization of sampling and input feature selection may lead to different levels of decision-making abilities across base decision trees. Therefore, we propose several algorithms that intend to modify the weighting strategy of regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on ac-curacy, optimal weighted random forest based on the area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models are able to introduce significant improvements compared to regular random forest.


Random Forest (RF) Kernel for Regression, Classification and Survival

arXiv.org Machine Learning

Breiman's random forest (RF) can be interpreted as an implicit kernel generator,where the ensuing proximity matrix represents the data-driven RF kernel. Kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. However, practical utility of the links between kernels and the RF has not been widely explored and systematically evaluated.Focus of our work is investigation of the interplay between kernel methods and the RF. We elucidate the performance and properties of the data driven RF kernels used by regularized linear models in a comprehensive simulation study comprising of continuous, binary and survival targets. We show that for continuous and survival targets, the RF kernels are competitive to RF in higher dimensional scenarios with larger number of noisy features. For the binary target, the RF kernel and RF exhibit comparable performance. As the RF kernel asymptotically converges to the Laplace kernel, we included it in our evaluation. For most simulation setups, the RF and RFkernel outperformed the Laplace kernel. Nevertheless, in some cases the Laplace kernel was competitive, showing its potential value for applications. We also provide the results from real life data sets for the regression, classification and survival to illustrate how these insights may be leveraged in practice.Finally, we discuss further extensions of the RF kernels in the context of interpretable prototype and landmarking classification, regression and survival. We outline future line of research for kernels furnished by Bayesian counterparts of the RF.


Random Forest Vs XGBoost Tree Based Algorithms

#artificialintelligence

In machine learning, we mainly deal with two kinds of problems that are classification and regression. There are several different types of algorithms for both tasks. But we need to pick that algorithm whose performance is good on the respective data. Ensemble methods like Random Forest, Decision Tree, XGboost algorithms have shown very good results when we talk about classification. These algorithms give high accuracy at fast speed.


Build a decision tree in SAS

#artificialintelligence

Decision trees are a fundamental machine learning technique that every data scientist should know. Luckily, the construction and implementation of decision trees in SAS is straightforward and easy to produce. The data that we will use for this example is found in the fantastic UCI Machine Learning Repository. The data set is titled "Bank Marketing Dataset," and it can be found at: http://archive.ics.uci.edu/ml/datasets/Bank This data set represents a direct marketing campaign (phone calls) conducted by a Portuguese banking institution.


Learning Attribute-Based and Relationship-Based Access Control Policies with Unknown Values

arXiv.org Artificial Intelligence

Attribute-Based Access Control (ABAC) and Relationship-based access control (ReBAC) provide a high level of expressiveness and flexibility that promote security and information sharing, by allowing policies to be expressed in terms of attributes of and chains of relationships between entities. Algorithms for learning ABAC and ReBAC policies from legacy access control information have the potential to significantly reduce the cost of migration to ABAC or ReBAC. This paper presents the first algorithms for mining ABAC and ReBAC policies from access control lists (ACLs) and incomplete information about entities, where the values of some attributes of some entities are unknown. We show that the core of this problem can be viewed as learning a concise three-valued logic formula from a set of labeled feature vectors containing unknowns, and we give the first algorithm (to the best of our knowledge) for that problem.


A Performance-Explainability Framework to Benchmark Machine Learning Methods: Application to Multivariate Time Series Classifiers

arXiv.org Artificial Intelligence

In order to match these requirements and conduct experiments to validate the usefulness of the explanations Our research aims to propose a new performanceexplainability by the end-users, there is a need to have a comprehensive analytical framework to assess and assessment of the explainability of the existing methods.


Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning

arXiv.org Machine Learning

Interpretability and effectiveness are two essential and indispensable requirements for adopting machine learning methods in reality. In this paper, we propose a knowledge distillation based decision trees extension, dubbed rectified decision trees (ReDT), to explore the possibility of fulfilling those requirements simultaneously. Specifically, we extend the splitting criteria and the ending condition of the standard decision trees, which allows training with soft labels while preserving the deterministic splitting paths. We then train the ReDT based on the soft label distilled from a well-trained teacher model through a novel jackknife-based method. Accordingly, ReDT preserves the excellent interpretable nature of the decision trees while having a relatively good performance. The effectiveness of adopting soft labels instead of hard ones is also analyzed empirically and theoretically. Surprisingly, experiments indicate that the introduction of soft labels also reduces the model size compared with the standard decision trees from the aspect of the total nodes and rules, which is an unexpected gift from the `dark knowledge' distilled from the teacher model.


Robust Similarity and Distance Learning via Decision Forests

arXiv.org Machine Learning

Canonical distances such as Euclidean distance often fail to capture the appropriate relationships between items, subsequently leading to subpar inference and prediction. Many algorithms have been proposed for automated learning of suitable distances, most of which employ linear methods to learn a global metric over the feature space. While such methods offer nice theoretical properties, interpretability, and computationally efficient means for implementing them, they are limited in expressive capacity. Methods which have been designed to improve expressiveness sacrifice one or more of the nice properties of the linear methods. To bridge this gap, we propose a highly expressive novel decision forest algorithm for the task of distance learning, which we call Similarity and Metric Random Forests (SMERF). We show that the tree construction procedure in SMERF is a proper generalization of standard classification and regression trees. Thus, the mathematical driving forces of SMERF are examined via its direct connection to regression forests, for which theory has been developed. Its ability to approximate arbitrary distances and identify important features is empirically demonstrated on simulated data sets. Last, we demonstrate that it accurately predicts links in networks.