 Decision Tree Learning


Automatic glissade determination through a mathematical model in electrooculographic records

arXiv.org Artificial Intelligence

The glissadic overshoot is characterized by an unwanted type of movement known as a glissade. Glissades are short ocular movements that reflect the failure of the neural programming of saccades to move the eyes to a specific target. In this paper we develop a procedure to determine whether a specific saccade has a glissade appended to its end. The steps taken to reach this goal are the use of the third partial sum of the Gauss series as a mathematical model, together with a comparison between some specific parameters and the RMSE. Finally, a machine learning algorithm is trained, returning the expected responses on the presence or absence of this kind of ocular movement.
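As a rough illustration of the modelling step, the sketch below evaluates a sum of three Gaussian terms and computes the RMSE between two sampled curves. It assumes that "third partial sum of the Gauss series" means a three-term Gaussian sum with amplitude, center, and width parameters; the parameter values, the sampling grid, and the function names `gauss3` and `rmse` are hypothetical and are not taken from the paper.

```python
import math

def gauss3(t, params):
    """Sum of Gaussian terms a * exp(-((t - b) / c) ** 2).

    `params` holds three (a, b, c) tuples: amplitude, center, width.
    This functional form is an assumption about the paper's model.
    """
    return sum(a * math.exp(-((t - b) / c) ** 2) for a, b, c in params)

def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

# Hypothetical parameters and sampling grid, for illustration only.
params = [(10.0, 0.05, 0.02), (-1.5, 0.09, 0.01), (0.5, 0.12, 0.01)]
times = [i * 0.001 for i in range(200)]
model_curve = [gauss3(t, params) for t in times]

# A stand-in "record": the model curve with a constant offset of 0.1.
shifted = [v + 0.1 for v in model_curve]
fit_error = rmse(shifted, model_curve)
```

In practice the parameters would be fitted to each recorded saccade, and the fitted values and the RMSE could then serve as inputs to the classifier.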


Tree-Based Machine Learning Algorithms

#artificialintelligence

The simplest tree-based model is the Decision Tree. A Random Forest combines many Decision Trees and usually achieves higher accuracy than a single Decision Tree. Adaptive Boosting and Gradient Boosting Machine build a sequence of Decision Trees one after another, each learning from the errors of its predecessor, and can achieve better accuracy than Random Forest. Extreme Gradient Boosting was created to compensate for the overfitting problem of Gradient Boosting. Thus, in general, Extreme Gradient Boosting is often the most accurate of the tree-based algorithms, and it is widely reported to have won many Machine Learning competitions.
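To make the idea of trees "built one after another" concrete, here is a minimal sketch of boosting with one-split trees (stumps) for squared-error regression on a single feature. It is a simplified illustration, not the algorithm of AdaBoost or Extreme Gradient Boosting; the function names, the learning rate, and the toy data are assumptions made for this example.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    # Skip the largest value so both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each new stump is fitted to the residuals left by its predecessors."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps
    )

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data: y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]
one_round = fit_boosted_stumps(xs, ys, n_rounds=1)
many_rounds = fit_boosted_stumps(xs, ys, n_rounds=20)
```

Comparing `mse(one_round, xs, ys)` with `mse(many_rounds, xs, ys)` shows the training error shrinking as more stumps are added, which is also why boosted models need safeguards against overfitting.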


Deep imagination is a close to optimal policy for planning in large decision trees under limited resources

arXiv.org Machine Learning

Many decisions involve choosing an uncertain course of actions in deep and wide decision trees, as when we plan to visit an exotic country for vacation. In these cases, exhaustive search for the best sequence of actions is not tractable due to the large number of possibilities and the limited time or computational resources available to make the decision. Therefore, planning agents need to balance breadth (exploring many actions at each level of the tree) and depth (exploring many levels in the tree) to optimally allocate their finite search capacity. We provide efficient analytical solutions and numerical analysis to the problem of allocating finite sampling capacity in one shot to large decision trees. We find that in general the optimal policy is to allocate few samples per level so that deep levels can be reached, thus favoring depth over breadth search. In contrast, in poor environments and at low capacity, it is best to broadly sample branches at the cost of not sampling deeply, although this policy is only marginally better than deep allocations. Our results provide a theoretical foundation for the optimality of deep imagination for planning and show that it is a generally valid heuristic that could have evolved from the finite constraints of cognitive systems.


On the Computational Intelligibility of Boolean Classifiers

arXiv.org Artificial Intelligence

In this paper, we investigate the computational intelligibility of Boolean classifiers, characterized by their ability to answer XAI queries in polynomial time. The classifiers under consideration are decision trees, DNF formulae, decision lists, decision rules, tree ensembles, and Boolean neural nets. Using 9 XAI queries, including both explanation queries and verification queries, we show the existence of a large intelligibility gap between the families of classifiers. On the one hand, all 9 XAI queries are tractable for decision trees. On the other hand, none of them is tractable for DNF formulae, decision lists, random forests, boosted decision trees, Boolean multilayer perceptrons, and binarized neural networks.


Conclusive Local Interpretation Rules for Random Forests

arXiv.org Artificial Intelligence

In critical situations involving discrimination, gender inequality, economic damage, and even the possibility of casualties, machine learning models must be able to provide clear interpretations for their decisions. Otherwise, their obscure decision-making processes can lead to socioethical issues as they interfere with people's lives. In the aforementioned sectors, random forest algorithms thrive, so their ability to explain themselves is an obvious requirement. In this paper, we present LionForests, which builds on our preliminary work. LionForests is a random forest-specific interpretation technique, which provides rules as explanations. It is applicable to binary classification, multi-class classification, and regression tasks, and it is supported by a stable theoretical background. Experimentation, including sensitivity analysis and comparison with state-of-the-art techniques, is also performed to demonstrate the efficacy of our contribution. Finally, we highlight a unique property of LionForests, called conclusiveness, that provides interpretation validity and distinguishes it from previous techniques.


Random Intersection Chains

arXiv.org Machine Learning

Interactions between several features sometimes play an important role in prediction tasks. But taking all the interactions into consideration leads to an extremely heavy computational burden. For categorical features, the situation is more complicated since the input will be extremely high-dimensional and sparse if one-hot encoding is applied. Inspired by association rule mining, we propose a method that selects interactions of categorical features, called Random Intersection Chains. It uses random intersections to detect frequent patterns, then selects the most meaningful ones among them. First, a number of chains are generated, in which each node is the intersection of the previous node and a randomly chosen observation. The frequency of patterns in the tail nodes is estimated by maximum likelihood estimation, then the patterns with the largest estimated frequency are selected. After that, their confidence is calculated by Bayes' theorem. The most confident patterns are finally returned by Random Intersection Chains. We show that if the number and length of chains are appropriately chosen, the patterns in the tail nodes are indeed the most frequent ones in the data set. We analyze the computational complexity of the proposed algorithm and prove the convergence of the estimators. The results of a series of experiments verify the efficiency and effectiveness of the algorithm.
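The chain-generation step described above can be sketched as follows. This is a simplified illustration that only builds the chains and counts how often each tail pattern occurs; the maximum likelihood frequency estimate and the Bayes confidence step from the abstract are not implemented. The representation of an observation as a set of (feature, value) pairs, the function names, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter

def random_intersection_chain(observations, length, rng):
    """Build one chain: each node is the intersection of the previous
    node and a randomly chosen observation."""
    node = frozenset(rng.choice(observations))
    chain = [node]
    for _ in range(length - 1):
        node = node & frozenset(rng.choice(observations))
        chain.append(node)
    return chain

def tail_patterns(observations, n_chains, length, seed=0):
    """Count the non-empty patterns found in the tail nodes of many chains."""
    rng = random.Random(seed)
    tails = Counter()
    for _ in range(n_chains):
        tail = random_intersection_chain(observations, length, rng)[-1]
        if tail:
            tails[tail] += 1
    return tails

# Toy categorical data: every observation shares ("color", "red").
observations = [
    {("color", "red"), ("size", "S")},
    {("color", "red"), ("size", "M")},
    {("color", "red"), ("size", "S")},
]
tails = tail_patterns(observations, n_chains=50, length=4)
```

Because intersecting can only remove items, patterns that survive to the tail of a chain tend to be those shared by many observations, which is the intuition behind using tail nodes to find frequent patterns.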


Hollow-tree Super: a directional and scalable approach for feature importance in boosted tree models

arXiv.org Machine Learning

Current limitations in boosted tree modelling prevent the effective scaling to datasets with a large feature number, particularly when investigating the magnitude and directionality of various features on classification. We present a novel methodology, Hollow-tree Super (HOTS), to resolve and visualize feature importance in boosted tree models involving a large number of features. Further, HOTS allows for investigation of the directionality and magnitude various features have on classification. Using the Iris dataset, we first compare HOTS to Gini Importance, Partial Dependence Plots, and Permutation Importance, and demonstrate how HOTS resolves the weaknesses present in these methods. We then show how HOTS can be utilized in high dimensional neuroscientific data, by taking 60 Schizophrenic subjects and applying the method to determine which brain regions were most important for classification of schizophrenia as determined by the PANSS. HOTS effectively replicated and supported the findings of Gini importance, Partial Dependence Plots and Permutation importance within the Iris dataset. When applied to the schizophrenic brain dataset, HOTS was able to resolve the top 10 most important features for classification, as well as their directionality for classification and magnitude compared to other features. Cross-validation supported that these same 10 features were consistently used in the decision-making process across multiple trees, and these features were localised primarily to the occipital and parietal cortices, commonly disturbed brain regions in those with Schizophrenia. It is imperative that a methodology is developed that is able to handle the demands of working with large datasets that contain a large number of features. HOTS represents a unique way to investigate both the directionality and magnitude of feature importance when working at scale with boosted-tree modelling.


Random forest regressor sklearn : Step By Step Implementation

#artificialintelligence

The RandomForestRegressor class has various hyperparameters with default values such as n_estimators=100, criterion='mse', max_depth=None, and min_samples_split=2. We can choose their optimal values using hyperparameter tuning techniques like GridSearchCV and RandomizedSearchCV. In this article, we will demonstrate an end-to-end implementation of the Random Forest regressor in sklearn. First, you will import the package using the import statement. Second, we will create the Random Forest regressor object.
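A minimal sketch of these steps, assuming scikit-learn and NumPy are installed, might look like the following. The synthetic data, the parameter grid, and the helper name `tune_random_forest` are illustrative assumptions, not taken from the article. The `criterion` parameter is left at its default here because its accepted names differ across scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

def tune_random_forest(X, y, seed=0):
    """Grid-search a few RandomForestRegressor hyperparameters."""
    # Hypothetical grid; real projects would choose ranges for their data.
    param_grid = {
        "n_estimators": [20, 50],
        "max_depth": [None, 3],
        "min_samples_split": [2, 4],
    }
    model = RandomForestRegressor(random_state=seed)
    search = GridSearchCV(model, param_grid, cv=3)
    search.fit(X, y)
    return search

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(120, 3))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.05, size=120)

search = tune_random_forest(X, y)
predictions = search.predict(X[:5])
```

After fitting, `search.best_params_` holds the selected hyperparameter values and `search.best_estimator_` is the forest refitted on all the data with those values.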


Decision Trees, Random Forests & Gradient Boosting in R

#artificialintelligence

Would you like to build predictive models using machine learning? That's precisely what you will learn in this course, "Decision Trees, Random Forests and Gradient Boosting in R." My name is Carlos Martínez, and I have a Ph.D. in Management from the University of St. Gallen in Switzerland. I have presented my research at some of the most prestigious academic conferences and doctoral colloquiums at the University of Tel Aviv, Politecnico di Milano, University of Halmstad, and MIT. Furthermore, I have co-authored more than 25 teaching cases, some of them included in the case bases of Harvard and Michigan. This is a very comprehensive course that includes presentations, tutorials, and assignments. The course has a practical approach based on the learning-by-doing method in which you will learn decision trees and ensemble methods based on decision trees using a real dataset.


An artificial intelligence and Internet of things based automated irrigation system

arXiv.org Artificial Intelligence

The need for clean water is clearly growing as the world's water sources decrease day by day. Potable fresh water is also used for irrigation, so irrigation should be planned to decrease freshwater wastage. With the development of technology and the availability of cheaper and more effective solutions, irrigation efficiency can be increased and water loss reduced. In particular, Internet of things (IoT) devices have begun to be used in all areas. We can easily and precisely collect temperature, humidity and mineral values from the irrigation field with IoT devices and sensors. Most of the operations and decisions about irrigation are carried out by people. For people, it is hard to have all the real-time data such as temperature, moisture and mineral levels available in the decision-making process and to make decisions by considering them. People usually make decisions based on their experience. In this study, a wide range of information from the irrigation field was obtained by using IoT devices and sensors. Data collected from the IoT devices and sensors were sent via communication channels and stored in MongoDB. With the help of Weka software, the data was normalized and the normalized data was used as a learning set. As a result of the examinations, the decision tree (J48) algorithm, which had the highest accuracy, was chosen and an artificial intelligence model was created. Its decisions are used to manage operations such as starting, maintaining and stopping the irrigation. The accuracy of the decisions was evaluated and the irrigation system was tested with the results. With the mobile application that was created, the system can be managed and viewed remotely and manually, and the system's decisions can also be seen.