 Decision Tree Learning


Martian Ionosphere Electron Density Prediction Using Bagged Trees

arXiv.org Artificial Intelligence

The availability of Martian atmospheric data provided by several Martian missions has broadened the opportunity to investigate and study the conditions of the Martian ionosphere. As such, ionospheric models play a crucial part in improving our understanding of ionospheric behavior in response to different spatial, temporal, and space weather conditions. This work represents an initial attempt to construct an electron density prediction model of the Martian ionosphere using machine learning. The model targets the ionosphere at solar zenith angles ranging from 70 to 90 degrees, and as such only utilizes observations from the Mars Global Surveyor mission. The performance of different machine learning methods was compared in terms of root mean square error, coefficient of determination, and mean absolute error. The bagged regression trees method performed best out of all the evaluated methods. Furthermore, the optimized bagged regression trees model outperformed other Martian ionosphere models from the literature (MIRI and NeMars) in finding the peak electron density value and the peak density height, in terms of root mean square error and mean absolute error.
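The three comparison metrics named in this abstract have standard definitions, which can be sketched as follows. This is a minimal illustration in plain Python, not the authors' evaluation code; the function names and the sample values are assumptions made for the example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only, not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Lower RMSE and MAE and a higher coefficient of determination indicate a better fit. RMSE penalizes large individual errors more heavily than MAE does.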


The Improvement of Decision Tree Construction Algorithm Based On Quantum Heuristic Algorithms

arXiv.org Artificial Intelligence

This work is related to the implementation of a decision tree construction algorithm on a quantum simulator. Here we consider an algorithm based on a binary criterion. We also study how far it can be improved with the quantum heuristic QAOA. We implemented the classical and the quantum versions of this algorithm to compare the trees they build.


Crop mapping in the small sample/no sample case: an approach using a two-level cascade classifier and integrating domain knowledge

arXiv.org Artificial Intelligence

Mapping crops using remote sensing technology is important for food security and land management. Machine learning-based methods have become a popular approach for crop mapping in recent years. However, the key to machine learning, acquiring ample and accurate samples, is usually time-consuming and laborious. To solve this problem, a crop mapping method for the small sample/no sample case was proposed that integrates domain knowledge and uses a cascaded classification framework, combining a weak classifier learned from samples with strong features and a strong classifier trained on samples with weak features. First, based on the domain knowledge of various crops, a low-capacity classifier such as a decision tree was applied to acquire those pixels with distinctive features and complete observation sequences as "strong feature" samples. Then, to improve the representativeness of these samples, a sample augmentation strategy was applied that artificially removes observations from the "strong feature" samples according to the average proportion of valid observations in the target area. Finally, based on the original samples and augmented samples, a large-capacity classifier such as a random forest was trained for crop mapping. The method achieved an overall accuracy of 82% in the MAP crop recognition competition held by Syngenta Group, China in 2021 (third prize, ranked fourth). This method integrates domain knowledge to overcome the difficulties of sample acquisition, providing a convenient, fast and accurate solution for crop mapping.
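The augmentation step described in this abstract, removing observations from complete "strong feature" sequences so that they resemble the sparser sequences in the target area, might look roughly like the sketch below. It assumes uniform random removal and marks dropped observations as None; the function name, the 0.6 proportion, and the sample series are hypothetical and not taken from the paper.

```python
import random

def drop_observations(series, valid_proportion, rng):
    """Return a copy of `series` in which only about `valid_proportion`
    of the observations are kept; the rest are replaced by None."""
    n_keep = max(1, round(len(series) * valid_proportion))
    keep = set(rng.sample(range(len(series)), n_keep))
    return [value if i in keep else None for i, value in enumerate(series)]

# Hypothetical complete observation sequence for one pixel.
complete = [0.21, 0.35, 0.52, 0.71, 0.80, 0.66, 0.43, 0.30, 0.22, 0.18]

# Assumed average valid-observation proportion in the target area.
rng = random.Random(0)
augmented = drop_observations(complete, 0.6, rng)

print(augmented)
```

Both the original and the thinned sequences would then be passed to the larger classifier, so that it sees training samples with gaps similar to those it will encounter at prediction time.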


The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations

arXiv.org Artificial Intelligence

Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web-based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.


Benchmarking Machine Learning Models to Predict Corporate Bankruptcy

arXiv.org Artificial Intelligence

The risk of bankruptcy in a publicly traded firm is of major interest to shareholders, creditors, and employees. Prior literature has investigated the predictive performance of different forecasting models, mainly the discriminant analysis with accounting information (Altman, 1968), the distance to default structural model (Bharath and Shumway, 2008), and the hazard model with accounting and market information (Shumway, 2001; Chava and Jarrow, 2004). In this paper we investigate the benefits of applying high-dimensional machine learning (ML) methods to bankruptcy prediction. We use a comprehensive sample of bankruptcies for U.S. publicly traded companies from 1969 to 2019 with financial, market, macro, and text-based predictors. We study the performance of eight ML algorithms: the hazard model of Shumway (2001) and Chava and Jarrow (2004) enhanced with a penalty function (LASSO and Ridge), bagged trees (random forest and survival random forest), gradient boosted trees (XGBoost and LightGBM), and two specifications of neural networks (one shallower and one deeper).


Machine Learning with Probabilistic Law Discovery: A Concise Introduction

arXiv.org Artificial Intelligence

Probabilistic Law Discovery (PLD) is a logic-based machine learning method that implements a variant of probabilistic rule learning. In several aspects, PLD is close to Decision Tree/Random Forest methods, but it differs significantly in how relevant rules are defined. The learning procedure of PLD solves the optimization problem of searching for rules (called probabilistic laws) that have minimal length and relatively high probability. At inference, ensembles of these rules are used for prediction. Probabilistic laws are human-readable, and PLD-based models are transparent and inherently interpretable. Applications of PLD include classification, clustering, and regression tasks, as well as time series analysis, anomaly detection, and adaptive (robotic) control. In this paper, we outline the main principles of PLD, highlight its benefits and limitations, and provide some application guidelines.


Annual field-scale maps of tall and short crops at the global scale using GEDI and Sentinel-2

arXiv.org Artificial Intelligence

Crop type maps are critical for tracking agricultural land use and estimating crop production. Remote sensing has proven an efficient and reliable tool for creating these maps in regions with abundant ground labels for model training, yet these labels remain difficult to obtain in many regions and years. NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, originally designed for forest monitoring, has shown promise for distinguishing tall and short crops. In the current study, we leverage GEDI to develop wall-to-wall maps of short vs tall crops on a global scale at 10 m resolution for 2019-2021. Specifically, we show that (1) GEDI returns can reliably be classified into tall and short crops after removing shots with extreme view angles or topographic slope, (2) the frequency of tall crops over time can be used to identify months when tall crops are at their peak height, and (3) GEDI shots in these months can then be used to train random forest models that use Sentinel-2 time series to accurately predict short vs. tall crops. Independent reference data from around the world are then used to evaluate these GEDI-S2 maps. We find that GEDI-S2 performed nearly as well as models trained on thousands of local reference training points, with accuracies of at least 87% and often above 90% throughout the Americas, Europe, and East Asia. Systematic underestimation of tall crop area was observed in regions where crops frequently exhibit low biomass, namely Africa and South Asia, and further work is needed in these systems. Although the GEDI-S2 approach only differentiates tall from short crops, in many landscapes this distinction goes a long way toward mapping the main individual crop types. The combination of GEDI and Sentinel-2 thus presents a very promising path towards global crop mapping with minimal reliance on ground data.
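Step (2) of this abstract, using the frequency of tall crops over time to find the months when tall crops are at their peak, could be sketched as below. This is only one plausible reading: the abstract does not give the rule, so the relative threshold, the function name, and the sample shots are all assumptions for illustration.

```python
from collections import defaultdict

def peak_months(shots, threshold=0.9):
    """Given (month, is_tall) pairs, return the months whose fraction of
    tall-crop shots is at least `threshold` times the highest monthly fraction."""
    counts = defaultdict(lambda: [0, 0])  # month -> [tall shots, all shots]
    for month, is_tall in shots:
        counts[month][0] += int(is_tall)
        counts[month][1] += 1
    fraction = {m: tall / total for m, (tall, total) in counts.items()}
    best = max(fraction.values())
    return sorted(m for m, f in fraction.items() if f >= threshold * best)

# Hypothetical classified shots: (month number, classified as tall crop?)
shots = [
    (5, False), (5, False), (5, True),
    (6, True), (6, False),
    (7, True), (7, True), (7, True),
    (8, True), (8, True), (8, True), (8, False),
]

print(peak_months(shots))
print(peak_months(shots, threshold=0.7))
```

Under this reading, only shots from the returned months would be kept as training labels for the Sentinel-2 random forest models.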



Forgetful Forests: high performance learning data structures for streaming data under concept drift

arXiv.org Artificial Intelligence

Database research can help machine learning performance in many ways. One way is to design better data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable "forgetful" tree-based learning algorithms to cope with data subject to concept drift (i.e., data whose function from input to classification changes over time). The forgetful algorithms described in this paper achieve high time performance while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast without any loss of accuracy. This makes such structures suitable for high-volume streaming applications.


Become a decision tree expert and elevate your Machine Learning skills

#artificialintelligence

A decision tree is a type of machine-learning algorithm that is used for classification and regression tasks. To learn how to use decision trees, you can start by understanding the basic concepts and principles behind them. In this article I mention a playlist where you can embrace the power of decision trees and learn them in a single, focused session. That's a wrap for today. I hope you find this article useful. Stay tuned for the next insightful article.