AITopics

doi: 10.1007/978-3-031-23618-1_25

2207.01994

Country:

Europe > Greece > Central Macedonia > Thessaloniki (0.05)
Asia > Japan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Machine Learning Jun-29-2022

Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise

Gunasar, Tekin, Rekesh, Alexandra, Nair, Atul, King, Penelope, Markova, Anastasiya, Zhang, Jiaqi, Tate, Isabel

Electromyography (EMG) signals can be used as training data by machine learning models to classify various gestures. We seek to produce a model that can classify six different hand gestures from a limited number of samples and that generalizes well to a wider audience. We also compare the effect of our feature extraction results on model accuracy with that of more conventional methods, such as the use of autoregressive (AR) parameters on a sliding window across the channels of a signal. We appeal to a set of more elementary methods, such as the use of random bounds on a signal, and aim to show the power these methods can carry in an online setting where EMG classification is being conducted, as opposed to more complicated methods such as the use of the Fourier Transform. To augment our limited training data, we used a standard technique, known as jitter, where random noise is added to each observation in a channel-wise manner. Once all datasets were produced using the above methods, we performed a grid search with Random Forest and XGBoost to ultimately create a high-accuracy model. For human-computer interface purposes, high-accuracy classification of EMG signals is of particular importance to their functioning. Given the difficulty and cost of amassing any sort of biomedical data in high volume, it is valuable to have techniques that can work with a small number of high-quality samples and with less expensive feature extraction methods that can reliably be carried out in an online application.
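The jitter step described in the abstract can be pictured with a minimal sketch. It is not the authors' code: the function names, the `max_sd` bound, and the choice to draw one noise standard deviation uniformly at random per channel are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import random


def jitter(signal, rng, max_sd=0.05):
    """Return a noisy copy of one multi-channel signal.

    `signal` is a list of channels, each a list of float samples.
    Assumption: each channel gets its own noise standard deviation,
    drawn uniformly from [0, max_sd] ("random variance" noise).
    """
    augmented = []
    for channel in signal:
        sd = rng.uniform(0.0, max_sd)  # hypothetical variance schedule
        augmented.append([value + rng.gauss(0.0, sd) for value in channel])
    return augmented


def augment_dataset(signals, labels, copies, seed=0):
    """Keep the originals and append `copies` jittered versions of each."""
    rng = random.Random(seed)
    out_signals, out_labels = list(signals), list(labels)
    for signal, label in zip(signals, labels):
        for _ in range(copies):
            out_signals.append(jitter(signal, rng))
            out_labels.append(label)  # jitter never changes the gesture label
    return out_signals, out_labels


# Two toy "recordings", each with 2 channels of 4 samples.
toy_signals = [
    [[0.1, 0.2, 0.3, 0.4], [0.0, 0.1, 0.0, 0.1]],
    [[0.5, 0.4, 0.3, 0.2], [0.2, 0.2, 0.2, 0.2]],
]
toy_labels = ["fist", "open"]
aug_signals, aug_labels = augment_dataset(toy_signals, toy_labels, copies=3)
```

The augmented set keeps the original recordings first, so a held-out split can still be drawn from unjittered data before augmentation if that is preferred.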

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2206.14947

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.65)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.35)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.60)
Information Technology > Data Science > Data Mining > Feature Extraction (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

#artificialintelligence Jun-28-2022, 14:30:53 GMT

Does gridsearch on random forest make sense?

You are right that randomness will play a role in the results (as it does with many other algorithms, including MCMC samplers for Bayesian models, XGBoost, LightGBM, neural networks etc.). The obvious way to minimize randomness in the results of any hyper-parameter optimization method for RF (whether it's random grid-search, grid search or some Bayesian hyperparameter optimization method) is to increase the number of trees, which reduces the randomness in the model behavior, albeit at the cost of increased training time. Alternatively, you can fit a surrogate model to the results that accounts, through an appropriate amount of smoothing/regularization, for the fact that the signal indicating where the best model lies in the hyperparameter landscape is noisy.
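The first suggestion, more trees for less run-to-run noise, can be illustrated with a toy simulation. This is a sketch under a strong simplifying assumption: each "tree" is modelled as the true value plus independent Gaussian noise, which mimics how the trees of a forest are independent random draws once the training data is fixed. The names and constants are hypothetical, and no real random forest is trained.

```python
import random
import statistics


def noisy_tree_prediction(rng, true_value=0.7, noise_sd=0.2):
    """Stand-in for one randomized tree: signal plus independent noise."""
    return true_value + rng.gauss(0.0, noise_sd)


def forest_prediction(n_trees, seed):
    """Average the predictions of `n_trees` toy trees built with one seed."""
    rng = random.Random(seed)
    return statistics.fmean(noisy_tree_prediction(rng) for _ in range(n_trees))


def seed_to_seed_sd(n_trees, n_seeds=200):
    """Spread of the forest prediction when only the seed changes."""
    predictions = [forest_prediction(n_trees, seed) for seed in range(n_seeds)]
    return statistics.stdev(predictions)


spread_small = seed_to_seed_sd(n_trees=10)
spread_large = seed_to_seed_sd(n_trees=1000)
print(f"sd with 10 trees:   {spread_small:.4f}")
print(f"sd with 1000 trees: {spread_large:.4f}")
```

In this toy setting the spread shrinks roughly with the square root of the number of trees, so a grid search that compares settings differing by less than that spread is mostly comparing noise.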

gridsearch, random forest make sense, randomness, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

arXiv.org Machine Learning Jun-24-2022

Quantifying Inherent Randomness in Machine Learning Algorithms

Raste, Soham, Singh, Rahul, Vaughan, Joel, Nair, Vijayan N.

Most machine learning (ML) algorithms have several stochastic elements, and their performances are affected by these sources of randomness. This paper uses an empirical study to systematically examine the effects of two sources: randomness in model training and randomness in the partitioning of a dataset into training and test subsets. We quantify and compare the magnitude of the variation in predictive performance for the following ML algorithms: Random Forests (RFs), Gradient Boosting Machines (GBMs), and Feedforward Neural Networks (FFNNs). Among the different algorithms, randomness in model training causes larger variation for FFNNs compared to tree-based methods. This is to be expected as FFNNs have more stochastic elements that are part of their model initialization and training. We also found that random splitting of datasets leads to higher variation compared to the inherent randomness from model training. The variation from data splitting can be a major issue if the original dataset has considerable heterogeneity. Keywords: Model Training, Reproducibility, Variation
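The two sources of variation named in the abstract can be separated with a simple protocol: hold the split fixed and vary the training seed, then hold the training seed fixed and vary the split. The sketch below shows only that protocol. The dataset, the toy learner (random-threshold stumps with majority votes), and every name in it are assumptions for illustration, not the models or data used in the paper.

```python
import random
import statistics


def make_data(n=300, seed=0):
    """Toy 1-D data: label is 1 when x > 0, flipped with probability 0.2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < 0.2:
            y = 1 - y
        rows.append((x, y))
    return rows


def split(rows, split_seed, test_frac=0.3):
    """Random train/test partition controlled by `split_seed`."""
    rng = random.Random(split_seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def train(train_rows, train_seed, n_stumps=15):
    """Toy randomized learner: stumps at random thresholds."""
    rng = random.Random(train_seed)
    stumps = []
    for _ in range(n_stumps):
        t = rng.uniform(-1.0, 1.0)
        left = [y for x, y in train_rows if x <= t]
        right = [y for x, y in train_rows if x > t]
        left_label = int(2 * sum(left) >= len(left)) if left else 0
        right_label = int(2 * sum(right) >= len(right)) if right else 1
        stumps.append((t, left_label, right_label))
    return stumps


def accuracy(stumps, test_rows):
    correct = 0
    for x, y in test_rows:
        votes = sum(r if x > t else l for t, l, r in stumps)
        correct += int(2 * votes >= len(stumps)) == y
    return correct / len(test_rows)


def run(rows, split_seed, train_seed):
    train_rows, test_rows = split(rows, split_seed)
    return accuracy(train(train_rows, train_seed), test_rows)


data = make_data()
acc_vary_training = [run(data, split_seed=0, train_seed=s) for s in range(30)]
acc_vary_split = [run(data, split_seed=s, train_seed=0) for s in range(30)]
sd_training = statistics.stdev(acc_vary_training)
sd_splitting = statistics.stdev(acc_vary_split)
print(f"sd from training seed: {sd_training:.4f}")
print(f"sd from data split:    {sd_splitting:.4f}")
```

Which of the two numbers is larger depends on the learner and the data; the sketch makes no claim about that and only shows how the comparison can be set up.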

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2206.12353

Country: North America > United States > California (0.05)

Genre: Research Report (1.00)

Industry: Banking & Finance (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

#artificialintelligence Jun-23-2022, 05:05:10 GMT

XGBoost: its present-day powers and use cases

Originally published on Towards AI.

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.47)

#artificialintelligence Jun-22-2022, 15:13:28 GMT

Retrain, or not Retrain? Online Machine Learning with Gradient Boosting

Training a machine learning model requires energy, time, and patience. Smart data scientists organize experiments and track trials on the historical data to deploy the best solution. Problems may arise when we pass newly available samples to our pre-built machine learning pipeline. In the case of predictive algorithms, the observed performance may diverge from what was expected. The causes behind such discrepancies are varied.

algorithm, learning, stateful learning, (13 more...)

Genre: Instructional Material > Online (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.41)

Mohsen, Fadi, Karastoyanova, Dimka, Azzopardi, George

To remove or not remove Mobile Apps? A data-driven predictive model approach

arXiv.org Artificial Intelligence Jun-8-2022

Mobile app stores are the key distributors of mobile applications. They regularly apply vetting processes to the deployed apps. Yet, some of these vetting processes might be inadequate or applied late. The late removal of applications might have unpleasant consequences for developers and users alike. Thus, in this work we propose a data-driven predictive approach that determines whether a given app will be removed or accepted. It also indicates the relevance of each feature, which helps stakeholders interpret the predictions. In turn, our approach can support developers in improving their apps and users in downloading the ones that are less likely to be removed. We focus on the Google App store and we compile a new data set of 870,515 applications, 56% of which have actually been removed from the market. Our proposed approach is a bootstrap aggregating of multiple XGBoost machine learning classifiers. We propose two models: a user-centered model using 47 features, and a developer-centered model using 37 features, namely those available before deployment. We achieve the following Areas Under the ROC Curves (AUCs) on the test set: user-centered = 0.792, developer-centered = 0.762.
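Bootstrap aggregating (bagging), the ensemble scheme named in the abstract, can be sketched in a few lines. To stay self-contained the sketch swaps the XGBoost base classifier for a hypothetical one-feature threshold rule; the data, function names, and number of models are likewise assumptions, not the paper's setup.

```python
import random
import statistics


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_base(rows):
    """Stand-in base learner (NOT XGBoost): threshold between class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:
        # Degenerate resample with a single class: predict that class.
        constant = 1.0 if pos else 0.0
        return lambda x: constant
    pos_mean, neg_mean = statistics.fmean(pos), statistics.fmean(neg)
    threshold = (pos_mean + neg_mean) / 2
    if pos_mean >= neg_mean:
        return lambda x: 1.0 if x > threshold else 0.0
    return lambda x: 1.0 if x < threshold else 0.0


def fit_bagged(rows, n_models=25, seed=0):
    """Fit one base learner per bootstrap resample."""
    rng = random.Random(seed)
    return [fit_base(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def predict_proba(models, x):
    """Bagged score: the average vote of the base learners."""
    return statistics.fmean(model(x) for model in models)


# Toy data: class 0 lies in [0.0, 0.9], class 1 lies in [1.0, 1.9].
rows = [(i / 10, 0) for i in range(10)] + [(1 + i / 10, 1) for i in range(10)]
models = fit_bagged(rows)
score_high = predict_proba(models, 2.0)
score_low = predict_proba(models, -1.0)
```

Averaging over resamples is what makes the score graded rather than a hard 0 or 1 near the class boundary, which is also what an AUC evaluation needs.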

artificial intelligence, data-driven predictive model approach, machine learning, (1 more...)

doi: 10.1016/j.sasc.2022.200045

2206.03905

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.53)

#artificialintelligence Jun-3-2022, 05:15:16 GMT

XGBoost Alternative Base Learners

XGBoost, short for "Extreme Gradient Boosting," is one of the strongest machine learning algorithms for handling tabular data, a well-deserved reputation due to its success in winning numerous Kaggle competitions. XGBoost is an ensemble machine learning algorithm that usually consists of Decision Trees. The Decision Trees that make up XGBoost are individually referred to as gbtree, short for "gradient boosted tree." The first Decision Tree in the XGBoost ensemble is the base learner whose mistakes all subsequent trees learn from. Although Decision Trees are generally preferred as base learners due to their excellent ensemble scores, in some cases, alternative base learners may outperform them.
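As a hedged illustration of how a base learner is selected, the xgboost library exposes a `booster` parameter whose documented values are `gbtree`, `gblinear`, and `dart`. The sketch below only builds parameter dictionaries and does not import xgboost; the helper `make_params` and the default objective are hypothetical choices made for this example.

```python
VALID_BOOSTERS = ("gbtree", "gblinear", "dart")


def make_params(booster, objective="reg:squarederror"):
    """Build an XGBoost-style parameter dict for one base learner type."""
    if booster not in VALID_BOOSTERS:
        raise ValueError(f"unknown booster: {booster!r}")
    return {"booster": booster, "objective": objective}


# One candidate configuration per base learner, e.g. for a comparison run.
candidate_params = [make_params(b) for b in VALID_BOOSTERS]

# With xgboost installed, each dict could be passed on as, for example:
#   xgboost.train(params, dtrain, num_boost_round=100)
# where `dtrain` is an xgboost.DMatrix built from the training data.
```

Comparing the candidates on the same cross-validation folds is one straightforward way to check whether a non-tree base learner helps on a given dataset.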

base learner, learner, random forest, (15 more...)

Country: North America > United States > California (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Salcedo-Sanz, Sancho, Pérez-Aracil, Jorge, Ascenso, Guido, Del Ser, Javier, Casillas-Pérez, David, Kadow, Christopher, Fister, Dusan, Barriopedro, David, García-Herrera, Ricardo, Restelli, Marcello, Giuliani, Mateo, Castelletti, Andrea

Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

arXiv.org Artificial Intelligence Jun-3-2022

Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under current climate change and global warming. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have emerged in recent years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most used ML techniques in this area, and a comprehensive critical review of literature related to ML in EEs, are provided. A number of examples are discussed, and perspectives and outlooks on the field are drawn.

algorithm, ml algorithm, prediction, (15 more...)

2207.0758

Country:

Europe > France (0.05)
Europe > Spain > Galicia > Madrid (0.04)
Asia > Middle East > Iran (0.04)
(46 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Transportation > Infrastructure & Services > Airport (1.00)
Transportation > Air (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(5 more...)

Hapfelmeier, Alexander, Hornung, Roman, Haller, Bernhard

Sequential Permutation Testing of Random Forest Variable Importance Measures

arXiv.org Artificial Intelligence Jun-2-2022

Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests under regularity conditions were derived analytically. However, these approaches can be computationally expensive or even practically infeasible. This problem also occurs with non-parametric permutation tests, which are, however, distribution-free and can generically be applied to any type of RF and VIMP. Embracing this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. The results of simulation studies confirm that the theoretical properties of the sequential tests apply, that is, the type-I error probability is controlled at a nominal level and a high power is maintained with considerably fewer permutations needed in comparison to conventional permutation testing. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. A respective implementation is provided through the accompanying R package rfvimptest. The approach can also be easily applied to any kind of prediction model.
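The idea of stopping a permutation test early can be sketched with one classical scheme, sequential Monte Carlo p-value estimation in the style of Besag and Clifford: permutations are drawn until `h` permuted statistics reach the observed one, or until a maximum is hit. This is an illustration under assumptions, not the rfvimptest implementation. In particular, a difference in group means stands in for a random forest importance measure, and all names and defaults are hypothetical.

```python
import random
import statistics


def mean_difference(values, labels):
    """Stand-in test statistic (NOT a random forest VIMP)."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return statistics.fmean(group1) - statistics.fmean(group0)


def sequential_permutation_p(values, labels, statistic,
                             h=10, max_perms=999, seed=0):
    """Return (p_estimate, permutations_used) with early stopping.

    Stop as soon as `h` permuted statistics are >= the observed one.
    If that happens after `drawn` permutations, the estimate is h / drawn;
    otherwise it is (exceedances + 1) / (max_perms + 1).
    """
    rng = random.Random(seed)
    observed = statistic(values, labels)
    permuted = list(labels)
    exceedances = 0
    for drawn in range(1, max_perms + 1):
        rng.shuffle(permuted)
        if statistic(values, permuted) >= observed:
            exceedances += 1
            if exceedances == h:
                return h / drawn, drawn
    return (exceedances + 1) / (max_perms + 1), max_perms


values = list(range(30))

# Weak effect: alternating labels, so the test should stop early.
weak_labels = [i % 2 for i in range(30)]
p_weak, used_weak = sequential_permutation_p(values, weak_labels, mean_difference)

# Strong effect: the groups are perfectly separated.
strong_labels = [0] * 15 + [1] * 15
p_strong, used_strong = sequential_permutation_p(values, strong_labels, mean_difference)

print(f"weak effect:   p ~ {p_weak:.3f} after {used_weak} permutations")
print(f"strong effect: p ~ {p_strong:.3f} after {used_strong} permutations")
```

The saving comes from the uninformative variables: clearly non-significant cases stop after a few dozen permutations, while the full budget is spent only where the p-value is small enough to matter.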

artificial intelligence, decision tree learning, machine learning, (2 more...)

doi: 10.1016/j.csda.2022.107689

2206.01284

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)