Goto

Collaborating Authors

 Ensemble Learning


Cryptocurrency Price Forecasting Using XGBoost Regressor and Technical Indicators

arXiv.org Artificial Intelligence

--The rapid growth of the stock market has attracted many investors due to its potential for significant profits. However, predicting stock prices accurately is difficult because financial markets are complex and constantly changing. This is especially true for the cryptocurrency market, which is known for its extreme volatility, making it challenging for traders and investors to make wise and profitable decisions. This study introduces a machine learning approach to predict cryptocur-rency prices. Specifically, we make use of important technical indicators such as Exponential Moving A verage (EMA) and Moving A verage Convergence Divergence (MACD) to train and feed the XGBoost regressor model. We demonstrate our approach through an analysis focusing on the closing prices of Bitcoin cryptocurrency. We evaluate the model's performance through various simulations, showing promising results that suggest its usefulness in aiding/guiding cryptocurrency traders and investors in dynamic market conditions. Over the past few years, the rapid expansion of the stock market has made it an appealing option for investors seeking high returns and easy access.


PSO Fuzzy XGBoost Classifier Boosted with Neural Gas Features on EEG Signals in Emotion Recognition

arXiv.org Artificial Intelligence

Emotion recognition is the technology-driven process of identifying and categorizing human emotions from various data sources, such as facial expressions, voice patterns, body motion, and physiological signals, such as EEG. These physiological indicators, though rich in data, present challenges due to their complexity and variability, necessitating sophisticated feature selection and extraction methods. NGN, an unsupervised learning algorithm, effectively adapts to input spaces without predefined grid structures, improving feature extraction from physiological data. Furthermore, the incorporation of fuzzy logic enables the handling of fuzzy data by introducing reasoning that mimics human decision-making. The combination of PSO with XGBoost aids in optimizing model performance through efficient hyperparameter tuning and decision process optimization. This study explores the integration of Neural-Gas Network (NGN), XGBoost, Particle Swarm Optimization (PSO), and fuzzy logic to enhance emotion recognition using physiological signals. Our research addresses three critical questions concerning the improvement of XGBoost with PSO and fuzzy logic, NGN's effectiveness in feature selection, and the performance comparison of the PSO-fuzzy XGBoost classifier with standard benchmarks. Acquired results indicate that our methodologies enhance the accuracy of emotion recognition systems and outperform other feature selection techniques using the majority of classifiers, offering significant implications for both theoretical advancement and practical application in emotion recognition technology.


Gradient Boosting Reinforcement Learning

arXiv.org Artificial Intelligence

Neural networks (NN) achieve remarkable results in various tasks, but lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, Gradient Boosting Trees (GBT) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBT to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with their NN counterparts. Inspired by shared backbones in NN we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely-used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBT within the RL paradigm, particularly in domains characterized by structured or categorical features.


QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest

arXiv.org Artificial Intelligence

Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility. Online RF models require to account for new training data to maintain model accuracy. This is particularly important in applications where data is periodically and sequentially generated over time in data streams, such as auto-driving systems, and credit card payments. In this setting, performing periodic model retraining with the old and new data accumulated is beneficial as it fully captures possible drifts in the data distribution over time. However, this is unpractical with state-of-the-art classical algorithms for RF as they scale linearly with the accumulated number of samples. We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting for multi-class classification and regression, achieving a runtime poly-logarithmic in the total number of accumulated samples. QC-Forest leverages Des-q, a quantum algorithm for single tree construction and retraining proposed by Kumar et al. by expanding to multi-class classification, as the original proposal was limited to binary classes, and introducing an exact classical method to replace an underlying quantum subroutine incurring a finite error, while maintaining the same poly-logarithmic dependence. Finally, we showcase that QC-Forest achieves competitive accuracy in comparison to state-of-the-art RF methods on widely used benchmark datasets with up to 80,000 samples, while significantly speeding up the model retrain.


Integrating White and Black Box Techniques for Interpretable Machine Learning

arXiv.org Artificial Intelligence

In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms which are simpler and easier for humans to comprehend tend to show worse performance than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly-interpretable classifier (i.e., white box model), and more difficult inputs using a more powerful, but less interpretable classifier (i.e., black box model).


Advanced Meta-Ensemble Machine Learning Models for Early and Accurate Sepsis Prediction to Improve Patient Outcomes

arXiv.org Artificial Intelligence

Sepsis, a critical condition from the body's response to infection, poses a major global health crisis affecting all age groups. Timely detection and intervention are crucial for reducing healthcare expenses and improving patient outcomes. This paper examines the limitations of traditional sepsis screening tools like Systemic Inflammatory Response Syndrome, Modified Early Warning Score, and Quick Sequential Organ Failure Assessment, highlighting the need for advanced approaches. We propose using machine learning techniques - Random Forest, Extreme Gradient Boosting, and Decision Tree models - to predict sepsis onset. Our study evaluates these models individually and in a combined meta-ensemble approach using key metrics such as Accuracy, Precision, Recall, F1 score, and Area Under the Receiver Operating Characteristic Curve. Results show that the meta-ensemble model outperforms individual models, achieving an AUC-ROC score of 0.96, indicating superior predictive accuracy for early sepsis detection. The Random Forest model also performs well with an AUC-ROC score of 0.95, while Extreme Gradient Boosting and Decision Tree models score 0.94 and 0.90, respectively.


A Comprehensive Analysis of Machine Learning Models for Algorithmic Trading of Bitcoin

arXiv.org Artificial Intelligence

This study evaluates the performance of 41 machine learning models, including 21 classifiers and 20 regressors, in predicting Bitcoin prices for algorithmic trading. By examining these models under various market conditions, we highlight their accuracy, robustness, and adaptability to the volatile cryptocurrency market. Our comprehensive analysis reveals the strengths and limitations of each model, providing critical insights for developing effective trading strategies. We employ both machine learning metrics (e.g., Mean Absolute Error, Root Mean Squared Error) and trading metrics (e.g., Profit and Loss percentage, Sharpe Ratio) to assess model performance. Our evaluation includes backtesting on historical data, forward testing on recent unseen data, and real-world trading scenarios, ensuring the robustness and practical applicability of our models. Key findings demonstrate that certain models, such as Random Forest and Stochastic Gradient Descent, outperform others in terms of profit and risk management. These insights offer valuable guidance for traders and researchers aiming to leverage machine learning for cryptocurrency trading.


Fault Detection for agents on power grid topology optimization: A Comprehensive analysis

arXiv.org Artificial Intelligence

The topology optimization of transmission networks using Deep Reinforcement Learning (DRL) has increasingly come into focus. Various researchers have proposed different DRL agents, which are often benchmarked on the Grid2Op environment from the Learning to Run a Power Network (L2RPN) challenges. The environments have many advantages with their realistic chronics and underlying power flow backends. However, the interpretation of agent survival or failure is not always clear, as there are a variety of potential causes. In this work, we focus on the failures of the power grid to identify patterns and detect them a priori. We collect the failed chronics of three different agents on the WCCI 2022 L2RPN environment, totaling about 40k data points. By clustering, we are able to detect five distinct clusters, identifying different failure types. Further, we propose a multi-class prediction approach to detect failures beforehand and evaluate five different models. Here, the Light Gradient-Boosting Machine (LightGBM) shows the best performance, with an accuracy of 86%. It also correctly identifies in 91% of the time failure and survival observations. Finally, we provide a detailed feature importance analysis that identifies critical features and regions in the grid.


Unmasking Trees for Tabular Data

arXiv.org Machine Learning

We herein describe UnmaskingTrees, a method and open-source software package for tabular data generation and, especially, imputation. Our experiments suggest that training gradient-boosted trees to incrementally unmask features offers a simple, strong baseline for imputation.


BrainMetDetect: Predicting Primary Tumor from Brain Metastasis MRI Data Using Radiomic Features and Machine Learning Algorithms

arXiv.org Artificial Intelligence

Objective: Brain metastases (BMs) are common in cancer patients and determining the primary tumor site is crucial for effective treatment. This study aims to predict the primary tumor site from BM MRI data using radiomic features and advanced machine learning algorithms. Methods: We utilized a comprehensive dataset from Ocana-Tienda et al. (2023) comprising MRI and clinical data from 75 patients with BMs. Radiomic features were extracted from post-contrast T1-weighted MRI sequences. Feature selection was performed using the GINI index, and data normalization was applied to ensure consistent scaling. We developed and evaluated Random Forest and XGBoost classifiers, both with and without hyperparameter optimization using the FOX (Fox optimizer) algorithm. Model interpretability was enhanced using SHAP (SHapley Additive exPlanations) values. Results: The baseline Random Forest model achieved an accuracy of 0.85, which improved to 0.93 with FOX optimization. The XGBoost model showed an initial accuracy of 0.96, increasing to 0.99 after optimization. SHAP analysis revealed the most influential radiomic features contributing to the models' predictions. The FOX-optimized XGBoost model exhibited the best performance with a precision, recall, and F1-score of 0.99. Conclusion: This study demonstrates the effectiveness of using radiomic features and machine learning to predict primary tumor sites from BM MRI data. The FOX optimization algorithm significantly enhanced model performance, and SHAP provided valuable insights into feature importance. These findings highlight the potential of integrating radiomics and machine learning into clinical practice for improved diagnostic accuracy and personalized treatment planning.