xgboost


XGBoost: its present-day powers and use cases

#artificialintelligence

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company.


XGBoost Alternative Base Learners

#artificialintelligence

XGBoost, short for "Extreme Gradient Boosting," is one of the strongest machine learning algorithms for tabular data, a reputation it earned by powering numerous winning Kaggle solutions. XGBoost is an ensemble machine learning algorithm whose members are usually Decision Trees; in the XGBoost library, this tree-based base learner is selected with the booster setting gbtree, short for "gradient boosted tree." The trees are built one after another: each new tree is trained to correct the errors (more precisely, the gradient of the loss) left by all the trees before it, so the ensemble improves by learning from its own mistakes. Although Decision Trees are generally preferred as base learners because their ensembles score so well, in some cases alternative base learners, such as the linear models selected with gblinear, may outperform them.
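To make the base-learner idea concrete, here is a from-scratch sketch in plain Python of the boosting loop with two interchangeable base learners: a depth-1 tree (a stump) and a least-squares line. It is not XGBoost's algorithm (XGBoost also uses second-order gradient information, regularization, and deeper trees); the two targets, the learning rate eta, and the number of rounds are illustrative assumptions.

```python
# Toy gradient boosting with pluggable base learners (an illustrative sketch,
# not XGBoost's implementation: squared-error loss, no regularization, 1-D input).
from typing import Callable, List

Model = Callable[[float], float]


def fit_stump(xs: List[float], rs: List[float]) -> Model:
    """Depth-1 decision tree: choose the threshold that minimises the squared
    error of the residuals, predicting the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every candidate leaves both sides non-empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(rs) / len(rs)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def fit_linear(xs: List[float], rs: List[float]) -> Model:
    """Linear base learner: ordinary least squares fit of r = a + b * x."""
    n = len(xs)
    mx, mr = sum(xs) / n, sum(rs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (r - mr) for x, r in zip(xs, rs)) / sxx if sxx else 0.0
    a = mr - b * mx
    return lambda x: a + b * x


def boost(xs: List[float], ys: List[float], fit_base, n_rounds: int = 50, eta: float = 0.3) -> Model:
    """Start from the mean, then let each round fit a base learner to the current
    residuals (the negative gradient of squared error) and add it, scaled by eta."""
    base = sum(ys) / len(ys)
    learners: List[Model] = []

    def predict(x: float) -> float:
        return base + eta * sum(h(x) for h in learners)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_base(xs, residuals))
    return predict


def mse(model: Model, xs: List[float], ys: List[float]) -> float:
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(40)]
linear_ys = [2.0 * x + 1.0 for x in xs]          # smooth, linear target
step_ys = [0.0 if x < 2.0 else 5.0 for x in xs]  # piecewise-constant target

for name, ys in [("linear target", linear_ys), ("step target", step_ys)]:
    stump_err = mse(boost(xs, ys, fit_stump), xs, ys)
    linear_err = mse(boost(xs, ys, fit_linear), xs, ys)
    print(f"{name}: stumps MSE={stump_err:.6f}, linear MSE={linear_err:.6f}")
```

On the linear target the linear base learner reaches a far lower error, while the stumps win on the step target, which mirrors the point that no single base learner is best for every dataset.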


Can AI Identify Patients With Long COVID?

#artificialintelligence

According to the U.S. Centers for Disease Control and Prevention (CDC), long COVID refers to the condition in which people experience long-term effects from infection with SARS-CoV-2, the virus responsible for the COVID-19 (coronavirus disease 2019) pandemic. A new study published in The Lancet Digital Health applies machine learning, a form of artificial intelligence (AI), to data from electronic health records and identifies patients with long COVID with high accuracy. "Patients identified by our models as potentially having long COVID can be interpreted as patients warranting care at a specialty clinic for long COVID, which is an essential proxy for long COVID diagnosis as its definition continues to evolve," the researchers concluded. "We also achieve the urgent goal of identifying potential long COVID in patients for clinical trials." Globally, there have been over 510 million confirmed cases of COVID-19 and more than 6.2 million deaths, according to April 2022 statistics from Johns Hopkins University.


Machine Learning Will be one of the Best Ways to Identify Habitable Exoplanets - Universe Today

#artificialintelligence

The field of extrasolar planet studies is undergoing a seismic shift. To date, 4,940 exoplanets have been confirmed in 3,711 planetary systems, with another 8,709 candidates awaiting confirmation. With so many planets available for study and improvements in telescope sensitivity and data analysis, the focus is transitioning from discovery to characterization. Instead of simply looking for more planets, astrobiologists will examine "potentially habitable" worlds for possible "biosignatures": the chemical signatures associated with life and biological processes, one of the most important of which is water. As the only known solvent without which life (as we know it) cannot exist, water is considered the divining rod for finding life.


No Brainer AutoML with AutoXGB - KDnuggets

#artificialintelligence

Automated machine learning (AutoML) runs various machine learning processes automatically and optimizes an error metric to generate the best possible model. These processes include data preprocessing, encoding, scaling, hyperparameter optimization, model training, and generating artifacts and a list of results. Automating the machine learning process speeds up the development of AI solutions, provides a user-friendly experience, and often produces accurate results with little code (TechTarget). In this tutorial, we are going to use 1994 census income data to predict whether a person earns over $50K a year. This is a classic binary classification problem, and we are going to use the Adult Census Income dataset from Kaggle, released under the CC0: Public Domain license.
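The AutoML steps listed above can be illustrated with a deliberately tiny, hypothetical pipeline in plain Python: it scales the numeric columns, one-hot encodes a categorical column, grid-searches two hyperparameters of a logistic regression, and keeps the configuration with the lowest validation log loss. It does not use AutoXGB or XGBoost, and the rows are synthetic stand-ins shaped like the census task, not records from the Kaggle dataset.

```python
# Toy AutoML loop (an illustrative sketch, not AutoXGB's implementation): it
# preprocesses the data, tries every hyperparameter combination in a small grid,
# and keeps the model with the lowest validation log loss.
import math
from itertools import product

# Synthetic rows shaped (age, hours_per_week, education, earns_over_50k).
# Made up for illustration; they are not from the Adult Census Income dataset.
train = [
    (25, 20, "HS", 0), (51, 35, "HS", 0), (27, 50, "Bachelors", 1), (52, 60, "Masters", 1),
    (38, 40, "Bachelors", 0), (44, 55, "HS", 1), (60, 30, "Masters", 0), (33, 48, "Bachelors", 1),
]
valid = [(41, 52, "Masters", 1), (23, 25, "HS", 0), (36, 58, "HS", 1), (50, 30, "Bachelors", 0)]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def fit_scaler(rows):
    """Mean and standard deviation of each numeric column, from training rows only."""
    stats = []
    for j in (0, 1):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        stats.append((mean, std))
    return stats


def encode(row, stats, categories):
    """Scale the numeric columns and one-hot encode the education column."""
    numeric = [(row[j] - m) / s for j, (m, s) in enumerate(stats)]
    return numeric + [1.0 if row[2] == c else 0.0 for c in categories]


def train_logreg(X, y, lr, l2, epochs=200):
    """Logistic regression fitted by full-batch gradient descent with an L2 penalty."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wk * v for wk, v in zip(w, xi)) + b) - yi
            gw = [g + err * v for g, v in zip(gw, xi)]
            gb += err
        w = [wk - lr * (g / n + l2 * wk) for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def log_loss(model, X, y, eps=1e-12):
    w, b = model
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(sum(wk * v for wk, v in zip(w, xi)) + b), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def auto_fit(train, valid, grid):
    """Return (best validation loss, best (lr, l2), all results) over the grid."""
    categories = sorted({r[2] for r in train})
    stats = fit_scaler(train)
    X_tr, y_tr = [encode(r, stats, categories) for r in train], [r[3] for r in train]
    X_va, y_va = [encode(r, stats, categories) for r in valid], [r[3] for r in valid]
    results = {}
    for lr, l2 in product(grid["lr"], grid["l2"]):
        model = train_logreg(X_tr, y_tr, lr, l2)
        results[(lr, l2)] = log_loss(model, X_va, y_va)
    best = min(results, key=results.get)
    return results[best], best, results


best_loss, best_params, results = auto_fit(train, valid, {"lr": [0.01, 0.1, 1.0], "l2": [0.0, 0.1]})
print(f"best (lr, l2) = {best_params}, validation log loss = {best_loss:.4f}")
```

A real AutoML tool searches a much larger space (often including the model family itself) and commonly uses cross-validation rather than a single validation split, but the preprocess, search, and select structure is the same.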



A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction

arXiv.org Artificial Intelligence

In a modern power system, real-time data on power generation/consumption and its relevant features are stored across various distributed parties, including household meters, transformer stations, and external organizations. To fully exploit the underlying patterns of these distributed data for accurate power prediction, federated learning is needed as a collaborative but privacy-preserving training scheme. However, current federated learning frameworks are polarized towards addressing either the horizontal or the vertical separation of data and tend to overlook the case where both are present. Furthermore, mainstream horizontal federated learning frameworks employ only artificial neural networks to learn the data patterns, and these are considered less accurate and less interpretable than tree-based models on tabular datasets. To this end, we propose a hybrid federated learning framework based on XGBoost for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning to address the scenario where features are scattered across local heterogeneous parties and samples are scattered across various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information and the computing power of each party can be fully leveraged to boost training efficiency. A follow-up case study is presented to justify the necessity of adopting the proposed framework. The advantages of the proposed framework in fairness, efficiency, and accuracy are also confirmed.
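The abstract does not spell out its protocol. One common building block of horizontally federated gradient boosting is that each party shares only per-bin sums of gradients (G) and Hessians (H) rather than raw samples, and a coordinator scores candidate splits with XGBoost's gain formula. The sketch below assumes that scheme, shared bin edges, and squared-error loss; it omits encryption, secure aggregation, the vertical side of the hybrid framework, and the paper's task allocation, and the two parties' data are hypothetical.

```python
# Sketch: horizontal federated split finding for one feature of a boosted tree.
# Each party holds different rows; only per-bin (sum g, sum h) aggregates leave it.
from typing import Dict, List, Tuple

Histogram = Dict[int, Tuple[float, float]]  # bin index -> (sum of g, sum of h)


def bin_index(value: float, edges: List[float]) -> int:
    """Number of shared bin edges the value exceeds (0 .. len(edges))."""
    return sum(value > e for e in edges)


def local_histogram(values, grads, hess, edges) -> Histogram:
    """Computed by each party on its own data."""
    hist: Histogram = {}
    for v, g, h in zip(values, grads, hess):
        b = bin_index(v, edges)
        G, H = hist.get(b, (0.0, 0.0))
        hist[b] = (G + g, H + h)
    return hist


def merge(histograms: List[Histogram]) -> Histogram:
    """Coordinator adds the parties' histograms bin by bin."""
    total: Histogram = {}
    for hist in histograms:
        for b, (g, h) in hist.items():
            G, H = total.get(b, (0.0, 0.0))
            total[b] = (G + g, H + h)
    return total


def best_split(hist: Histogram, n_bins: int, lam: float = 1.0, gamma: float = 0.0):
    """Scan 'bin <= k goes left' and return (gain, k), where
    gain = 1/2 [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma."""
    G = sum(g for g, _ in hist.values())
    H = sum(h for _, h in hist.values())
    best = (float("-inf"), None)
    GL = HL = 0.0
    for k in range(n_bins - 1):
        g, h = hist.get(k, (0.0, 0.0))
        GL, HL = GL + g, HL + h
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)) - gamma
        if gain > best[0]:
            best = (gain, k)
    return best


# Hypothetical parties: (feature values, gradients, hessians). For squared error,
# g = prediction - target and h = 1 for every sample.
edges = [1.0, 2.0, 3.0]
party_a = ([0.5, 1.5, 2.5], [-1.0, -1.0, 1.0], [1.0, 1.0, 1.0])
party_b = ([0.7, 3.5, 2.2], [-1.0, 1.0, 1.0], [1.0, 1.0, 1.0])

merged = merge([local_histogram(*party_a, edges), local_histogram(*party_b, edges)])
gain, split_bin = best_split(merged, n_bins=len(edges) + 1)
print(f"best split: bins <= {split_bin} go left, gain = {gain:.4f}")
```

The property that makes this work is that merging the parties' histograms yields exactly the histogram a central server would build from the pooled rows, so the chosen split is the same without the rows ever being shared.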


Application of Machine Learning Methods in Inferring Surface Water Groundwater Exchanges using High Temporal Resolution Temperature Measurements

arXiv.org Machine Learning

We examine the ability of machine learning (ML) and deep learning (DL) algorithms to infer surface/ground exchange flux based on subsurface temperature observations. The observations and fluxes are produced from a high-resolution numerical model representing conditions in the Columbia River near the Department of Energy Hanford site in southeastern Washington State. Random measurement error of varying magnitude is added to the synthetic temperature observations. The results indicate that both ML and DL methods can be used to infer the surface/ground exchange flux. DL methods, especially convolutional neural networks, outperform the ML methods when used to interpret noisy temperature data with a smoothing filter applied. However, the ML methods also perform well, and they can better identify a reduced number of important observations, which could be useful for measurement network optimization. Surprisingly, the ML and DL methods inferred upward flux better than downward flux. This is in direct contrast to previous findings using numerical models to infer flux from temperature observations, and it may suggest that combining ML or DL inference with numerical inference could improve flux estimation beneath river systems.
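The abstract does not say how the ML methods single out a reduced set of important observations. Permutation importance is one common, model-agnostic way to do it, assumed here: shuffle one feature at a time and measure how much the prediction error grows. Each feature stands for one temperature observation point, and the "fitted" model and synthetic temperatures are hypothetical stand-ins.

```python
# Permutation importance (an assumed method; the abstract does not name one).
# Each feature column stands for one temperature observation point.
import random


def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when one feature column is shuffled, which breaks
    its link to the target while leaving the other columns untouched."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(model, X_perm, y) - baseline
        scores.append(increase / n_repeats)
    return scores


# Hypothetical fitted model: flux depends strongly on sensor 0, weakly on
# sensor 1, and not at all on sensor 2.
def model(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(1)
X = [[data_rng.uniform(10, 20) for _ in range(3)] for _ in range(50)]  # synthetic temperatures
y = [model(row) for row in X]

scores = permutation_importance(model, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("importance per sensor:", [round(s, 3) for s in scores], "ranking:", ranking)
```

Sensors whose importance is near zero are candidates for removal when optimizing a measurement network, which is the use the abstract points to.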


Use deep learning frameworks natively in Amazon SageMaker Processing

#artificialintelligence

Until recently, customers who wanted to use a deep learning (DL) framework with Amazon SageMaker Processing faced increased complexity compared to those using scikit-learn or Apache Spark. This post shows you how SageMaker Processing has simplified running machine learning (ML) preprocessing and postprocessing tasks with popular frameworks such as PyTorch, TensorFlow, Hugging Face, MXNet, and XGBoost. Training an ML model takes many steps. One of them, data preparation, is paramount to creating an accurate ML model. Likewise, you often need to run postprocessing jobs (for example, filtering or collating) and model evaluation jobs (scoring models against different test sets) as part of your ML model development lifecycle.
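As a concrete illustration of a postprocessing job of the kind described (filtering and collating), here is a sketch of a script such a job might run. It assumes the processing container sees its inputs under /opt/ml/processing/input and that files written under /opt/ml/processing/output are uploaded when the job ends, with both locations set in the job's input and output configuration; the "score" column, the 0.5 cutoff, and the collated.csv file name are hypothetical.

```python
# Sketch of a postprocessing script that a SageMaker Processing job could run.
# Assumed paths (set by the job's input/output configuration); the "score"
# column and "collated.csv" file name are hypothetical.
import csv
import glob
import os

INPUT_DIR = "/opt/ml/processing/input"
OUTPUT_DIR = "/opt/ml/processing/output"


def filter_and_collate(input_dir=INPUT_DIR, output_dir=OUTPUT_DIR, min_score=0.5):
    """Keep rows with score >= min_score from every CSV in input_dir and collate
    them, highest score first, into a single CSV in output_dir."""
    rows, fieldnames = [], None
    for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = fieldnames or reader.fieldnames  # header of the first file
            rows.extend(r for r in reader if float(r["score"]) >= min_score)
    rows.sort(key=lambda r: float(r["score"]), reverse=True)
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "collated.csv")
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames or ["id", "score"])
        writer.writeheader()
        writer.writerows(rows)
    return out_path, len(rows)
```

Keeping the directories as parameters means the same function can be run locally against temporary folders before it is submitted as a job, where the entry point would simply call filter_and_collate() with the defaults.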