Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
Boosting is an ensemble technique that attempts to create strong classifiers from a number of weak classifiers. Unlike many machine learning models which focus on high quality prediction done using single model, boosting algorithms seek to improve the prediction power by training a sequence of weak models, each compensating the weaknesses of its predecessors. Boosting grants power to machine learning models to improve their accuracy of prediction. AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Yoav Freund and Robert Schapire. AdaBoost technique follows a decision tree model with a depth equal to one.
As we make tremendous advances in machine learning and artificial intelligence technosciences, there is a renewed understanding in the AI community that we must ensure that humans being are at the center of our deliberations so that we don't end in technology-induced dystopias. As strongly argued by Green in his book Smart Enough City, the incorporation of technology in city environs does not automatically translate into prosperity, wellbeing, urban livability, or social justice. There is a great need to deliberate on the future of the cities worth living and designing. There are philosophical and ethical questions involved along with various challenges that relate to the security, safety, and interpretability of AI algorithms that will form the technological bedrock of future cities. Several research institutes on human centered AI have been established at top international universities. Globally there are calls for technology to be made more humane and human-compatible. For example, Stuart Russell has a book called Human Compatible AI. The Center for Humane Technology advocates for regulators and technology companies to avoid business models and product features that contribute to social problems such as extremism, polarization, misinformation, and Internet addiction. In this paper, we analyze and explore key challenges including security, robustness, interpretability, and ethical challenges to a successful deployment of AI or ML in human-centric applications, with a particular emphasis on the convergence of these challenges. We provide a detailed review of existing literature on these key challenges and analyze how one of these challenges may lead to others or help in solving other challenges. The paper also advises on the current limitations, pitfalls, and future directions of research in these domains, and how it can fill the current gaps and lead to better solutions.
XGBoost stands for "Extreme Gradient Boosting". XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements Machine Learning algorithms under the Gradient Boosting framework. It provides a parallel tree boosting to solve many data science problems in a fast and accurate way. Boosting is an ensemble learning technique to build a strong classifier from several weak classifiers in series. Boosting algorithms play a crucial role in dealing with bias-variance trade-off.
There are lots of articles out there talking about XGBoost and using it for models. And why shouldn't there be? It is a really powerful tool that has been proven to obtain great results in a wide variety of environments, favoring heterogeneous data. It has implementations in several languages, but in this article we are going to follow the trend of the previous ones and see the Python 3 implementation. The fancy name of the library comes from the algorithm used in it to train the model, but how does it work? Let's go backwards seeing what each word means.
A growing attention in the empirical literature has been paid to the incidence of climate shocks and change in migration decisions. Previous literature leads to different results and uses a multitude of traditional empirical approaches. This paper proposes a tree-based Machine Learning (ML) approach to analyze the role of the weather shocks towards an individual's intention to migrate in the six agriculture-dependent-economy countries such as Burkina Faso, Ivory Coast, Mali, Mauritania, Niger, and Senegal. We perform several tree-based algorithms (e.g., XGB, Random Forest) using the train-validation-test workflow to build robust and noise-resistant approaches. Then we determine the important features showing in which direction they are influencing the migration intention. This ML-based estimation accounts for features such as weather shocks captured by the Standardized Precipitation-Evapotranspiration Index (SPEI) for different timescales and various socioeconomic features/covariates. We find that (i) weather features improve the prediction performance although socioeconomic characteristics have more influence on migration intentions, (ii) country-specific model is necessary, and (iii) international move is influenced more by the longer timescales of SPEIs while general move (which includes internal move) by that of shorter timescales.
During the training phase of machine learning (ML) models, it is usually necessary to configure several hyperparameters. This process is computationally intensive and requires an extensive search to infer the best hyperparameter set for the given problem. The challenge is exacerbated by the fact that most ML models are complex internally, and training involves trial-and-error processes that could remarkably affect the predictive result. Moreover, each hyperparameter of an ML algorithm is potentially intertwined with the others, and changing it might result in unforeseeable impacts on the remaining hyperparameters. Evolutionary optimization is a promising method to try and address those issues. According to this method, performant models are stored, while the remainder are improved through crossover and mutation processes inspired by genetic algorithms. We present VisEvol, a visual analytics tool that supports interactive exploration of hyperparameters and intervention in this evolutionary procedure. In summary, our proposed tool helps the user to generate new models through evolution and eventually explore powerful hyperparameter combinations in diverse regions of the extensive hyperparameter space. The outcome is a voting ensemble (with equal rights) that boosts the final predictive performance. The utility and applicability of VisEvol are demonstrated with two use cases and interviews with ML experts who evaluated the effectiveness of the tool.
Light Gradient Boosted Machine, or LightGBM for short, is an efficient and effective implementation of the gradient boosting algorithm. LightGBM extends the gradient boosting algorithm by adding a type of automatic feature selection as well as focusing on boosting examples with larger gradients. This can result in a dramatic speedup of training and improved predictive performance. As such, LightGBM has become a de facto algorithm for machine learning competitions when working with tabular data for regression and classification predictive modeling tasks. As such, it owns a share of the blame for the increased popularity and wider adoption of gradient boosting methods in general, along with Extreme Gradient Boosting (XGBoost).
Fraud detection is the most important step for a risk management process to prevent a recurrence. High volumes of fraud can be damaging revenue and reputation. Fortunately, it is possible to deal with fraud before it happens. Therefore, I would like to investigate the performance of the machine learning algorithms on a credit card fraud data set. The dataset contains transactions made by credit cards in September 2013 by European cardholders.
Suppose we wish to perform supervised learning on a classification problem to determine if an incoming email is spam or not spam. The spam dataset consists of 4601 emails, each labelled as real (or not spam) (0) or spam (1). The data also contains a large number of predictors (57), each of which is either a character count, or a frequency of occurrence of a certain word or symbol. In this short article, we will briefly cover the main concepts in tree based classification and compare and contrast the most popular methods. This dataset and several worked examples are covered in detail in The Elements of Statistical Learning, II edition.
This paper presents an eXplainable Fault Detection and Diagnosis System (XFDDS) for incipient faults in PV panels. The XFDDS is a hybrid approach that combines the model-based and data-driven framework. Model-based FDD for PV panels lacks high fidelity models at low irradiance conditions for detecting incipient faults. To overcome this, a novel irradiance based three diode model (IB3DM) is proposed. It is a nine parameter model that provides higher accuracy even at low irradiance conditions, an important aspect for detecting incipient faults from noise. To exploit PV data, extreme gradient boosting (XGBoost) is used due to its ability to detecting incipient faults. Lack of explainability, feature variability for sample instances, and false alarms are challenges with data-driven FDD methods. These shortcomings are overcome by hybridization of XGBoost and IB3DM, and using eXplainable Artificial Intelligence (XAI) techniques. To combine the XGBoost and IB3DM, a fault-signature metric is proposed that helps reducing false alarms and also trigger an explanation on detecting incipient faults. To provide explainability, an eXplainable Artificial Intelligence (XAI) application is developed. It uses the local interpretable model-agnostic explanations (LIME) framework and provides explanations on classifier outputs for data instances. These explanations help field engineers/technicians for performing troubleshooting and maintenance operations. The proposed XFDDS is illustrated using experiments on different PV technologies and our results demonstrate the perceived benefits.