Collaborating Authors


Forecasting: theory and practice Machine Learning

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Artificial Intellgence -- Application in Life Sciences and Beyond. The Upper Rhine Artificial Intelligence Symposium UR-AI 2021 Artificial Intelligence

The TriRhenaTech alliance presents the accepted papers of the 'Upper-Rhine Artificial Intelligence Symposium' held on October 27th 2021 in Kaiserslautern, Germany. Topics of the conference are applications of Artificial Intellgence in life sciences, intelligent systems, industry 4.0, mobility and others. The TriRhenaTech alliance is a network of universities in the Upper-Rhine Trinational Metropolitan Region comprising of the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, Offenburg and Trier, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprised of 14 'grandes \'ecoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.

DP-XGBoost: Private Machine Learning at Scale Artificial Intelligence

The big-data revolution announced ten years ago does not seem to have fully happened at the expected scale. One of the main obstacle to this, has been the lack of data circulation. And one of the many reasons people and organizations did not share as much as expected is the privacy risk associated with data sharing operations. There has been many works on practical systems to compute statistical queries with Differential Privacy (DP). There have also been practical implementations of systems to train Neural Networks with DP, but relatively little efforts have been dedicated to designing scalable classical Machine Learning (ML) models providing DP guarantees. In this work we describe and implement a DP fork of a battle tested ML model: XGBoost. Our approach beats by a large margin previous attempts at the task, in terms of accuracy achieved for a given privacy budget. It is also the only DP implementation of boosted trees that scales to big data and can run in distributed environments such as: Kubernetes, Dask or Apache Spark.

Machine Learning on R 2021


There are people who are eager to move to Analytics careers but do not have the requisite skill sets. As we move into our 12th year in the Analytics Industry, OrangeTree Global has designed specific courses for freshers and working professionals who are looking at moving to Data Science, Machine Learning and Big Data Careers. Since 2009, OrangeTree Global has embarked on an ambitious vision of providing affordable and effective Analytics Training and Education across the country. OrangeTree Global has over a decade's experience in upskilling professionals and helping them move to analytics jobs and careers within and outside India. If you are reading this, we hope to be a part of your journey too.The program builds a solid foundation by covering the most popular and widely used machine learning technologies and its applications, including Naive Bayes theory and application, K Nearest Neighbors (KNN) theory and application, Random forest theory and application, Gradient Boosting Theory and Application and also Support Vector Machine Theory and Application–laying the building blocks for truly expanded analytical abilities.

Generating Explainable Rule Sets from Tree-Ensemble Learning Methods by Answer Set Programming Artificial Intelligence

Interpretability in machine learning is the ability to explain or to present in understandable terms to a human [8]. Interpretability is particularly important when, for example the goal of the user is to gain knowledge from some form of explanations about the data or process through machine learning models, or when making high-stakes decisions based on the outputs from the machine learning models where the user has to be able to trust the models. In this work we address the problem of explaining and understanding tree-ensemble learners by extracting meaningful rules from them. This problem is of practical relevance in business domains where the understanding of the behavior of high-performing machine learning models and extraction of knowledge in human readable form can aid users in the decision making process. We use Answer Set Programming (ASP) [14, 22] to generate rule sets from tree-ensembles.

PIVODL: Privacy-preserving vertical federated learning over distributed labels Artificial Intelligence

Federated learning (FL) is an emerging privacy preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Non-parametric models like gradient boosting decision trees (GBDT) have been commonly used in FL for vertically partitioned data. However, all these studies assume that all the data labels are stored on only one client, which may be unrealistic for real-world applications. Therefore, in this work, we propose a secure vertical FL framework, named PIVODL, to train GBDT with data labels distributed on multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from being leaked through transmitted gradients and leaf values. Our experimental results show that both information leakage and model performance degradation of the proposed PIVODL are negligible.

Efficacy of Statistical and Artificial Intelligence-based False Information Cyberattack Detection Models for Connected Vehicles Artificial Intelligence

Connected vehicles (CVs), because of the external connectivity with other CVs and connected infrastructure, are vulnerable to cyberattacks that can instantly compromise the safety of the vehicle itself and other connected vehicles and roadway infrastructure. One such cyberattack is the false information attack, where an external attacker injects inaccurate information into the connected vehicles and eventually can cause catastrophic consequences by compromising safety-critical applications like the forward collision warning. The occurrence and target of such attack events can be very dynamic, making real-time and near-real-time detection challenging. Change point models, can be used for real-time anomaly detection caused by the false information attack. In this paper, we have evaluated three change point-based statistical models; Expectation Maximization, Cumulative Summation, and Bayesian Online Change Point Algorithms for cyberattack detection in the CV data. Also, data-driven artificial intelligence (AI) models, which can be used to detect known and unknown underlying patterns in the dataset, have the potential of detecting a real-time anomaly in the CV data. We have used six AI models to detect false information attacks and compared the performance for detecting the attacks with our developed change point models. Our study shows that change points models performed better in real-time false information attack detection compared to the performance of the AI models. Change point models having the advantage of no training requirements can be a feasible and computationally efficient alternative to AI models for false information attack detection in connected vehicles.

Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges Machine Learning

Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization.

A Comprehensive Guide to Ensemble Learning - What Exactly Do You Need to Know -


Ensemble learning techniques have been proven to yield better performance on machine learning problems. We can use these techniques for regression as well as classification problems. The final prediction from these ensembling techniques is obtained by combining results from several base models. Averaging, voting and stacking are some of the ways the results are combined to obtain a final prediction. In this article, we will explore how ensemble learning can be used to come up with optimal machine learning models. Ensemble learning is a combination of several machine learning models in one problem.

GBHT: Gradient Boosting Histogram Transform for Density Estimation Machine Learning

In this paper, we propose a density estimation algorithm called \textit{Gradient Boosting Histogram Transform} (GBHT), where we adopt the \textit{Negative Log Likelihood} as the loss function to make the boosting procedure available for the unsupervised tasks. From a learning theory viewpoint, we first prove fast convergence rates for GBHT with the smoothness assumption that the underlying density function lies in the space $C^{0,\alpha}$. Then when the target density function lies in spaces $C^{1,\alpha}$, we present an upper bound for GBHT which is smaller than the lower bound of its corresponding base learner, in the sense of convergence rates. To the best of our knowledge, we make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems. In experiments, we not only conduct performance comparisons with the widely used KDE, but also apply GBHT to anomaly detection to showcase a further application of GBHT.