Collaborating Authors

modeling & simulation

Emery Brown wins a share of 2022 Gruber Neuroscience Prize


David Orenstein | Picower Institute for Learning and Memory … physics, and machine learning to create theories, mathematical models, …

Theory of Gaussian Process Regression for Machine Learning


Probabilistic modelling, which falls under the Bayesian paradigm, is gaining popularity worldwide. Its powerful capabilities, such as giving a reliable estimate of its own uncertainty, make Gaussian process regression a must-have skill for any data scientist. Gaussian process regression is especially powerful when applied in the fields of data science, financial analysis, engineering and geostatistics. This course covers the fundamental mathematical concepts needed by the modern data scientist to confidently apply Gaussian process regression. The course also covers the implementation of Gaussian process regression in Python.
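To make the "estimate of its own uncertainty" concrete, here is a minimal sketch of Gaussian process regression in plain Python. It is not taken from the course: the zero prior mean, the squared-exponential kernel, the hyperparameter defaults, and every function name (`rbf`, `cholesky`, `gp_predict`, and so on) are assumptions chosen for illustration. Only the standard library is used so the block runs as-is; a real project would more likely rely on a numerical library.

```python
import math

def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel for scalar inputs (assumed kernel choice)."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Return lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_lower_transposed(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one new input."""
    n = len(x_train)
    # Kernel matrix of the training inputs, with observation noise on the diagonal.
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))  # K^{-1} y
    k_star = [rbf(x, x_new) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clip tiny negative values from rounding

if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
    for x in (1.0, 1.5, 10.0):
        mean, var = gp_predict(xs, ys, x)
        print(f"x={x}: mean={mean:.3f}, variance={var:.3f}")
```

Under these assumptions the predictive variance is close to zero at a training input and approaches the prior variance far from the data, which is the behaviour the uncertainty claim refers to.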

Marginal Distance and Hilbert-Schmidt Covariances-Based Independence Tests for Multivariate Functional Data

Journal of Artificial Intelligence Research

We study the pairwise and mutual independence testing problem for multivariate functional data. Using a basis representation of functional data, we reduce this problem to testing the independence of multivariate data, which may be high-dimensional. For pairwise independence, we apply tests based on distance and Hilbert-Schmidt covariances as well as their marginal versions, which aggregate these covariances for coordinates of random processes. In the case of mutual independence, we study asymmetric and symmetric aggregating measures of pairwise dependence. A theoretical justification of the test procedures is established. In extensive simulation studies and examples based on a real economic data set, we investigate and compare the performance of the tests in terms of size control and power. An important finding is that tests based on distance and Hilbert-Schmidt covariances are usually more powerful than their marginal versions under linear dependence, while the reverse is true under non-linear dependence.
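As a rough illustration of the pairwise case, the sketch below computes the squared sample distance covariance of two multivariate samples and wraps it in a permutation test. It assumes the functional data have already been reduced to basis-coefficient vectors, as the abstract describes. The permutation calibration and the function names are assumptions made here, not necessarily the procedure used in the paper, and the marginal and Hilbert-Schmidt variants are not shown.

```python
import math
import random

def _centered_distances(points):
    """Double-centered matrix of pairwise Euclidean distances."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]  # equals column means, d is symmetric
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance of paired samples xs and ys."""
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    n = len(xs)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2

def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Permutation p-value for H0: xs and ys are independent (assumed calibration)."""
    rng = random.Random(seed)
    observed = distance_covariance_sq(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between xs and ys
        if distance_covariance_sq(xs, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    xs = [(float(i),) for i in range(20)]
    ys = [(x[0] ** 2,) for x in xs]  # deterministic non-linear dependence
    print(distance_covariance_sq(xs, ys), permutation_pvalue(xs, ys))
```

Each point is a tuple, so the same functions accept coefficient vectors of any dimension, as long as all points in one sample share that dimension.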

Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review - BMC Medical Research Methodology


Describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019, for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, Prediction model Risk Of Bias ASsessment Tool (PROBAST) and CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-, non-regression-based and ensemble machine learning models. Sixty-two publications met inclusion criteria developing 152 models across all publications. Forty-two models were regression-based, 71 were non-regression-based and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5), compared to alternative machine learning (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). 46% (n = 24/62) of models reporting predictor selection before modelling used univariable analyses, a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies.
Fewer than half of the models were reported or made available. The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve quality of machine learning based clinical prediction models.

Machine Learning can give a 10 second Turbulence Warning


Turbulence is one of the leading causes of injuries on passenger planes and--if you don't have your seat belt on--those injuries can be fatal. Approximately 58 people are injured by turbulence every year in the U.S. while not wearing their seat belts [1]. While fatalities for commercial flights are rare, when you factor in general aviation--which includes aerial flight training, medevac operations, and recreational flying--turbulence encounters cause about 40 fatalities per year. There is also a staggering financial cost linked to turbulence, with estimated costs to the airline industry of around $150-$500 million per year in accident investigations, aircraft damage, insurance claims, legal settlements, and missed work [2]. Some passengers are so traumatized by their experience that they swear never to fly again [3].

Gaussian Simulation of Spatial Data Using R


Richard E. Plant. This post is a condensed version of an Additional Topic to accompany Spatial Data Analysis in Ecology and Agriculture using R, Second Edition. The full version together with the data and R code can be found in the Additional Topics section of the book’s website, … First posted on March 21, 2022 at 2:38 pm.

Federated Remote Physiological Measurement with Imperfect Data


The growing need for technology that supports remote healthcare is being acutely highlighted by an aging population and the COVID-19 pandemic. In health-related machine learning applications the ability to learn predictive models without data leaving a private device is attractive, especially when these data might contain features (e.g., photographs or videos of the body) that make identifying a subject trivial and/or the training data volume is large (e.g., uncompressed video). Camera-based remote physiological sensing facilitates scalable and low-cost measurement, but is a prime example of a task that involves analysing high bit-rate videos containing identifiable images and sensitive health information. Federated learning enables privacy-preserving decentralized training which has several properties beneficial for camera-based sensing. We develop the first mobile federated learning camera-based sensing system and show that it can perform competitively with traditional state-of-the-art supervised approaches.
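The idea of training without data leaving a private device can be sketched with federated averaging. The abstract does not say which aggregation rule or model the system uses, so everything below is an assumption for illustration: a toy one-dimensional linear model stands in for the camera-based sensing network, and the names `local_update`, `federated_average` and `train` are hypothetical.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private (x, y) pairs.

    The model is y ~ w * x + b with mean squared error loss.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by each client's number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return tuple(
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    )

def train(clients, rounds=50):
    """Server loop: broadcast weights, collect local updates, average them."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only parameters are sent back to the server, never the raw data.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

if __name__ == "__main__":
    # Two clients holding disjoint samples of the same line y = 2x + 1.
    client_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    client_b = [(1.5, 4.0), (2.0, 5.0)]
    print(train([client_a, client_b], rounds=200))
```

Weighting by sample count keeps a client with little data from dominating the shared model; other aggregation rules are possible and may suit imperfect or unbalanced data better.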

Customer churn prediction for SaaS companies


See how we help SaaS companies use machine learning, predictive models and data-driven CX strategies to prevent attrition.

LIMREF: Local Interpretable Model Agnostic Rule-based Explanations for Forecasting, with an Application to Electricity Smart Meter Data

Artificial Intelligence

Accurate electricity demand forecasts play a crucial role in sustainable power systems. To enable better decision-making, especially for demand flexibility of the end-user, it is necessary to provide not only accurate but also understandable and actionable forecasts. To provide accurate forecasts, Global Forecasting Models (GFM) trained across time series have recently shown superior results in many demand forecasting competitions and real-world applications, compared with univariate forecasting approaches. We aim to fill the gap between accuracy and interpretability in global forecasting approaches. In order to explain the global model forecasts, we propose Local Interpretable Model-agnostic Rule-based Explanations for Forecasting (LIMREF), a local explainer framework that produces k-optimal impact rules for a particular forecast, considering the global forecasting model as a black-box model, in a model-agnostic way. It provides different types of rules that explain the forecast of the global model, as well as counterfactual rules, which provide actionable insights for potential changes to obtain different outputs for given instances. We conduct experiments using a large-scale electricity demand dataset with exogenous features such as temperature and calendar effects. Here, we evaluate the quality of the explanations produced by the LIMREF framework in terms of both qualitative and quantitative aspects such as accuracy, fidelity, and comprehensibility, and benchmark those against other local explainers.


AAAI Conferences

We present a novel model-metric co-learning (MMCL) methodology for sequence classification which learns in the model space -- each data item (sequence) is represented by a predictive model from a carefully designed model class. MMCL learning encourages sequences from the same class to be represented by 'close' model representations, well separated from those for different classes. Existing approaches to the problem either fit a single model to all the data, or a (predominantly linear) model on each sequence. We introduce a novel hybrid approach spanning the two extremes. The model class we use is a special form of adaptive high-dimensional non-linear state space model with a highly constrained and simple dynamic part. The dynamic part is identical for all data items and acts as a temporal filter providing a rich pool of dynamic features that can be selectively extracted by individual (static) linear readout mappings representing the sequences. Alongside learning the dynamic part, we also learn the global metric in the model readout space. Experiments on synthetic and benchmark data sets confirm the effectiveness of the algorithm compared to a variety of alternative methods.