Modeling & Simulation

Emery Brown wins a share of 2022 Gruber Neuroscience Prize


David Orenstein | Picower Institute for Learning and Memory … physics, and machine learning to create theories, mathematical models, …

Theory of Gaussian Process Regression for Machine Learning


Probabilistic modelling, which falls under the Bayesian paradigm, is gaining popularity worldwide. Powerful capabilities, such as providing a reliable estimate of its own uncertainty, make Gaussian process regression a must-have skill for any data scientist. Gaussian process regression is especially powerful when applied in data science, financial analysis, engineering, and geostatistics. This course covers the fundamental mathematical concepts the modern data scientist needs to confidently apply Gaussian process regression, as well as its implementation in Python.
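The key capability the course description highlights — a prediction that comes with its own uncertainty estimate — can be sketched in a few lines with scikit-learn. The kernel choice and synthetic data below are illustrative assumptions, not material from the course:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Noisy samples of a smooth function (synthetic data for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=30)

# RBF kernel for smoothness plus a white-noise term for observation noise
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# The posterior gives both a mean prediction and its standard deviation,
# i.e. the model's own uncertainty estimate at each test point
X_test = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gpr.predict(X_test, return_std=True)
```

The returned `std` grows in regions with few training points, which is the "reliable estimation of its own uncertainty" the excerpt refers to.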

Building a Predictive Model using Python Framework: A Step-by-Step Guide


As part of the agreement, we broke the project down into three stages, each with a distinct set of responsibilities. The first stage involved building a comprehensive understanding of our client's business values and data points. Next, our data scientists refined and organized the dataset to extract the patterns and insights needed. In the final stage, our data engineers used the refined data to develop a predictive machine learning model that accurately forecasts upcoming sales cycles. This helped our client prepare for upcoming market trends and, as a result, outperform their competitors.
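A minimal sketch of the second and third stages — organizing data, then fitting a predictive model for sales — might look like the following. The synthetic sales records, features, and model choice are all illustrative assumptions, not the client project's actual pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Stage 2: refine and organize the raw data (synthetic monthly sales here)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "month": np.tile(np.arange(1, 13), 5),
    "ad_spend": rng.uniform(1_000, 5_000, 60),
})
df["sales"] = (200 * np.sin(df["month"] * np.pi / 6)
               + 0.3 * df["ad_spend"]
               + rng.normal(scale=50, size=60))

# Stage 3: develop a predictive model and check it on held-out data
X, y = df[["month", "ad_spend"]], df["sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
```

Evaluating on a held-out split, as in the last two lines, is what lets the model's forecasts of upcoming sales cycles be trusted before deployment.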

Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review - BMC Medical Research Methodology


Objective: to describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019 for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-based, non-regression-based and ensemble machine learning models.

Sixty-two publications met the inclusion criteria, developing 152 models in total. Forty-two models were regression-based, 71 were non-regression-based and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5) than for non-regression-based machine learning models (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). Univariable analysis was used for predictor selection before modelling in 46% of studies (n = 24/62), and was a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split-sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies. Fewer than half of the models were fully reported or made available.

The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve the quality of machine learning based clinical prediction models.

Machine Learning can give a 10-second Turbulence Warning


Turbulence is one of the leading causes of injuries on passenger planes and--if you don't have your seat belt on--those injuries can be fatal. Approximately 58 people are injured by turbulence every year in the U.S. while not wearing their seat belts [1]. While fatalities on commercial flights are rare, when you factor in general aviation--which includes aerial flight training, medevac operations, and recreational flying--turbulence encounters cause about 40 fatalities per year. There is also a staggering financial cost linked to turbulence, with estimated costs to the airline industry of around $150-$500 million per year in accident investigations, aircraft damage, insurance claims, legal settlements, and missed work [2]. Some passengers are so traumatized by their experience that they swear never to fly again [3].

Gaussian Simulation of Spatial Data Using R


Gaussian Simulation of Spatial Data Using R, by Richard E. Plant. This post is a condensed version of an Additional Topic accompanying Spatial Data Analysis in Ecology and Agriculture using R, Second Edition. The full version, together with the data and R code, can be found in the Additional Topics section of the book's website. First posted on March 21, 2022.
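The post works in R; a rough Python analogue of unconditional Gaussian simulation illustrates the core idea: draw spatially correlated values from a multivariate normal whose covariance decays with distance. The exponential covariogram and its sill and range parameters below are illustrative assumptions, not values from the post:

```python
import numpy as np

# Unconditional Gaussian simulation on a 1-D transect
rng = np.random.default_rng(42)
coords = np.linspace(0, 100, 50)                   # sample locations
h = np.abs(coords[:, None] - coords[None, :])      # pairwise distances
sill, range_ = 1.0, 20.0
C = sill * np.exp(-h / range_)                     # exponential covariogram

# A Cholesky factor turns iid normal draws into spatially correlated values
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(coords)))
field = L @ rng.standard_normal(len(coords))
```

Repeating the last line with fresh random draws gives independent realizations of the same spatial process, which is what geostatistical simulation studies are built from.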

Nvidia speeds AI, climate modeling


It's been years since developers found that Nvidia's main product, the GPU, was useful not just for rendering video games but also for high-performance computing of the kind used in 3D modeling, weather forecasting, and the training of AI models--and it's on enterprise applications such as these that CEO Jensen Huang will focus his attention at the company's GTC 2022 conference this week. Nvidia is hoping to make it easier for CIOs building digital twins and machine learning models to secure enterprise computing, and even to speed the adoption of quantum computing, with a range of new hardware and software. Digital twins--numerical models that reflect changes in real-world objects, useful in design, manufacturing, and service creation--vary in their level of detail. For some applications, a simple database may suffice to record a product's service history: when it was made, who it shipped to, what modifications have been applied. Others require a full-on 3D model incorporating real-time sensor data that can be used, for example, to provide advance warning of component failure or of rain. It's at the high end of that range that Nvidia plays.

Optimal sizing of a holdout set for safe predictive model updating (Machine Learning)

Risk models in medical statistics and healthcare machine learning are increasingly used to guide clinical or other interventions. If a model is updated after guided interventions, the update may lead to its own failure to make accurate predictions. The use of a 'holdout set' -- a subset of the population that does not receive interventions guided by the model -- has been proposed to prevent this. Since patients in the holdout set do not benefit from risk predictions, the chosen size must trade off maximising model performance against minimising the number of held-out patients. By defining a general loss function, we prove the existence and uniqueness of an optimal holdout set size, and introduce parametric and semi-parametric algorithms for its estimation. We demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented solution to the model update problem.
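The trade-off the abstract describes can be made concrete with a toy loss function: held-out patients pay a fixed cost, while the rest of the population pays a cost that shrinks as the model improves with more holdout data. The power-law cost curve and every constant below are hypothetical, chosen only so a unique interior minimum exists; the paper's actual loss functions and estimation algorithms differ:

```python
import numpy as np

N = 10_000                  # total population size (assumed)
cost_holdout = 1.0          # expected cost per held-out patient (assumed)

def cost_intervention(n):
    # Per-patient cost after updating the model on n holdout samples;
    # an assumed power-law learning curve, not the paper's model
    return 0.2 + 5.0 * n ** -0.5

def total_loss(n):
    # Held-out patients forgo predictions; the rest get the updated model
    return n * cost_holdout + (N - n) * cost_intervention(n)

# Grid search over candidate holdout sizes for the minimising value
sizes = np.arange(10, N, 10)
losses = np.array([total_loss(n) for n in sizes])
n_star = sizes[np.argmin(losses)]   # estimated optimal holdout size
```

Too small a holdout set leaves the updated model poor for everyone else; too large a one withholds predictions from patients unnecessarily, so the loss curve is U-shaped with a single optimum.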

Hybridizing Physical and Data-driven Prediction Methods for Physicochemical Properties (Machine Learning)

We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach 'distills' the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obtain significant improvements compared to the data-driven and physical baselines and established ensemble methods from the machine learning literature.
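The core mechanism -- a physical prediction acting as a prior that sparse measurements update -- can be illustrated with the simplest conjugate case, a Gaussian prior and Gaussian measurement noise. All the numbers below are illustrative assumptions, not values or models from the paper:

```python
import numpy as np

# Prior: the physical method's prediction of a property, with its uncertainty
prior_mean, prior_var = 2.5, 0.5 ** 2

# Sparse experimental data and an assumed measurement noise variance
measurements = np.array([2.9, 3.1, 3.0])
meas_var = 0.2 ** 2

# Conjugate Gaussian update: precisions add, means are precision-weighted
n = len(measurements)
post_var = 1.0 / (1.0 / prior_var + n / meas_var)
post_mean = post_var * (prior_mean / prior_var + measurements.sum() / meas_var)
```

With only a few precise measurements the posterior already sits close to the data, yet the physical prior keeps it anchored when experiments are scarce -- the behaviour the hybrid approach exploits.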

QuadSim: A Quadcopter Rotational Dynamics Simulation Framework For Reinforcement Learning Algorithms (Artificial Intelligence)

This study focuses on designing and developing a mathematically grounded quadcopter rotational dynamics simulation framework for testing reinforcement learning (RL) algorithms in many flexible configurations. The framework simulates both linear and nonlinear representations of a quadcopter by solving initial value problems for systems of ordinary differential equations (ODEs). In addition, the simulation environment can be made deterministic or stochastic by adding random Gaussian noise in the form of process and measurement noise. To ensure that the scope of the simulation environment is not limited to our own RL algorithms, it has been made compatible with the OpenAI Gym toolkit. The framework also supports multiprocessing, running multiple simulation environments in parallel. To test these capabilities, many state-of-the-art deep RL algorithms were trained in this simulation framework and the results were compared in detail.
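The two ingredients the abstract names -- solving an ODE initial value problem for the rotational dynamics, then optionally corrupting the state with Gaussian noise -- can be sketched with SciPy. The rigid-body Euler equations are standard, but the inertia values, torque, and noise scale below are illustrative assumptions, not QuadSim's parameters:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Rigid-body rotational dynamics (Euler's equations) for a quadcopter:
# I * dw/dt = tau - w x (I w).  Inertia and torque values are assumed.
I = np.diag([4.85e-3, 4.85e-3, 8.80e-3])   # inertia matrix (kg*m^2)
I_inv = np.linalg.inv(I)
tau = np.array([1e-3, 0.0, 0.0])           # constant body torque (N*m)

def omega_dot(t, omega):
    return I_inv @ (tau - np.cross(omega, I @ omega))

# Solve the initial value problem, as the framework does for each rollout
sol = solve_ivp(omega_dot, t_span=(0.0, 1.0), y0=np.zeros(3),
                t_eval=np.linspace(0.0, 1.0, 11))

# Stochastic variant: add Gaussian measurement noise to the angular rates
rng = np.random.default_rng(0)
measured = sol.y + rng.normal(scale=1e-3, size=sol.y.shape)
```

Wrapping a step of this integrator in `reset`/`step` methods is what makes such a simulator usable through the OpenAI Gym interface.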