The main objective of this work is to use machine-learning (ML) algorithms to develop a powerful model to predict the well-integrity (WI) risk categories of gas-lifted wells. The model described in the complete paper can predict well-risk level and provides a unique method to convert the failure risk associated with each element of the well envelope into tangible values. The predictive model predicts the risk status of wells and classifies their integrity level into five categories rather than the three broad-range categories used in qualitative risk classification: Category 1, which is too risky; Category 2, which is still too risky but less so than Category 1; Category 3, which is medium risk but can be elevated if additional barrier failures occur; Category 4, which is low risk but features some impaired barriers; and Category 5, which is the lowest in risk. The failure model identifies whether the well is considered to be in failure mode. In addition, the model can identify wells that require prompt mitigation.
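A minimal sketch of how such a five-category risk classifier could be set up on tabular well-barrier data is shown below; the synthetic features, the labels, and the random-forest choice are illustrative assumptions and not the model described in the paper.

```python
# Minimal sketch: classify wells into five integrity-risk categories.
# Features, labels, and the random-forest choice are illustrative assumptions,
# not the authors' model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_wells = 500
# Hypothetical barrier-condition features (e.g., annulus pressure, valve test results).
X = rng.normal(size=(n_wells, 6))
# Risk categories 1 (highest risk) .. 5 (lowest risk).
y = rng.integers(1, 6, size=n_wells)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```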
As humans, we perceive the three-dimensional structure of the world around us with apparent ease. Think of how vivid the three-dimensional percept is when you look at a vase of flowers sitting on the table next to you. You can tell the shape and translucency of each petal through the subtle patterns of light and shading that play across its surface and effortlessly segment each flower from the background of the scene (Figure 1.1). Looking at a framed group portrait, you can easily count (and name) all of the people in the picture and even guess at their emotions from their facial appearance. Perceptual psychologists have spent decades trying to understand how the visual system works and, even though they can devise optical illusions to tease apart some of its principles (Figure 1.3), a complete solution to this puzzle remains elusive (Marr 1982; Palmer 1999; Livingstone 2008).
Universal access to affordable, reliable, and sustainable modern energy is a Sustainable Development Goal (SDG). However, insufficient power generation, poor transmission and distribution infrastructure, affordability, climate concerns, diversification and decentralization of energy production, and changing demand patterns are creating complex challenges in power generation. According to the 2019 International Energy Agency (IEA) report, 860 million people lack access to electricity, and three billion people use open fires and simple stoves fueled by kerosene, biomass, or coal for cooking. As a result, over four million people die prematurely from the associated illnesses. Artificial intelligence (AI) offers great potential to lower energy costs, cut energy waste, and facilitate and accelerate the use of renewable and clean energy sources in power grids worldwide. In addition, it can help improve the planning, operation, and control of power systems.
Assaad, Charles K. (Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, EasyVista) | Devijver, Emilie (Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG) | Gaussier, Eric (Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG)
We introduce in this survey the major concepts, models, and algorithms proposed so far to infer causal relations from observational time series, a task usually referred to as causal discovery in time series. To do so, after a description of the underlying concepts and modelling assumptions, we present different methods according to the family of approaches they belong to: Granger causality, constraint-based approaches, noise-based approaches, score-based approaches, logic-based approaches, topology-based approaches, and difference-based approaches. We then evaluate several representative methods to illustrate the behaviour of different families of approaches. This illustration is conducted on both artificial and real datasets with different characteristics. The main conclusions one can draw from this survey are that causal discovery in time series is an active research field in which new methods (in every family of approaches) are regularly proposed, and that no family or method stands out in all situations. Indeed, they all rely on assumptions that may or may not be appropriate for a particular dataset.
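As a concrete illustration of the first family mentioned above, a minimal Granger-causality check on synthetic data is sketched below using statsmodels; the data-generating process and the lag choice are illustrative and not taken from the survey's experiments.

```python
# Minimal sketch of a Granger-causality test (one of the surveyed families)
# on synthetic data; lag choice and the data-generating process are illustrative.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # y depends on past x, so x should "Granger-cause" y.
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

# Column order expected by statsmodels: [effect, candidate cause].
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)
```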
Two main problems are studied in this article. The first is the use of the extrusion process for controlled thermo-mechanical degradation of polyethylene for recycling applications. The second is the data-based modelling of such reactive extrusion processes. Polyethylenes (high-density polyethylene (HDPE) and ultra-high-molecular-weight polyethylene (UHMWPE)) were extruded in a corotating twin-screw extruder at high temperatures (350 °C < T < 420 °C) under various process conditions (flow rate and screw rotation speed). These process conditions caused a decrease in molecular weight due to degradation reactions. A numerical method based on the Carreau-Yasuda model was developed to predict the rheological behaviour (variation of viscosity with shear rate) from the in-line measurement of the die pressure. The results compared well with the viscosity measured offline, assuming the Cox-Merz law. Weight-average molecular weights were estimated from the resulting zero-shear-rate viscosity. Furthermore, the linear viscoelastic behaviour (frequency dependence of the complex shear modulus) was also used to predict the molecular weight distributions of the final products by an inverse rheological method. Size exclusion chromatography (SEC) was performed on five samples, and the resulting molecular weight distributions were compared to the values obtained with the two aforementioned techniques. The weight-average molecular weights were similar for the three techniques. The complete molecular weight distributions obtained by inverse rheology were similar to the SEC ones for extruded HDPE samples, but some inaccuracies were observed for extruded UHMWPE samples. The Ludovic® (SC-Consultants, Saint-Etienne, France) corotating twin-screw extrusion simulation software was used as a classical process simulation. However, as the rheo-kinetic laws of this process were unknown, the software could not predict all the flow characteristics successfully. Finally, machine learning techniques able to operate in the low-data limit were tested to build predictive models of the process outputs and material characteristics. Support Vector Machine Regression (SVR) and sparse Proper Generalized Decomposition (sPGD) techniques were chosen and successfully predicted the process outputs. These methods were also applied to material characteristics data, and both were found to be effective in predicting molecular weights. More precisely, sPGD gave better results than SVR for the zero-shear viscosity prediction. Stochastic methods were also tested on some of the data and showed promising results.
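For reference, a minimal sketch of the Carreau-Yasuda law used to fit the shear-rate-dependent viscosity is given below; the parameter values are illustrative only and are not fitted to the data of this study.

```python
# Minimal sketch of the Carreau-Yasuda viscosity law,
# eta(gamma_dot) = eta_inf + (eta0 - eta_inf) * [1 + (lam*gamma_dot)^a]^((n-1)/a).
# Parameter values are illustrative, not fitted to the study's data.
import numpy as np

def carreau_yasuda(shear_rate, eta0, eta_inf, lam, a, n):
    """Return the viscosity (Pa.s) at the given shear rate (1/s)."""
    return eta_inf + (eta0 - eta_inf) * (1.0 + (lam * shear_rate) ** a) ** ((n - 1.0) / a)

shear_rates = np.logspace(-2, 3, 50)                     # 1/s
eta = carreau_yasuda(shear_rates, eta0=5e4, eta_inf=0.0,  # eta0: zero-shear viscosity
                     lam=1.5, a=2.0, n=0.4)
print(eta[:3])
```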
Lin, Yinan, Zhou, Wen, Geng, Zhi, Xiao, Gexin, Yin, Jianxin
In traditional logistic regression models, the link function is often assumed to be linear and continuous in the predictors. Here, we consider a threshold model in which all continuous features are discretized into ordinal levels, which in turn determine the binary responses. Both the threshold points and the regression coefficients are unknown and must be estimated. For high-dimensional data, we propose a fusion penalized logistic threshold regression (FILTER) model, where a fused lasso penalty is employed to control the total variation and shrink coefficients to zero as a method of variable selection. Under mild conditions on the estimates of the unknown threshold points, we establish a non-asymptotic error bound for coefficient estimation and model selection consistency. With a careful characterization of the error propagation, we also show that tree-based methods, such as CART, fulfill the threshold estimation conditions. We find the FILTER model well suited to the problem of early detection and prediction of chronic diseases such as diabetes using physical examination data. The finite-sample behavior of the proposed method is also explored and compared in extensive Monte Carlo studies, which support our theoretical findings.
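A plausible form of the penalized objective (our notation, not necessarily the paper's exact formulation) combines the logistic log-likelihood on the discretized design with a fused lasso penalty:

$$\hat\beta = \arg\min_{\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \sigma(z_i^{\top}\beta) + (1-y_i)\log\big(1-\sigma(z_i^{\top}\beta)\big)\Big] + \lambda_1 \sum_{j}|\beta_j| + \lambda_2 \sum_{j\ge 2}|\beta_j - \beta_{j-1}|,$$

where $z_i$ collects the ordinal-level indicators obtained by thresholding the continuous features of observation $i$, $\sigma$ is the logistic function, and $\lambda_1,\lambda_2$ control sparsity and total variation, respectively.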
Dürr, Oliver, Hörling, Stephan, Dold, Daniel, Kovylov, Ivonne, Sick, Beate
Variational inference (VI) is a technique to approximate difficult-to-compute posteriors by optimization. In contrast to MCMC, VI scales to many observations. In the case of complex posteriors, however, state-of-the-art VI approaches often yield unsatisfactory posterior approximations. This paper presents Bernstein flow variational inference (BF-VI), a robust and easy-to-use method, flexible enough to approximate complex multivariate posteriors. BF-VI combines ideas from normalizing flows and Bernstein polynomial-based transformation models. In benchmark experiments, we compare BF-VI solutions with exact posteriors, MCMC solutions, and state-of-the-art VI methods, including normalizing-flow-based VI. We show for low-dimensional models that BF-VI accurately approximates the true posterior; in higher-dimensional models, BF-VI outperforms other VI methods. Further, we use BF-VI to develop a Bayesian model for the semi-structured Melanoma challenge data, combining a CNN model part for image data with an interpretable model part for tabular data, and demonstrate for the first time the use of VI in semi-structured models.
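A minimal sketch of a monotone Bernstein-polynomial transform, the kind of building block such transformation models use to deform a simple base density, is given below; the degree and coefficient parameterization are illustrative and not the exact BF-VI construction.

```python
# Minimal sketch of a monotone Bernstein-polynomial transform on [0, 1];
# degree and coefficients are illustrative, not the BF-VI parameterization.
import numpy as np
from scipy.special import comb

def bernstein_transform(z, theta):
    """Map z in [0,1] through f(z) = sum_k theta_k * C(M,k) z^k (1-z)^(M-k).
    The map is monotone if theta is non-decreasing."""
    M = len(theta) - 1
    k = np.arange(M + 1)
    basis = comb(M, k) * z[:, None] ** k * (1.0 - z[:, None]) ** (M - k)
    return basis @ theta

z = np.linspace(0.0, 1.0, 5)
theta = np.cumsum(np.abs(np.random.default_rng(0).normal(size=9)))  # non-decreasing coefficients
print(bernstein_transform(z, theta))
```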
Kontolati, Katiana, Loukrezis, Dimitrios, Giovanis, Dimitrios D., Vandanapu, Lohit, Shields, Michael D.
Constructing surrogate models for uncertainty quantification (UQ) on complex partial differential equations (PDEs) having inherently high-dimensional $\mathcal{O}(10^{\ge 2})$ stochastic inputs (e.g., forcing terms, boundary conditions, initial conditions) poses tremendous challenges. The curse of dimensionality can be addressed with suitable unsupervised learning techniques used as a pre-processing tool to encode inputs onto lower-dimensional subspaces while retaining their structural information and meaningful properties. In this work, we review and investigate thirteen dimension reduction methods, including linear and nonlinear, spectral, blind source separation, convex, and non-convex methods, and utilize the resulting embeddings to construct a mapping to quantities of interest via polynomial chaos expansions (PCE). We refer to the general proposed approach as manifold PCE (m-PCE), where the manifold corresponds to the latent space resulting from any of the studied dimension reduction methods. To assess the capabilities and limitations of these methods, we conduct numerical tests on three physics-based systems (treated as black boxes) with high-dimensional stochastic inputs of varying complexity, modeled as both Gaussian and non-Gaussian random fields, and investigate the effect of the intrinsic dimensionality of the input data. We demonstrate both the advantages and limitations of the unsupervised learning methods and conclude that a suitable m-PCE model provides a cost-effective approach compared to alternative algorithms proposed in the literature, including recently proposed expensive deep neural network-based surrogates, and can be readily applied for high-dimensional UQ in stochastic PDEs.
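A minimal sketch of the m-PCE idea on synthetic data is shown below; PCA and an ordinary polynomial fit stand in for the paper's dimension reduction methods and chaos expansion, so this illustrates the workflow rather than the authors' implementation.

```python
# Minimal sketch of the m-PCE workflow: encode high-dimensional stochastic inputs
# into a low-dimensional latent space, then map latent coordinates to a quantity
# of interest with a polynomial surrogate. PCA and a plain polynomial fit are
# stand-ins for the paper's reduction methods and chaos expansion; data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_inputs = 300, 400                  # O(10^2) stochastic input dimensions
X = rng.normal(size=(n_samples, n_inputs))      # e.g., discretized random-field realizations
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2        # black-box quantity of interest

latent = PCA(n_components=5).fit_transform(X)   # "manifold" (latent) coordinates
surrogate = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
surrogate.fit(latent, y)
print("training R^2:", surrogate.score(latent, y))
```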
Gahungu, Paterne, Lanyon, Christopher W, Alvarez, Mauricio A, Bainomugisha, Engineer, Smith, Michael, Wilkinson, Richard D.
Linear systems occur throughout engineering and the sciences, most notably as differential equations. In many cases the forcing function for the system is unknown, and interest lies in using noisy observations of the system to infer the forcing, as well as other unknown parameters. In differential equations, the forcing function is an unknown function of the independent variables (typically time and space), and can be modelled as a Gaussian process (GP). In this paper we show how the adjoint of a linear system can be used to efficiently infer forcing functions modelled as GPs, after using a truncated basis expansion of the GP kernel. We show how exact conjugate Bayesian inference for the truncated GP can be achieved, in many cases with substantially lower computation than would be required using MCMC methods. We demonstrate the approach on systems of both ordinary and partial differential equations, and by testing on synthetic data, show that the basis expansion approach approximates well the true forcing with a modest number of basis vectors. Finally, we show how to infer point estimates for the non-linear model parameters, such as the kernel length-scales, using Bayesian optimisation.
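A minimal sketch of two of the ingredients described here, a truncated basis expansion of a GP kernel and the conjugate Gaussian update for the basis weights, is given below; random Fourier features stand in for the paper's expansion, and the adjoint-based map from forcing to observations is replaced by a placeholder linear operator, so this is illustrative rather than the authors' method.

```python
# Minimal sketch: truncated basis expansion of a GP prior on the forcing plus a
# conjugate Gaussian update of the basis weights. Random Fourier features stand in
# for the paper's kernel expansion; the problem-specific (adjoint-derived) linear
# map from forcing to observations is replaced by a placeholder operator A.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_basis, lengthscale, noise_var = 40, 50, 0.3, 0.01

t = np.linspace(0.0, 1.0, n_obs)[:, None]
# Random Fourier features approximating an RBF kernel with the given length-scale.
omega = rng.normal(scale=1.0 / lengthscale, size=(1, n_basis))
phase = rng.uniform(0.0, 2 * np.pi, size=n_basis)
Phi = np.sqrt(2.0 / n_basis) * np.cos(t @ omega + phase)     # (n_obs, n_basis)

A = np.eye(n_obs)                       # placeholder linear map from forcing to data
y = A @ np.sin(2 * np.pi * t[:, 0]) + np.sqrt(noise_var) * rng.normal(size=n_obs)

# Conjugate Gaussian posterior over basis weights w, with prior w ~ N(0, I).
M = A @ Phi
S_post = np.linalg.inv(np.eye(n_basis) + M.T @ M / noise_var)
m_post = S_post @ M.T @ y / noise_var
forcing_mean = Phi @ m_post             # posterior mean of the inferred forcing
```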
Many problems plague the estimation of Phillips curves. Among them is the hurdle that the two key components, inflation expectations and the output gap, are both unobserved. Traditional remedies include creating reasonable proxies for the notable absentees or extracting them via some form of assumptions-heavy filtering procedure. I propose an alternative route: a Hemisphere Neural Network (HNN) whose peculiar architecture yields a final layer where components can be interpreted as latent states within a Neural Phillips Curve. There are benefits. First, HNN conducts the supervised estimation of nonlinearities that arise when translating a high-dimensional set of observed regressors into latent states. Second, computations are fast. Third, forecasts are economically interpretable. Fourth, inflation volatility can also be predicted by merely adding a hemisphere to the model. Among other findings, the contribution of real activity to inflation appears severely underestimated in traditional econometric specifications. Also, HNN captures out-of-sample the 2021 upswing in inflation and attributes it first to an abrupt and sizable disanchoring of the expectations component, followed by a wildly positive gap starting from late 2020. The unique path of HNN's gap comes from dispensing with unemployment and GDP in favor of an amalgam of nonlinearly processed alternative tightness indicators -- some of which are skyrocketing as of early 2022.
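A minimal sketch of the hemisphere idea is given below in PyTorch: two separate sub-networks, one fed expectations-related regressors and one fed real-activity regressors, each produce a scalar component whose sum is the inflation prediction. The split of regressors, layer sizes, and activations are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a two-hemisphere network whose interpretable components
# sum to the prediction; sizes and the regressor split are illustrative.
import torch
import torch.nn as nn

class HemisphereNet(nn.Module):
    def __init__(self, n_exp_inputs, n_gap_inputs, hidden=32):
        super().__init__()
        self.expectations = nn.Sequential(
            nn.Linear(n_exp_inputs, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.gap = nn.Sequential(
            nn.Linear(n_gap_inputs, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x_exp, x_gap):
        e = self.expectations(x_exp)   # latent expectations component
        g = self.gap(x_gap)            # latent output-gap component
        return e + g, e, g             # prediction plus interpretable parts

model = HemisphereNet(n_exp_inputs=10, n_gap_inputs=15)
pi_hat, e_t, g_t = model(torch.randn(8, 10), torch.randn(8, 15))
```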