Regression
Advancing Carbon Capture using AI: Design of permeable membrane and estimation of parameters for Carbon Capture using linear regression and membrane-based equations
Panerua, Bishwash, Paneru, Biplov
This study focuses on membrane-based systems for CO$_2$ separation, addressing the urgent need for efficient carbon capture solutions to mitigate climate change. Linear regression models, based on membrane equations, were utilized to estimate key parameters, including porosity ($\epsilon$) of 0.4805, Kozeny constant (K) of 2.9084, specific surface area ($\sigma$) of 105.3272 m$^2$/m$^3$, mean pressure (Pm) of 6.2166 MPa, viscosity ($\mu$) of 0.1997 Ns/m$^2$, and gas flux (Jg) of 3.2559 kg m$^{-2}$ s$^{-1}$. These parameters were derived from the analysis of synthetic datasets using linear regression. The study also provides insights into the performance of the membrane, with a flow rate (Q) of 9.8778 $\times$ 10$^{-4}$ m$^3$/s, an injection pressure (P$_1$) of 2.8219 MPa, and an exit pressure (P$_2$) of 2.5762 MPa. The permeability value of 0.045 for CO$_2$ indicates the potential for efficient separation. Optimizing membrane properties to selectively block CO$_2$ while allowing other gases to pass is crucial for improving carbon capture efficiency. By integrating these technologies into industrial processes, significant reductions in greenhouse gas emissions can be achieved, fostering a circular carbon economy and contributing to global climate goals. This study also explores how artificial intelligence (AI) can aid in designing membranes for carbon capture, addressing the global climate change challenge and supporting the Sustainable Development Goals (SDGs) set by the United Nations.
Longitudinal Missing Data Imputation for Predicting Disability Stage of Patients with Multiple Sclerosis
Vazifehdan, Mahin, Bosoni, Pietro, Pala, Daniele, Tavazzi, Eleonora, Bergamaschi, Roberto, Bellazzi, Riccardo, Dagliati, Arianna
Multiple Sclerosis (MS) is a chronic disease characterized by progressive or alternate impairment of neurological functions (motor, sensory, visual, and cognitive). Predicting disease progression with a probabilistic and time-dependent approach might help in suggesting interventions that can delay the progression of the disease. However, extracting informative knowledge from irregularly collected longitudinal data is difficult, and missing data pose significant challenges. MS progression is measured through the Expanded Disability Status Scale (EDSS), which quantifies and monitors disability in MS over time. EDSS assesses impairment in eight functional systems (FS). Frequently, only the EDSS score assigned by clinicians is reported, while FS sub-scores are missing. Imputing these scores might be useful, especially to stratify patients according to their phenotype assessed over the disease progression. This study aimed at i) exploring different methodologies for imputing missing FS sub-scores, and ii) predicting the EDSS score using complete clinical data. Results show that Exponential Weighted Moving Average achieved the lowest error rate in the missing data imputation task; furthermore, the combination of Classification and Regression Trees for the imputation and SVM for the prediction task obtained the best accuracy.
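As a minimal sketch of the imputation idea named above, the following fills missing sub-scores in one patient's visit sequence with an exponentially weighted moving average of the values observed so far. The abstract does not specify the variant used, so the forward-only update, the smoothing factor `alpha`, and the function name `ewma_impute` are assumptions made for illustration only.

```python
from typing import List, Optional


def ewma_impute(values: List[Optional[float]], alpha: float = 0.5) -> List[float]:
    """Replace missing entries (None) with the running EWMA of earlier observations.

    Assumption: only observed values update the average; imputed values do not.
    """
    filled: List[float] = []
    ewma: Optional[float] = None
    for v in values:
        if v is None:
            if ewma is None:
                raise ValueError("cannot impute before the first observed value")
            filled.append(ewma)
        else:
            # Standard EWMA recursion: new = alpha * value + (1 - alpha) * old
            ewma = v if ewma is None else alpha * v + (1 - alpha) * ewma
            filled.append(v)
    return filled


# Hypothetical sub-score sequence with two missing visits.
scores = [1.0, 2.0, None, 3.0, None]
imputed = ewma_impute(scores, alpha=0.5)
print(imputed)  # [1.0, 2.0, 1.5, 3.0, 2.25]
```

A larger `alpha` weights recent visits more heavily, which may or may not suit irregularly spaced clinical data.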
Budget-constrained Collaborative Renewable Energy Forecasting Market
Goncalves, Carla, Bessa, Ricardo J., Teixeira, Tiago, Vinagre, Joao
Accurate power forecasting from renewable energy sources (RES) is crucial for integrating additional RES capacity into the power system and realizing sustainability goals. This work emphasizes the importance of integrating decentralized spatio-temporal data into forecasting models. However, decentralized data ownership presents a critical obstacle to the success of such spatio-temporal models, and incentive mechanisms to foster data-sharing need to be considered. The main contributions are a) a comparative analysis of the forecasting models, advocating for efficient and interpretable spline LASSO regression models, and b) a bidding mechanism within the data/analytics market to ensure fair compensation for data providers and enable both buyers and sellers to express their data price requirements. Furthermore, an incentive mechanism for time series forecasting is proposed, effectively incorporating price constraints and preventing redundant feature allocation. Results show significant accuracy improvements and potential monetary gains for data sellers. For wind power data, an average root mean squared error improvement of over 10% was achieved by comparing forecasts generated by the proposal with locally generated ones.
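The reported figure is a relative RMSE improvement of the market-based forecast over the locally generated one. A small sketch of that comparison follows; the numbers are made up for illustration and are not taken from the paper, and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_improvement_pct(actual, local_forecast, collaborative_forecast):
    """Percentage reduction in RMSE relative to the local forecast."""
    base = rmse(actual, local_forecast)
    new = rmse(actual, collaborative_forecast)
    return 100.0 * (base - new) / base


# Illustrative values only: local errors are +/-2, collaborative errors are +/-1.
actual = [0.0, 0.0, 0.0, 0.0]
local = [2.0, -2.0, 2.0, -2.0]
collaborative = [1.0, -1.0, 1.0, -1.0]
improvement = rmse_improvement_pct(actual, local, collaborative)
print(improvement)  # 50.0
```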
Reviews: A First-Order Algorithmic Framework for Wasserstein Distributionally Robust Logistic Regression
This paper derives a novel algorithm for solving the dual DRLR problem when $\kappa < \infty$ (i.e. the labels may change during transport). The algorithm performs a golden section search for $\lambda$, within which the sub-problem for optimal $\beta$, fixing $\lambda$, is solved by an ADMM algorithm. The ADMM algorithm differs from typical ADMM approaches in two ways: (1) the $\beta$-update is ill-conditioned, requiring a careful choice of iterative method, while (2) the auxiliary $\mu$ update is locally strongly convex, enabling the use of a first-order (not quadratic) approximation with a fixed step size. I see three theoretical contributions: 1. An upper bound on optimal $\lambda$, stated in Proposition 1, which enables the golden section search.
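The outer loop described in this review can be sketched as a generic golden section search over a bracketed scalar. In the sketch below the inner ADMM solve is replaced by a toy unimodal function, and the bracket `[0, lam_max]` stands in for the upper bound of Proposition 1; `inner_value` and `lam_max` are hypothetical names and values, not the paper's.

```python
import math


def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def inner_value(lam):
    """Toy stand-in for the optimal value of the beta-subproblem at fixed lambda.

    In the reviewed method this value would come from an ADMM solve.
    """
    return (lam - 1.5) ** 2


lam_max = 4.0  # assumed upper bound on the optimal lambda
lam_star = golden_section_search(inner_value, 0.0, lam_max)
print(round(lam_star, 4))  # 1.5
```

Golden section search needs only a finite bracket and unimodality, which is why an explicit upper bound on the optimal value of the searched variable matters: each iteration reuses one of the two previous evaluations, so every step costs a single inner solve.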
Reviews: First order expansion of convex regularized estimators
The present paper proposes an approximation based on the first-order Taylor expansion of a convex regularizer. In the regularized regression setting, and under some mild conditions on the loss function and the underlying distribution that generates the data, the authors prove that one can replace the regularization term of the regression algorithm by its Taylor approximation and have a guarantee that the solution obtained with this approximation will be close to the original solution (according to the Mahalanobis distance). The authors then give examples of such proxies for the square loss and logistic regression, and also for Constrained Lasso, Penalized Lasso and Group Lasso. The paper also discusses where this approach can be useful. Although this paper is a bit technical, it is well written, and the results are in my opinion non-trivial and interesting.
Diffusion-aware Censored Gaussian Processes for Demand Modelling
Inferring the true demand for a product or a service from aggregate data is often challenging due to the limited available supply, thus resulting in observations that are censored and correspond to the realized demand, thereby not accounting for the unsatisfied demand. Censored regression models are able to account for the effect of censoring due to the limited supply, but they don't consider the effect of substitutions, which may cause the demand for similar alternative products or services to increase. This paper proposes Diffusion-aware Censored Demand Models, which combine a Tobit likelihood with a graph diffusion process in order to model the latent process of transfer of unsatisfied demand between similar products or services. We instantiate this new class of models under the framework of GPs and, based on both simulated and real-world data for modeling sales, bike-sharing demand, and EV charging demand, demonstrate its ability to better recover the true demand and produce more accurate out-of-sample predictions.
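To make the censoring mechanism concrete, here is a minimal sketch of a Tobit log-likelihood for right-censored observations, where a sale equal to the available supply only tells us that latent demand was at least that large. It covers the likelihood alone: the Gaussian process prior and the graph diffusion step are omitted, and the latent means `mu`, the noise scale `sigma`, and all function names are assumptions for illustration.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_sf(z):
    """Survival function 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def tobit_loglik(observed, censored, mu, sigma):
    """Log-likelihood of right-censored observations under Gaussian latent demand.

    observed[i] is the recorded value; censored[i] is True when it hit the
    supply limit, so the latent demand is only known to be >= observed[i].
    """
    total = 0.0
    for y, is_censored, m in zip(observed, censored, mu):
        z = (y - m) / sigma
        if is_censored:
            total += math.log(norm_sf(z))  # P(latent demand >= y)
        else:
            total += norm_logpdf(z) - math.log(sigma)  # Gaussian density at y
    return total


# Illustrative values: one uncensored and one censored observation.
ll = tobit_loglik([3.0, 5.0], [False, True], mu=[3.0, 5.0], sigma=1.0)
print(round(ll, 4))  # -1.6121
```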
Reviews: Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning
This paper presents a new non-parametric tensor regression method based on kernels. More specifically, the authors propose a regularization-based optimization approach with alternating minimization for the non-parametric tensor regression model [15]. Moreover, a theoretical guarantee for the proposed method is presented in the paper. In experiments on various datasets, the proposed method compares favorably with the existing state of the art. The paper is clearly written and easy to read. I understand the key contribution of this paper to be the theoretical analysis of the non-parametric tensor regression.
Reviews: Mixed Linear Regression with Multiple Components
This paper proposes a new objective function to solve the mixed linear regression problem, but fails to explain many important issues: (1) What is the intuition behind the objective function, and what is its advantage? The answer given between line 39 and line 40 is not convincing, because if the problem is modeled as a finite mixture model, as in many references, "objective value is zero when {w_k}_{k=1,2,...,K} is the global optima and y's do not contain any noise" is also true. The following is an example. It seems there is no probabilistic interpretation for the objective function in Eq.(1).
Reviews: Low-Rank Regression with Tensor Responses
Strength: -- The paper provides a theoretical analysis of approximation guarantees and a generalization bound for the class of tensor-valued regression functions. Weakness: -- A major drawback is that the novelty and contribution are rather limited. The key idea and the model of this paper are actually equivalent to HOPLS in the following paper: [Zhao et. HOPLS assumes that the tensor input has low-rank structure and that the tensor output also has low-rank structure, and the link between them is established in a common latent space. A regression step against the projected latent variables then follows.