AITopics

2406.00339

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre:

Research Report > New Finding (0.86)
Research Report > Experimental Study (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)

An Efficient Multi Quantile Regression Network with Ad Hoc Prevention of Quantile Crossing

Decke, Jens, Jenß, Arne, Sick, Bernhard, Gruhl, Christian

This article presents the Sorting Composite Quantile Regression Neural Network (SCQRNN), an advanced quantile regression model designed to prevent quantile crossing and enhance computational efficiency. Integrating ad hoc sorting in training, the SCQRNN ensures non-intersecting quantiles, boosting model reliability and interpretability. We demonstrate that the SCQRNN not only prevents quantile crossing and reduces computational complexity but also achieves faster convergence than traditional models. This advancement meets the requirements of high-performance computing for sustainable, accurate computation. In organic computing, the SCQRNN enhances self-aware systems with predictive uncertainties, enriching applications across finance, meteorology, climate science, and engineering.

complexity, quantile, scqrnn, (11 more...)

2406.0008

Country: Europe > Germany (0.04)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Obster, Fabian, Heumann, Christian

Introducing sgboost: A Practical Guide and Implementation of sparse-group boosting in R

This paper introduces the sgboost package in R, which implements sparse-group boosting for modeling high-dimensional data with natural groupings in covariates. Sparse-group boosting offers a flexible approach for both group and individual variable selection, reducing overfitting and enhancing model interpretability. The package uses regularization techniques based on the degrees of freedom of individual and group base-learners, and is designed to be used in conjunction with the mboost package. Through comparisons with existing methods and demonstration of its unique functionalities, this paper provides a practical guide on utilizing sparse-group boosting in R, accompanied by code examples to facilitate its application in various research domains. Overall, this paper serves as a valuable resource for researchers and practitioners seeking to use sparse-group boosting for efficient and interpretable high-dimensional data analysis.

coefficient, predictor, variable selection, (15 more...)

2405.21037

Country:

South America > Chile (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Africa > Middle East > Tunisia (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Rosa, Paul, Rousseau, Judith

Nonparametric regression on random geometric graphs sampled from submanifolds

We consider the nonparametric regression problem when the covariates are located on an unknown smooth compact submanifold of a Euclidean space. Under defining a random geometric graph structure over the covariates we analyze the asymptotic frequentist behaviour of the posterior distribution arising from Bayesian priors designed through random basis expansion in the graph Laplacian eigenbasis. Under Holder smoothness assumption on the regression function and the density of the covariates over the submanifold, we prove that the posterior contraction rates of such methods are minimax optimal (up to logarithmic factors) for any positive smoothness index.

aplacianj une 3, exp 1, theorem 3, (13 more...)

2405.20909

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Statistical inference for case-control logistic regression via integrating external summary data

Shi, Hengchao, Liu, Xinyi, Zheng, Ming, Yu, Wen

Case-control sampling is a commonly used retrospective sampling design to alleviate imbalanced structure of binary data. When fitting the logistic regression model with case-control data, although the slope parameter of the model can be consistently estimated, the intercept parameter is not identifiable, and the marginal case proportion is not estimatable, either. We consider the situations in which besides the case-control data from the main study, called internal study, there also exists summary-level information from related external studies. An empirical likelihood based approach is proposed to make inference for the logistic model by incorporating the internal case-control data and external information. We show that the intercept parameter is identifiable with the help of external information, and then all the regression parameters as well as the marginal case proportion can be estimated consistently. The proposed method also accounts for the possible variability in external studies. The resultant estimators are shown to be asymptotically normally distributed. The asymptotic variance-covariance matrix can be consistently estimated by the case-control data. The optimal way to utilized external information is discussed. Simulation studies are conducted to verify the theoretical findings. A real data set is analyzed for illustration.

estimator, information, internal data, (13 more...)

2405.20655

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > India (0.04)

Genre:

Research Report > Experimental Study (0.85)
Research Report > New Finding (0.85)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Valentine, Alissa A., Charney, Alexander W., Landi, Isotta

Fair Machine Learning for Healthcare Requires Recognizing the Intersectionality of Sociodemographic Factors, a Case Study

As interest in implementing artificial intelligence (AI) in medical systems grows, discussion continues on how to evaluate the fairness of these systems, or the disparities they may perpetuate. Socioeconomic status (SES) is commonly included in machine learning models to control for health inequities, with the underlying assumption that increased SES is associated with better health. In this work, we considered a large cohort of patients from the Mount Sinai Health System in New York City to investigate the effect of patient SES, race, and sex on schizophrenia (SCZ) diagnosis rates via a logistic regression model. Within an intersectional framework, patient SES, race, and sex were found to have significant interactions. Our findings showed that increased SES is associated with a higher probability of obtaining a SCZ diagnosis in Black Americans ($\beta=4.1\times10^{-8}$, $SE=4.5\times10^{-9}$, $p < 0.001$). Whereas high SES acts as a protective factor for SCZ diagnosis in White Americans ($\beta=-4.1\times10^{-8}$, $SE=6.7\times10^{-9}$, $p < 0.001$). Further investigation is needed to reliably explain and quantify health disparities. Nevertheless, we advocate that building fair AI tools for the health care space requires recognizing the intersectionality of sociodemographic factors.

diagnosis, disparity, scz diagnosis, (15 more...)

2407.15006

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Kawakami, Yuta, Kuroki, Manabu, Tian, Jin

Probabilities of Causation for Continuous and Vector Variables

Probabilities of causation (PoC) are valuable concepts for explainable artificial intelligence and practical decision-making. PoC are originally defined for scalar binary variables. In this paper, we extend the concept of PoC to continuous treatment and outcome variables, and further generalize PoC to capture causal effects between multiple treatments and multiple outcomes. In addition, we consider PoC for a sub-population and PoC with multi-hypothetical terms to capture more sophisticated counterfactual information useful for decision-making. We provide a nonparametric identification theorem for each type of PoC we introduce. Finally, we illustrate the application of our results on a real-world dataset about education.

assumption 3, pns, trajectory, (14 more...)

2405.20487

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
North America > Greenland (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry:

Health & Medicine (1.00)
Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network Algorithm

Cheng, Yu, Yang, Qin, Wang, Liyang, Xiang, Ao, Zhang, Jingyu

In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. The discourse initially scrutinizes conventional financial risk preemptive models, such as ARMA, ARCH, and Logistic regression models, critically analyzing their real-world applications. Subsequently, the exposition elaborates on the construction process of the BP neural network model, encompassing network architecture design, activation function selection, parameter initialization, and objective function construction. Through comparative analysis, the superiority of neural network models in preempting credit risk in commercial banks is elucidated. The experimental segment selects specific bank data, validating the model's predictive accuracy and practicality. Research findings evince that this model efficaciously enhances the foresight and precision of credit risk management.

application, commercial bank, neural network, (12 more...)

2405.10762

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

Bertran, Martin, Tang, Shuai, Kearns, Michael, Morgenstern, Jamie, Roth, Aaron, Wu, Zhiwei Steven

As model training on personal data becomes commonplace, there has been a growing literature on data protection in machine learning (ML), which includes at least two thrusts: Data Privacy The primary concern regarding data privacy in machine learning (ML) applications is that models might inadvertently reveal details about the individual data points used in their training. This type of privacy risk can manifest in various ways, ranging from membership inference attacks [27]--which only seek to confirm whether a specific individual's data was used in the training--to more severe reconstruction attacks [10] that attempt to recover entire data records of numerous individuals. To address these risks, algorithms that adhere to differential privacy standards [12] provide proven safeguards, specifically limiting the ability to infer information about individual training data. Machine Unlearning Proponents of data autonomy have advocated for individuals to have the right to decide how their data is used, including the right to retroactively ask that their data and its influences be removed from any model trained on it. Data deletion, or machine unlearning, refer to technical approaches which allow such removal of influence [15, 4]. The idea is that, after an individual's data is deleted, the resulting model should be in the state it would have been had the model originally been trained without the individual in question's data. The primary focus of this literature has been on achieving or approximating this condition for complex models in ways that are more computationally efficient than full retraining (see e.g.

random fourier feature, reconstruction, regression, (14 more...)

2405.20272

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)

Vinchurkar, Tirtha, Ock, Janghoon, Farimani, Amir Barati

Explainable Data-driven Modeling of Adsorption Energy in Heterogeneous Catalysis

The increasing popularity of machine learning (ML) in catalysis has spurred interest in leveraging these techniques to enhance catalyst design. Our study aims to bridge the gap between physics-based studies and data-driven methodologies by integrating ML techniques with eXplainable AI (XAI). Specifically, we employ two XAI techniques: Post-hoc XAI analysis and Symbolic Regression. These techniques help us unravel the correlation between adsorption energy and the properties of the adsorbate-catalyst system. Leveraging a large dataset such as the Open Catalyst Dataset (OC20), we employ a combination of shallow ML techniques and XAI methodologies. Our investigation involves utilizing multiple shallow machine learning techniques to predict adsorption energy, followed by post-hoc analysis for feature importance, inter-feature correlations, and the influence of various feature values on the prediction of adsorption energy. The post-hoc analysis reveals that adsorbate properties exert a greater influence than catalyst properties in our dataset. The top five features based on higher Shapley values are adsorbate electronegativity, the number of adsorbate atoms, catalyst electronegativity, effective coordination number, and the sum of atomic numbers of the adsorbate molecule. There is a positive correlation between catalyst and adsorbate electronegativity with the prediction of adsorption energy. Additionally, symbolic regression yields results consistent with SHAP analysis. It deduces a mathematical relationship indicating that the square of the catalyst electronegativity is directly proportional to the adsorption energy. These consistent correlations resemble those derived from physics-based equations in previous research. Our work establishes a robust framework that integrates ML techniques with XAI, leveraging large datasets like OC20 to enhance catalyst design through model explainability.

adsorption energy, correlation, electronegativity, (15 more...)

2405.20397

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Materials > Chemicals (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)