AITopics | semi-supervised inference

Collaborating Authors

semi-supervised inference

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient semi-supervised inference for logistic regression under case-control studies

Quan, Zhuojun, Lin, Yuanyuan, Chen, Kani, Yu, Wen

arXiv.org Machine LearningFeb-23-2024

Semi-supervised learning has received increasingly attention in statistics and machine learning. In semi-supervised learning settings, a labeled data set with both outcomes and covariates and an unlabeled data set with covariates only are collected. We consider an inference problem in semi-supervised settings where the outcome in the labeled data is binary and the labeled data is collected by case-control sampling. Case-control sampling is an effective sampling scheme for alleviating imbalance structure in binary data. Under the logistic model assumption, case-control data can still provide consistent estimator for the slope parameter of the regression model. However, the intercept parameter is not identifiable. Consequently, the marginal case proportion cannot be estimated from case-control data. We find out that with the availability of the unlabeled data, the intercept parameter can be identified in semi-supervised learning setting. We construct the likelihood function of the observed labeled and unlabeled data and obtain the maximum likelihood estimator via an iterative algorithm. The proposed estimator is shown to be consistent, asymptotically normal, and semiparametrically efficient. Extensive simulation studies are conducted to show the finite sample performance of the proposed method. The results imply that the unlabeled data not only helps to identify the intercept but also improves the estimation efficiency of the slope parameter. Meanwhile, the marginal case proportion can be estimated accurately by the proposed method.

logistic regression, regression, unlabeled data, (15 more...)

arXiv.org Machine Learning

2402.15365

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Semi-Supervised Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data

Xue, Fei, Ma, Rong, Li, Hongzhe

arXiv.org Machine LearningJun-7-2021

Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this semi-supervised framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a multiple blockwise imputation procedure, and obtain its rates of convergence. Furthermore, building upon an innovative semi-supervised projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose nearly unbiased estimators for the individual regression coefficients that are asymptotically normally distributed under mild conditions. By carefully analyzing these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.

covariate, probability, semi-supervised inference, (14 more...)

arXiv.org Machine Learning

2106.03344

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.56)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying Overlap

Zhang, Yuqian, Chakrabortty, Abhishek, Bradic, Jelena

arXiv.org Machine LearningApr-14-2021

Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled data, L, the SS setting is characterized by an additional, much larger sized, unlabeled data, U. The setting of |U| >> |L|, makes SS inference unique and different from the standard missing data problems, owing to natural violation of the so-called 'positivity' or 'overlap' assumption. However, most of the SS literature implicitly assumes L and U to be equally distributed, i.e., no selection bias in the labeling. Inferential challenges in missing at random (MAR) type labeling allowing for selection bias, are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem, the estimation of the response's mean. We propose a double robust SS (DRSS) mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a non-standard consistency rate that depends on the smaller size |L|. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators of the decaying PS, including a novel offset logistic model and a stratified labeling model. We present their properties under both high and low dimensional settings. These may be of independent interest. Lastly, we present extensive simulations and also a real data application.

estimator, exp, semi-supervised inference, (15 more...)

arXiv.org Machine Learning

2104.06667

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback